GraphRAG — AI Glossary

GraphRAG (Graph Retrieval-Augmented Generation) is an evolution of standard RAG that augments or replaces flat vector search with a structured knowledge graph. Where traditional RAG retrieves isolated text chunks based on embedding similarity, GraphRAG models the relationships between entities across an entire corpus, which lets it answer complex, holistic questions that require connecting information spread across many documents.

The technique was developed and open-sourced by Microsoft Research in 2024. The foundational paper, “From Local to Global: A Graph RAG Approach to Query-Focused Summarization”, introduced the concept of “local-to-global” querying over large text datasets.

The Problem with Standard RAG

Standard RAG works well for local questions — ones where the answer lives in a specific passage (“What did the CEO say about Q3 revenue?”). It struggles with global questions that require synthesising patterns across an entire document set (“What are the main themes across all our customer feedback?”). This is because:

Embedding similarity retrieves nearby chunks, not conceptually connected ones
No structure exists to traverse relationships between people, organisations, events, or concepts
Each retrieved chunk is treated as independent context with no graph of how things relate

GraphRAG addresses all three limitations.

How GraphRAG Works

The pipeline has two distinct phases — indexing and querying.

Indexing Phase

Text chunking: The source corpus is split into TextUnits — manageable segments that serve as the unit of analysis.
Entity and relationship extraction: An LLM reads each TextUnit and extracts named entities (people, places, organisations, concepts) and the relationships between them, along with key claims.
Knowledge graph construction: Extracted entities and relationships are assembled into a graph where nodes are entities and edges represent relationships, each with associated metadata and source provenance.
Community detection: A graph clustering algorithm (typically Leiden) groups closely related entities into communities — clusters of nodes that are more connected to each other than to the rest of the graph.
Community summarisation: An LLM generates a natural-language summary of each community, capturing the key themes, relationships, and claims within it. These summaries are stored alongside the graph.
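The indexing steps above can be sketched in plain Python. This is a minimal illustration, not the graphrag library's code: the function names are hypothetical, chunking is done on words rather than tokens, the LLM extraction output is hard-coded, and connected components stand in for Leiden clustering (which would split large components into finer communities).

```python
from collections import defaultdict

def chunk_text(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (a stand-in for token-based TextUnits)."""
    assert size > overlap >= 0
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def build_graph(extractions: list[dict]):
    """Merge per-TextUnit extractions into nodes and edges, keeping source provenance."""
    nodes: dict[str, dict] = {}
    edges: dict[tuple, set] = defaultdict(set)
    for ex in extractions:
        uid = ex["unit_id"]
        for ent in ex["entities"]:
            node = nodes.setdefault(ent["name"], {"type": ent["type"], "sources": set()})
            node["sources"].add(uid)
        for rel in ex["relationships"]:
            # Make sure both endpoints exist even if the extractor did not list them.
            for name in (rel["source"], rel["target"]):
                nodes.setdefault(name, {"type": "unknown", "sources": set()})["sources"].add(uid)
            edges[(rel["source"], rel["relation"], rel["target"])].add(uid)
    return nodes, dict(edges)

def connected_communities(nodes: dict, edges: dict) -> list[list[str]]:
    """Group entities by connected component (a crude stand-in for Leiden)."""
    adjacency = defaultdict(set)
    for source, _, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    seen, communities = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            current = stack.pop()
            if current in members:
                continue
            members.add(current)
            stack.extend(adjacency[current] - members)
        seen |= members
        communities.append(sorted(members))
    return communities

# Hard-coded example of what an LLM extractor might return for three TextUnits.
extractions = [
    {"unit_id": 0,
     "entities": [{"name": "Ada", "type": "person"}, {"name": "Acme", "type": "organisation"}],
     "relationships": [{"source": "Ada", "relation": "works_at", "target": "Acme"}]},
    {"unit_id": 1,
     "entities": [{"name": "Acme", "type": "organisation"}, {"name": "Borealis", "type": "project"}],
     "relationships": [{"source": "Acme", "relation": "funds", "target": "Borealis"}]},
    {"unit_id": 2, "entities": [{"name": "Zed", "type": "person"}], "relationships": []},
]
nodes, edges = build_graph(extractions)
communities = connected_communities(nodes, edges)
print(communities)
```

Each community produced this way would then be passed to an LLM for summarisation, and the summaries stored next to the graph for use at query time.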

Querying Phase

GraphRAG supports two query modes:

Local search: For specific entity-level questions, the system traverses the graph from relevant entities, pulling in neighbouring nodes, relationships, and associated text chunks. It combines structured graph context with raw text for precise answers.
Global search: For broad, thematic questions, the system queries across all community summaries, generates partial answers from each, then synthesises them into a final comprehensive response. This is what enables sensemaking over millions of tokens.
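The two query modes can be sketched as follows. This is an illustrative outline under stated assumptions rather than the graphrag library's implementation: `local_context`, `global_search`, and the `llm` callable are hypothetical names, the graph is a plain list of triples, and the prompts are placeholders.

```python
from typing import Callable

Triple = tuple[str, str, str]

def local_context(seeds: set[str], edges: list[Triple], hops: int = 1) -> list[Triple]:
    """Local search: collect the triples within `hops` of the seed entities."""
    frontier, visited = set(seeds), set(seeds)
    triples: list[Triple] = []
    for _ in range(hops):
        next_frontier = set()
        for source, relation, target in edges:
            if source in frontier or target in frontier:
                triple = (source, relation, target)
                if triple not in triples:
                    triples.append(triple)
                next_frontier.update({source, target} - visited)
        visited |= next_frontier
        frontier = next_frontier
    return triples

def global_search(question: str, community_summaries: list[str],
                  llm: Callable[[str], str]) -> str:
    """Global search: map over community summaries, then reduce the partial answers."""
    # Map step: one partial answer per community summary.
    partials = [
        llm(f"Summary:\n{summary}\n\nQuestion: {question}\nAnswer only from the summary.")
        for summary in community_summaries
    ]
    # Reduce step: synthesise the partial answers into one response.
    joined = "\n".join(f"- {partial}" for partial in partials)
    return llm(f"Partial answers:\n{joined}\n\nQuestion: {question}\nCombine into one answer.")

edges = [
    ("Ada", "works_at", "Acme"),
    ("Acme", "funds", "Borealis"),
    ("Zed", "advises", "Quux"),
]
print(local_context({"Ada"}, edges, hops=2))

# A fake LLM that records prompts, so the sketch runs without an API key.
calls: list[str] = []
def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"answer {len(calls)}"

result = global_search("What are the main themes?", ["Community A summary", "Community B summary"], fake_llm)
print(len(calls), result)
```

Global search issues one LLM call per community summary plus a final synthesis call, which is why it costs more per query than local search.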

Performance Gains

Microsoft’s research showed substantial improvements over standard RAG on complex queries:

80% accuracy on global sensemaking questions vs. 50% for traditional RAG
3.4× improvement on enterprise benchmarks requiring cross-document reasoning
Comprehensiveness win rates of 72–83% over naive RAG in pairwise comparisons on questions requiring holistic understanding of a dataset

These gains are most pronounced on datasets of roughly 1 million tokens, where vector search alone cannot maintain coherence across the full corpus.

GraphRAG vs. Standard RAG

Aspect	Standard RAG	GraphRAG
Retrieval unit	Text chunk	Entity, relationship, community summary
Context structure	Flat list of passages	Graph of connected entities
Best for	Specific factual lookups	Thematic, relational, multi-hop questions
Indexing cost	Low (embeddings only)	Higher (LLM extraction + graph build)
Query latency	Fast	Higher, especially for global queries (one LLM call per community summary plus synthesis)
Corpus size sweet spot	Small–medium	Large, interconnected document sets

When to Use GraphRAG

GraphRAG is the right choice when:

Your questions require connecting information across many documents (e.g. analyst reports, legal contracts, research papers, CRM notes)
You need to understand relationships between entities — who worked with whom, what influenced what
Your users ask thematic or exploratory questions rather than just factual lookups
Your corpus is large and dense with cross-references

Stick with standard RAG when queries are predominantly local and factual, or when indexing cost and latency are primary constraints.

Implementation

Microsoft’s open-source graphrag library provides a reference implementation of the full pipeline. Key configuration choices include:

LLM for extraction: A capable model (GPT-4-class or equivalent) is needed for high-quality entity/relationship extraction during indexing
Community algorithm: Leiden clustering is the default; resolution parameters control granularity
Storage: The graph index can be stored in flat files, a vector store, or a graph database depending on scale
Query mode selection: Local vs. global search is chosen at query time based on the question type

GraphRAG integrates naturally into existing RAG pipelines — it can complement vector search rather than replace it entirely, using graph traversal for relational queries and embedding retrieval for precise factual lookups.
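One way to combine the retrieval paths is a small router in front of them. The sketch below is a hypothetical keyword heuristic, not part of the graphrag library: the cue lists, the `choose_mode` name, and the three mode labels are all assumptions, and a production system might use an LLM classifier instead.

```python
# Hypothetical cue phrases; tune these for your own query logs.
GLOBAL_CUES = ("themes", "overall", "across all", "trends", "summarise", "summarize")
RELATIONAL_CUES = ("who worked with", "related to", "connected to", "influenced", "relationship")

def choose_mode(question: str, known_entities: set[str]) -> str:
    """Return 'global', 'local', or 'vector' for a question.

    - 'global': thematic questions answered from community summaries
    - 'local':  relational questions about entities present in the graph
    - 'vector': everything else, served by plain embedding retrieval
    """
    q = question.lower()
    if any(cue in q for cue in GLOBAL_CUES):
        return "global"
    # Naive substring match; short entity names can produce false positives.
    mentions_entity = any(name.lower() in q for name in known_entities)
    if mentions_entity and any(cue in q for cue in RELATIONAL_CUES):
        return "local"
    return "vector"

entities = {"Ada", "Borealis"}
for question in (
    "What are the main themes across all our customer feedback?",
    "Who worked with Ada on Borealis?",
    "What did the CEO say about Q3 revenue?",
):
    print(choose_mode(question, entities), "<-", question)
```

Falling back to vector retrieval by default keeps the cheaper path for ordinary factual lookups and reserves graph queries for questions that need them.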

How to Use — GraphRAG with Neo4j + LLM entity extraction

python

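import json

from anthropic import Anthropic
from neo4j import GraphDatabase

# Minimal sketch: builds the graph directly in Neo4j instead of using the graphrag
# library, and skips community detection and summarisation.
# Assumes ANTHROPIC_API_KEY is set and Neo4j is running locally; adjust the
# credentials and use a model ID that is available to your account.
MODEL = "claude-opus-4-8"
client = Anthropic()
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Step 1: extract entities and relationships from text via LLM
def extract_graph(text: str) -> dict:
    response = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        system=(
            "Extract entities and relationships from the text. "
            "Return only valid JSON of the form: "
            '{"entities": [{"name": "...", "type": "..."}], '
            '"relationships": [{"source": "...", "relation": "...", "target": "..."}]}'
        ),
        messages=[{"role": "user", "content": text}],
    )
    raw = response.content[0].text
    # The model may wrap the JSON in prose or a code fence; keep the outermost object.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError(f"No JSON object in model output: {raw!r}")
    return json.loads(raw[start : end + 1])

# Step 2: write entities and relationships to Neo4j
def ingest(text: str) -> None:
    graph = extract_graph(text)
    with driver.session() as session:
        for e in graph.get("entities", []):
            session.run(
                "MERGE (n:Entity {name: $name}) SET n.type = $type",
                name=e["name"], type=e.get("type", "unknown"),
            )
        for r in graph.get("relationships", []):
            # MERGE both endpoints so a relationship is not silently dropped
            # when the model omitted one of them from the entity list.
            session.run(
                """MERGE (a:Entity {name: $src})
                   MERGE (b:Entity {name: $tgt})
                   MERGE (a)-[:RELATES {type: $rel}]->(b)""",
                src=r["source"], tgt=r["target"], rel=r["relation"],
            )

# Step 3: retrieve the subgraph around entities named in the question, then answer
def graph_rag(question: str) -> str:
    with driver.session() as session:
        # Seed on entities whose name appears in the question, then collect every
        # relationship touching a seed or one of its direct neighbours.
        result = session.run(
            """MATCH (seed:Entity) WHERE toLower($question) CONTAINS toLower(seed.name)
               MATCH (seed)-[:RELATES*0..1]-(a:Entity)
               MATCH (a)-[r:RELATES]-()
               RETURN DISTINCT startNode(r).name AS source, r.type AS relation,
                               endNode(r).name AS target
               LIMIT 30""",
            question=question,
        )
        triples = [f"{row['source']} --{row['relation']}--> {row['target']}" for row in result]

    context = "\n".join(triples)
    response = client.messages.create(
        model=MODEL,
        max_tokens=512,
        system="Answer using only the graph triples provided. Be precise.",
        messages=[{"role": "user", "content": f"Graph:\n{context}\n\nQuestion: {question}"}],
    )
    return response.content[0].text

if __name__ == "__main__":
    ingest("Anthropic created Claude. Claude supports MCP. MCP was designed by Anthropic engineers.")
    print(graph_rag("Who created the protocol that Claude supports?"))
    driver.close()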
