GraphRAG (Graph Retrieval-Augmented Generation) is an evolution of standard RAG that replaces flat vector search with a structured knowledge graph. Where traditional RAG retrieves isolated text chunks based on embedding similarity, GraphRAG understands the relationships between entities across an entire corpus — enabling it to answer complex, holistic questions that require connecting information spread across many documents.
The technique was developed and open-sourced by Microsoft Research in 2024, with the foundational paper introducing the concept of “local-to-global” querying over large text datasets.
The Problem with Standard RAG
Standard RAG works well for local questions — ones where the answer lives in a specific passage (“What did the CEO say about Q3 revenue?”). It struggles with global questions that require synthesising patterns across an entire document set (“What are the main themes across all our customer feedback?”). This is because:
- Embedding similarity retrieves nearby chunks, not conceptually connected ones
- No structure exists to traverse relationships between people, organisations, events, or concepts
- Each retrieved chunk is treated as independent context with no graph of how things relate
GraphRAG addresses all three limitations.
How GraphRAG Works
The pipeline has two distinct phases — indexing and querying.
Indexing Phase
- Text chunking: The source corpus is split into TextUnits — manageable segments that serve as the unit of analysis.
- Entity and relationship extraction: An LLM reads each TextUnit and extracts named entities (people, places, organisations, concepts) and the relationships between them, along with key claims.
- Knowledge graph construction: Extracted entities and relationships are assembled into a graph where nodes are entities and edges represent relationships, each with associated metadata and source provenance.
- Community detection: A graph clustering algorithm (typically Leiden) groups closely related entities into communities — clusters of nodes that are more connected to each other than to the rest of the graph.
- Community summarisation: An LLM generates a natural-language summary of each community, capturing the key themes, relationships, and claims within it. These summaries are stored alongside the graph.
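The indexing steps above can be sketched in a few lines of plain Python. This is a minimal illustration under loud assumptions, not the graphrag library's implementation: the `TRIPLES` list is a hypothetical stand-in for what an LLM extraction step might return, connected components are used as a crude substitute for Leiden clustering, and `summarise` simply concatenates facts where a real pipeline would call an LLM.

```python
from collections import defaultdict

# Hypothetical output of the LLM extraction step:
# (source entity, relation, target entity, id of the TextUnit it came from).
TRIPLES = [
    ("Alice", "works_at", "Acme", "t1"),
    ("Bob", "works_at", "Acme", "t1"),
    ("Acme", "acquired", "Widgets Ltd", "t2"),
    ("Carol", "leads", "Globex", "t3"),
    ("Globex", "partners_with", "Initech", "t3"),
]

def build_graph(triples):
    """Assemble an undirected adjacency map plus per-edge metadata and provenance."""
    adjacency = defaultdict(set)
    edges = {}
    for source, relation, target, unit_id in triples:
        adjacency[source].add(target)
        adjacency[target].add(source)
        edges[(source, target)] = {"relation": relation, "text_unit": unit_id}
    return adjacency, edges

def detect_communities(adjacency):
    """Group nodes by connected component (an assumed stand-in for Leiden)."""
    seen, communities = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adjacency[node] - members)
        seen |= members
        communities.append(sorted(members))
    return communities

def summarise(community, edges):
    """Placeholder for the LLM summary: list the relationships inside the community."""
    facts = [
        f"{s} {meta['relation']} {t}"
        for (s, t), meta in edges.items()
        if s in community and t in community
    ]
    return "; ".join(facts)

adjacency, edges = build_graph(TRIPLES)
communities = detect_communities(adjacency)
summaries = {i: summarise(c, edges) for i, c in enumerate(communities)}
```

In this toy corpus the Acme-related and Globex-related entities fall into separate communities, and each gets its own summary. A real Leiden run would also split densely connected regions inside a single component, which connected components cannot do.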
Querying Phase
GraphRAG supports two query modes:
- Local search: For specific entity-level questions, the system traverses the graph from relevant entities, pulling in neighbouring nodes, relationships, and associated text chunks. It combines structured graph context with raw text to produce precise answers.
- Global search: For broad, thematic questions, the system queries across all community summaries, generates partial answers from each, then synthesises them into a final comprehensive response. This is what enables sensemaking over millions of tokens.
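The two query modes can be illustrated with a small sketch. Everything here is assumed for illustration: `ADJACENCY`, `TEXT_UNITS`, and `COMMUNITY_SUMMARIES` are hypothetical index contents, and `stub_answer` and `stub_combine` stand in for LLM calls so the example runs offline. It shows the shape of each mode (neighbourhood traversal for local search, map-reduce over summaries for global search), not the library's actual API.

```python
# Hypothetical index contents produced by an indexing phase.
ADJACENCY = {
    "Alice": {"Acme"},
    "Bob": {"Acme"},
    "Acme": {"Alice", "Bob", "Widgets Ltd"},
    "Widgets Ltd": {"Acme"},
}
TEXT_UNITS = {"Alice": ["t1"], "Bob": ["t1"], "Acme": ["t1", "t2"], "Widgets Ltd": ["t2"]}
COMMUNITY_SUMMARIES = [
    "Acme employs Alice and Bob and acquired Widgets Ltd.",
    "Globex, led by Carol, partners with Initech.",
]

def local_search_context(entity, hops=1):
    """Collect entities within `hops` edges of `entity`, plus their source text units."""
    frontier, visited = {entity}, {entity}
    for _ in range(hops):
        # Expand one hop outwards, skipping nodes already collected.
        frontier = {n for node in frontier for n in ADJACENCY.get(node, set())} - visited
        visited |= frontier
    units = sorted({u for node in visited for u in TEXT_UNITS.get(node, [])})
    return sorted(visited), units

def global_search(question, summaries, answer_fn, combine_fn):
    """Map: one partial answer per community summary. Reduce: merge the partials."""
    partials = [answer_fn(question, summary) for summary in summaries]
    return combine_fn(question, [p for p in partials if p])

# Stub "LLM" calls (assumptions) so the sketch runs without a model.
def stub_answer(question, summary):
    return f"Theme: {summary}"

def stub_combine(question, partials):
    return " | ".join(partials)

entities, units = local_search_context("Alice")
overview = global_search(
    "What are the main themes?", COMMUNITY_SUMMARIES, stub_answer, stub_combine
)
```

Because global search touches every community summary, its cost grows with the number of communities, whereas local search only reads the neighbourhood of the entities it starts from.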
Performance Gains
Microsoft’s research showed substantial improvements over standard RAG on complex queries:
- 80% accuracy on global sensemaking questions vs. 50% for traditional RAG
- 3.4× improvement on enterprise benchmarks requiring cross-document reasoning
- Comprehensiveness win rates of 72–83% over a naive RAG baseline on questions requiring holistic understanding of a dataset
These gains are most pronounced on datasets in the 1-million-token range, where vector search alone cannot maintain coherence across the full corpus.
GraphRAG vs. Standard RAG
| | Standard RAG | GraphRAG |
|---|---|---|
| Retrieval unit | Text chunk | Entity, relationship, community summary |
| Context structure | Flat list of passages | Graph of connected entities |
| Best for | Specific factual lookups | Thematic, relational, multi-hop questions |
| Indexing cost | Low (embeddings only) | Higher (LLM extraction + graph build) |
| Query latency | Fast | Slightly higher for global queries |
| Corpus size sweet spot | Small–medium | Large, interconnected document sets |
When to Use GraphRAG
GraphRAG is the right choice when:
- Your questions require connecting information across many documents (e.g. analyst reports, legal contracts, research papers, CRM notes)
- You need to understand relationships between entities — who worked with whom, what influenced what
- Your users ask thematic or exploratory questions rather than just factual lookups
- Your corpus is large and dense with cross-references
Stick with standard RAG when queries are predominantly local and factual, or when indexing cost and latency are primary constraints.
Implementation
Microsoft’s open-source graphrag library provides a reference implementation of the full pipeline. Key configuration choices include:
- LLM for extraction: A capable model (GPT-4-class or equivalent) is needed for high-quality entity/relationship extraction during indexing
- Community algorithm: Leiden clustering is the default; resolution parameters control granularity
- Storage: The graph index can be stored in flat files, a vector store, or a graph database depending on scale
- Query mode selection: Local vs. global search is chosen at query time based on the question type
GraphRAG integrates naturally into existing RAG pipelines — it can complement vector search rather than replace it entirely, using graph traversal for relational queries and embedding retrieval for precise factual lookups.
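One way to picture this hybrid setup is a small router that picks a retrieval path per question. The sketch below is purely hypothetical: the cue lists and the `choose_mode` function are invented for illustration, and a real system would more likely use an LLM or a trained classifier than keyword matching.

```python
# Hypothetical phrase lists; tune or replace with a classifier in practice.
GLOBAL_CUES = ("main themes", "overall", "across all", "summarise", "summarize", "trends")
RELATIONAL_CUES = ("who worked with", "related to", "connected to", "influenced", "relationship")

def choose_mode(question):
    """Return 'global', 'local', or 'vector' based on simple keyword cues."""
    q = question.lower()
    if any(cue in q for cue in GLOBAL_CUES):
        return "global"   # thematic question: query the community summaries
    if any(cue in q for cue in RELATIONAL_CUES):
        return "local"    # relational question: traverse the graph from entities
    return "vector"       # default: plain embedding retrieval for factual lookups

modes = [
    choose_mode("What are the main themes across all our customer feedback?"),
    choose_mode("Who worked with whom on the merger?"),
    choose_mode("What did the CEO say about Q3 revenue?"),
]
```

Defaulting to vector retrieval keeps the cheaper path for everyday factual lookups and reserves graph-based search for questions that appear to need it.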