Building GraphRAG Systems with LangGraph and Neo4j

Graph Retrieval-Augmented Generation (GraphRAG) pairs a knowledge graph with a large language model, so that answers can draw on explicit relationships between entities rather than on text similarity alone.

What is GraphRAG?

Traditional RAG (Retrieval-Augmented Generation) retrieves relevant documents based on semantic similarity. GraphRAG takes it further by:

Understanding relationships between entities
Traversing graph structures for context
Maintaining knowledge consistency
Enabling multi-hop reasoning
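
To make "multi-hop reasoning" concrete, here is a minimal sketch in plain Python with no database involved. The `neighbors_within` function and the toy edge list are illustrative assumptions; the `RUNS_ON` edge in particular is invented for the example and is not part of the graph built in this post.

```python
from collections import deque

# Toy edge list: (source, relationship, target).
# The third edge is an illustrative assumption added to get a second hop.
EDGES = [
    ("Shahariar Hossen", "WORKS_AT", "Rakuten"),
    ("Shahariar Hossen", "EXPERT_IN", "Spring Boot"),
    ("Spring Boot", "RUNS_ON", "Java"),
]

def neighbors_within(start, max_hops, edges=EDGES):
    """Breadth-first traversal returning {entity: hop_count} up to max_hops."""
    adjacency = {}
    for source, _rel, target in edges:
        adjacency.setdefault(source, []).append(target)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adjacency.get(current, []):
            if nxt not in hops:
                hops[nxt] = hops[current] + 1
                queue.append(nxt)
    hops.pop(start)  # the start entity is not its own neighbor
    return hops
```

With one hop only the directly connected entities come back; raising `max_hops` to 2 also reaches `Java` through `Spring Boot`, which is the kind of indirect fact plain similarity search tends to miss.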

Architecture

Our GraphRAG system consists of three main components:

1. Knowledge Graph (Neo4j)

// Create entities and relationships
CREATE (p:Person {name: "Shahariar Hossen"})
CREATE (c:Company {name: "Rakuten"})
CREATE (t:Technology {name: "Spring Boot"})
CREATE (p)-[:WORKS_AT]->(c)
CREATE (p)-[:EXPERT_IN]->(t)

2. LangGraph for Orchestration

from typing import List, TypedDict

from langgraph.graph import StateGraph
from langchain_openai import ChatOpenAI

class GraphRAGState(TypedDict):
    query: str
    entities: List[str]
    graph_context: str
    response: str

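def extract_entities(state: GraphRAGState) -> GraphRAGState:
    # Ask for a comma-separated list so the reply is easy to parse
    llm = ChatOpenAI(model="gpt-4")
    reply = llm.invoke(
        f"Extract the entities from this query as a comma-separated list: {state['query']}"
    )
    # invoke() returns a message object; the text is in .content
    entities = [name.strip() for name in reply.content.split(",") if name.strip()]
    return {"entities": entities}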

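def query_graph(state: GraphRAGState) -> GraphRAGState:
    # Query Neo4j for relationships.
    # `driver` is a neo4j.Driver created at startup; `format_results` is a
    # helper (not shown) that turns records into prompt text.
    with driver.session() as session:
        result = session.run("""
            MATCH (e:Entity)-[r]->(related)
            WHERE e.name IN $entities
            RETURN e, r, related
        """, entities=state['entities'])
        # Materialize the records before the session closes
        records = list(result)
    return {"graph_context": format_results(records)}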

def generate_response(state: GraphRAGState) -> GraphRAGState:
    llm = ChatOpenAI(model="gpt-4")
    response = llm.invoke(f"""
        Context: {state['graph_context']}
        Query: {state['query']}
        Answer:
    """)
    # The state field is a str, so store the message text rather than the message object
    return {"response": response.content}

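# Build the graph
def create_workflow():
    workflow = StateGraph(GraphRAGState)
    workflow.add_node("extract", extract_entities)
    # Node names must not collide with state keys, so this is not named "query"
    workflow.add_node("query_graph", query_graph)
    workflow.add_node("generate", generate_response)
    workflow.set_entry_point("extract")
    workflow.add_edge("extract", "query_graph")
    workflow.add_edge("query_graph", "generate")
    workflow.set_finish_point("generate")
    # compile() returns the runnable graph that exposes invoke() and ainvoke()
    return workflow.compile()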
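
The `query_graph` node relies on a `format_results` helper that this post does not show. As a rough sketch of what such a helper might do, the hypothetical `format_triples` function below renders relationships as one fact per line for the prompt. It assumes the Neo4j records have already been reduced to plain `(source name, relationship type, target name)` tuples, for example from `record["e"]["name"]`, `record["r"].type`, and `record["related"]["name"]`; that conversion step is left out.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (source name, relationship type, target name)

def format_triples(triples: Iterable[Triple], max_lines: int = 50) -> str:
    """Render triples as one fact per line, dropping duplicates and capping length."""
    seen = set()
    lines: List[str] = []
    for source, rel_type, target in triples:
        line = f"{source} -[{rel_type}]-> {target}"
        if line in seen:
            continue  # the same edge can be returned more than once
        seen.add(line)
        lines.append(line)
        if len(lines) == max_lines:
            break  # keep the prompt bounded
    return "\n".join(lines)
```

Capping the number of lines is a design choice rather than a requirement: a densely connected entity can return hundreds of edges, and an unbounded context quickly crowds out the question itself.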

3. Embedding & Vector Search

For hybrid search combining vector similarity and graph traversal:

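from langchain_openai import OpenAIEmbeddings
from neo4j import GraphDatabase

embeddings = OpenAIEmbeddings()
# Placeholder connection details; replace with your own URI and credentials
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def hybrid_search(query: str, top_k: int = 5):
    # Embed the query for the vector search
    query_embedding = embeddings.embed_query(query)

    # Neo4j vector index search, expanded one hop to pull in graph neighbors
    with driver.session() as session:
        results = session.run("""
            CALL db.index.vector.queryNodes(
                'document_embeddings',
                $k,
                $embedding
            )
            YIELD node, score
            MATCH (node)-[r]-(related)
            RETURN node, r, related, score
            ORDER BY score DESC
        """, k=top_k, embedding=query_embedding)
        # Read the records while the session is still open
        return [record.data() for record in results]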
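
The vector index call ranks stored embeddings by their similarity to the query embedding. The sketch below shows that ranking step in plain Python, assuming cosine similarity as the metric. The function names and the tiny two-dimensional vectors are illustrative only, and Neo4j reports its own normalized score, so compare rankings rather than raw values.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(
    query: Sequence[float],
    documents: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k document ids most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A real index uses approximate nearest-neighbor search instead of scoring every document, but the ordering it aims for is the one computed here.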

Implementation Details

Data Ingestion Pipeline

Document Processing: Extract text and metadata
Entity Extraction: Use NER models to identify entities
Relationship Extraction: LLM-based relationship identification
Graph Construction: Build Neo4j knowledge graph

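import re

def ingest_document(document: str):
    # Extract entities; assumes `ner_model` is a spaCy pipeline, whose
    # entities live on doc.ents and expose .text and .label_
    entities = ner_model(document).ents

    # Extract relationships; `llm` is a chat model such as ChatOpenAI
    reply = llm.invoke(f"""
        Extract relationships from: {document}
        Format: (entity1, relationship, entity2), one per line
    """)
    # Parse "(a, rel, b)" groups from the reply text into 3-tuples
    relationships = re.findall(r"\(([^,()]+),\s*([^,()]+),\s*([^,()]+)\)", reply.content)

    # Store in Neo4j
    with driver.session() as session:
        for entity in entities:
            session.run("""
                MERGE (e:Entity {name: $name})
                SET e.type = $type
            """, name=entity.text, type=entity.label_)

        for entity1, rel_type, entity2 in relationships:
            session.run("""
                MATCH (a:Entity {name: $entity1})
                MATCH (b:Entity {name: $entity2})
                MERGE (a)-[r:RELATES_TO {type: $rel_type}]->(b)
            """, entity1=entity1.strip(), rel_type=rel_type.strip(), entity2=entity2.strip())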

Query Processing

async def process_query(query: str) -> str:
    workflow = create_workflow()
    
    initial_state = {
        "query": query,
        "entities": [],
        "graph_context": "",
        "response": ""
    }
    
    result = await workflow.ainvoke(initial_state)
    return result["response"]
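
Each node in the workflow returns only the keys it changes, and those partial updates are merged into the shared state. The stand-in below mimics that behaviour for a linear pipeline in plain Python. It is not LangGraph's implementation; `run_pipeline` and the three stub nodes are hypothetical, and the sketch is mainly useful for unit-testing node logic without an LLM or a database.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]

def run_pipeline(nodes: List[Node], initial_state: State) -> State:
    """Run nodes in order, merging each partial update into a copy of the state."""
    state = dict(initial_state)
    for node in nodes:
        update = node(state)
        state.update(update)  # keys the node did not return keep their old value
    return state

# Stub nodes standing in for extract_entities, query_graph and generate_response.
def fake_extract(state: State) -> State:
    # Treat capitalized words as entities; a real node would call an LLM
    return {"entities": [w for w in state["query"].split() if w.istitle()]}

def fake_query_graph(state: State) -> State:
    return {"graph_context": "; ".join(f"{e} is known" for e in state["entities"])}

def fake_generate(state: State) -> State:
    return {"response": f"Based on: {state['graph_context']}"}
```

Calling `run_pipeline([fake_extract, fake_query_graph, fake_generate], initial_state)` with the same `initial_state` shape as above returns a state in which `query` is untouched and the other three keys have been filled in step by step.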

Use Cases

1. Enterprise Knowledge Base

Employee expertise mapping
Project relationships
Technology stack tracking

2. Research Assistant

Paper citations and relationships
Author collaborations
Topic clustering

3. Customer Support

Product relationships
Issue resolution paths
Knowledge article connections

Performance Optimization

1. Graph Indexing

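CREATE INDEX entity_name IF NOT EXISTS FOR (e:Entity) ON (e.name);
// 1536 matches OpenAI's text-embedding-ada-002; set it to your embedding model's dimension
CREATE VECTOR INDEX document_embeddings IF NOT EXISTS FOR (d:Document) ON (d.embedding)
OPTIONS {indexConfig: {`vector.dimensions`: 1536, `vector.similarity_function`: 'cosine'}};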

2. Query Caching

Cache frequent graph traversals
Reuse entity embeddings
Implement TTL-based invalidation
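
One way to realize the caching points above is a small in-memory cache with time-based expiry. This is a sketch under assumptions: `TTLCache` and `cached_traversal` are hypothetical names, the cache is per-process only, and `run_query` stands for whatever function actually hits Neo4j.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple

class TTLCache:
    """Tiny in-memory cache whose entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: invalidate lazily on read
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._entries[key] = (self._clock(), value)

def cached_traversal(cache: TTLCache, entities, run_query):
    """Return the cached graph context for a set of entities, querying on a miss."""
    key = tuple(sorted(entities))  # order-insensitive cache key
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = run_query(entities)
    cache.set(key, result)
    return result
```

Sorting the entity names before building the key means `["a", "b"]` and `["b", "a"]` share one entry. For multiple API workers, a shared store would be needed instead of a per-process dictionary.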

3. Batch Processing

Process multiple queries in parallel
Batch Neo4j operations
Use connection pooling
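
Batching Neo4j writes usually means sending many rows in one query with `UNWIND` instead of one query per row. The sketch below assumes the `Entity` schema used in the ingestion code; `chunked`, `write_entities`, and the default batch size of 1000 are illustrative choices, not measured recommendations.

```python
from typing import Dict, Iterator, List

# Cypher that writes a whole batch in one round trip; $rows is a list of maps.
BATCH_MERGE_ENTITIES = """
UNWIND $rows AS row
MERGE (e:Entity {name: row.name})
SET e.type = row.type
"""

def chunked(rows: List[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive slices of rows, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def write_entities(session, rows: List[Dict[str, str]], batch_size: int = 1000) -> int:
    """Send rows in batches and return the number of queries issued.

    `session` is any object with a run(query, **params) method,
    such as a neo4j session.
    """
    batches = 0
    for batch in chunked(rows, batch_size):
        session.run(BATCH_MERGE_ENTITIES, rows=batch)
        batches += 1
    return batches
```

Because `write_entities` only needs a `run` method, it can be exercised in tests with a fake session that records the batches it receives.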

Results

In our implementation for enterprise knowledge management:

Query Accuracy: 92% vs 78% with traditional RAG
Response Time: < 2 seconds for complex queries
Graph Size: 1M+ nodes, 5M+ relationships
Multi-hop Reasoning: Up to 3 hops efficiently

Challenges & Solutions

Challenge 1: Graph Complexity

Solution: Implement graph pruning and relevance scoring
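
As a rough illustration of pruning with relevance scoring, the sketch below scores each retrieved triple by word overlap with the query and keeps only the best few. The scoring rule is a deliberately simple assumption; the post does not say which scoring function the system uses, and `relevance` and `prune` are hypothetical names.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (source name, relationship type, target name)

def relevance(triple: Triple, query_terms: Set[str]) -> int:
    """Count how many query terms appear in the triple's source or target name."""
    source, _rel, target = triple
    words = set(source.lower().split()) | set(target.lower().split())
    return len(words & query_terms)

def prune(triples: List[Triple], query: str, keep: int = 10) -> List[Triple]:
    """Drop triples with no overlap, then keep the `keep` highest-scoring ones."""
    query_terms = set(query.lower().split())
    scored = [(relevance(t, query_terms), t) for t in triples]
    relevant = [(score, t) for score, t in scored if score > 0]
    # The sort is stable, so triples with equal scores keep their original order
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _score, t in relevant[:keep]]
```

In practice the score could also come from embedding similarity or graph distance; the point is that the context passed to the LLM is ranked and bounded rather than the full neighborhood.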

Challenge 2: Scaling

Solution: Use Neo4j sharding and read replicas

Challenge 3: Consistency

Solution: Implement versioning and transaction management

Tech Stack

LangGraph: Workflow orchestration
Neo4j: Knowledge graph database
LangChain: LLM integration
OpenAI GPT-4: Language model
FastAPI: REST API
Docker: Containerization

Conclusion

GraphRAG is a natural next step for knowledge management systems. Combining the structured knowledge of a graph with the flexibility of an LLM lets a system follow explicit relationships across several hops and ground its answers in them.

Check out the full implementation on my GitHub or connect with me on LinkedIn.

Shahariar Hossen
