
Building GraphRAG Systems with LangGraph and Neo4j

January 10, 2025

Graph Retrieval-Augmented Generation (GraphRAG) combines knowledge graphs with large language models, so that answers can draw on the explicit relationships between entities in your data and not only on text that looks similar to the query.

What is GraphRAG?

Traditional RAG (Retrieval-Augmented Generation) retrieves relevant documents based on semantic similarity. GraphRAG takes it further by:

  • Understanding relationships between entities
  • Traversing graph structures for context
  • Maintaining knowledge consistency
  • Enabling multi-hop reasoning
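
To make "multi-hop reasoning" concrete, here is a minimal in-memory sketch that does not involve Neo4j. The toy graph, its entity names, and the `neighbors_within` helper are all illustrative assumptions; the point is only that following relationships for more than one hop surfaces facts that a single lookup would miss.

```python
from collections import deque

# Toy knowledge graph: node -> list of (relationship, neighbor).
# The data is made up for illustration.
GRAPH = {
    "Alice": [("WORKS_AT", "Acme"), ("EXPERT_IN", "Spring Boot")],
    "Acme": [],
    "Spring Boot": [("RUNS_ON", "Java")],
    "Java": [],
}

def neighbors_within(graph, start, max_hops):
    """Breadth-first traversal; returns (hops, source, relationship, target) facts."""
    facts = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for rel, target in graph.get(node, []):
            facts.append((depth + 1, node, rel, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return facts

one_hop = neighbors_within(GRAPH, "Alice", 1)
two_hops = neighbors_within(GRAPH, "Alice", 2)
```

With one hop, only Alice's direct relationships are returned. With two hops, the result also includes the fact that Spring Boot runs on Java, which is reachable only through an intermediate node.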

Architecture

Our GraphRAG system consists of three main components:

1. Knowledge Graph (Neo4j)

// Create entities and relationships
CREATE (p:Person {name: "Shahariar Hossen"})
CREATE (c:Company {name: "Rakuten"})
CREATE (t:Technology {name: "Spring Boot"})
CREATE (p)-[:WORKS_AT]->(c)
CREATE (p)-[:EXPERT_IN]->(t)

2. LangGraph for Orchestration

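from typing import List, TypedDict

from langchain_openai import ChatOpenAI
from langgraph.graph import END, StateGraph
from neo4j import GraphDatabase

# Placeholder connection details; replace with your own.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

class GraphRAGState(TypedDict):
    query: str
    entities: List[str]
    graph_context: str
    response: str

def extract_entities(state: GraphRAGState) -> dict:
    # Ask the LLM for a comma-separated list and parse it into a Python list
    llm = ChatOpenAI(model="gpt-4")
    reply = llm.invoke(
        "Extract the named entities from this query as a comma-separated list, "
        f"with no other text: {state['query']}"
    )
    entities = [name.strip() for name in reply.content.split(",") if name.strip()]
    return {"entities": entities}

def query_graph(state: GraphRAGState) -> dict:
    # Query Neo4j for relationships; read the records before the session closes
    with driver.session() as session:
        result = session.run("""
            MATCH (e:Entity)-[r]->(related)
            WHERE e.name IN $entities
            RETURN e.name AS source, type(r) AS relationship, related.name AS target
        """, entities=state['entities'])
        rows = [record.data() for record in result]
    # format_results is a helper (not shown) that renders the rows as text
    return {"graph_context": format_results(rows)}

def generate_response(state: GraphRAGState) -> dict:
    llm = ChatOpenAI(model="gpt-4")
    response = llm.invoke(f"""
        Context: {state['graph_context']}
        Query: {state['query']}
        Answer:
    """)
    # invoke() returns a message object; keep only its text
    return {"response": response.content}

# Build the graph. Node names must not collide with state keys,
# so the graph lookup node is called "retrieve" rather than "query".
workflow = StateGraph(GraphRAGState)
workflow.add_node("extract", extract_entities)
workflow.add_node("retrieve", query_graph)
workflow.add_node("generate", generate_response)
workflow.set_entry_point("extract")
workflow.add_edge("extract", "retrieve")
workflow.add_edge("retrieve", "generate")
workflow.add_edge("generate", END)
app = workflow.compile()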
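
The `format_results` helper called in `query_graph` is not defined in this post. One possible shape for it is sketched below, assuming the query rows have already been flattened into dictionaries with the keys `source`, `relationship`, and `target` (for example via `RETURN e.name AS source, type(r) AS relationship, related.name AS target`). Those key names are an assumption made for this sketch, not something the original code prescribes.

```python
from typing import Iterable, Mapping

def format_results(rows: Iterable[Mapping[str, str]]) -> str:
    """Render each row as 'source -[REL]-> target', one per line, without duplicates."""
    lines = []
    seen = set()
    for row in rows:
        line = f"{row['source']} -[{row['relationship']}]-> {row['target']}"
        if line not in seen:  # the same fact can be returned more than once
            seen.add(line)
            lines.append(line)
    return "\n".join(lines)

example_context = format_results([
    {"source": "Alice", "relationship": "WORKS_AT", "target": "Acme"},
    {"source": "Alice", "relationship": "WORKS_AT", "target": "Acme"},
    {"source": "Alice", "relationship": "EXPERT_IN", "target": "Spring Boot"},
])
```

Plain text lines like these are easy to drop into the prompt in `generate_response`, and deduplicating keeps the context short when several traversal paths return the same fact.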

3. Embedding & Vector Search

For hybrid search combining vector similarity and graph traversal:

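from langchain_openai import OpenAIEmbeddings
from neo4j import GraphDatabase

# Placeholder connection details; replace with your own.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
embeddings = OpenAIEmbeddings()

def hybrid_search(query: str, top_k: int = 5) -> list:
    # Embed the query text
    query_embedding = embeddings.embed_query(query)

    # Vector index lookup, then expand each hit to its direct neighbors.
    # Hits without any relationship are dropped by MATCH;
    # use OPTIONAL MATCH to keep them.
    with driver.session() as session:
        results = session.run("""
            CALL db.index.vector.queryNodes(
                'document_embeddings',
                $k,
                $embedding
            )
            YIELD node, score
            MATCH (node)-[r]-(related)
            RETURN node, r, related, score
            ORDER BY score DESC
        """, k=top_k, embedding=query_embedding)
        # Materialize the records before the session closes
        return [record.data() for record in results]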
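
Conceptually, the vector step ranks stored embeddings by similarity to the query embedding and keeps the best `k`. The brute-force sketch below shows that idea on a tiny in-memory collection; the `cosine` and `top_k` helpers and the two-dimensional vectors are illustrative assumptions. A real vector index uses an approximate search structure, so its results and score values can differ from this exact computation.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, docs, k=5):
    """Return the k (doc_id, score) pairs most similar to query_vec, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Made-up embeddings for three documents
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
hits = top_k([1.0, 0.0], docs, k=2)
```

In the hybrid query, each of these hits would then be expanded through its relationships, which is what the `MATCH (node)-[r]-(related)` clause does.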

Implementation Details

Data Ingestion Pipeline

  1. Document Processing: Extract text and metadata
  2. Entity Extraction: Use NER models to identify entities
  3. Relationship Extraction: LLM-based relationship identification
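  4. Graph Construction: Build Neo4j knowledge graph

import re

def ingest_document(document: str):
    # Extract entities; ner_model is assumed to be a spaCy pipeline,
    # e.g. ner_model = spacy.load("en_core_web_sm")
    entities = ner_model(document).ents

    # Extract relationships; llm is assumed to be a ChatOpenAI instance
    reply = llm.invoke(f"""
        Extract relationships from: {document}
        Format: one (entity1, relationship, entity2) triple per line
    """)
    # Parse "(a, rel, b)" triples from the reply text into 3-tuples
    relationships = re.findall(r"\(([^,()]+),([^,()]+),([^,()]+)\)", reply.content)

    # Store in Neo4j
    with driver.session() as session:
        for entity in entities:
            session.run("""
                MERGE (e:Entity {name: $name})
                SET e.type = $type
            """, name=entity.text, type=entity.label_)

        for entity1, rel_type, entity2 in relationships:
            session.run("""
                MATCH (a:Entity {name: $entity1})
                MATCH (b:Entity {name: $entity2})
                MERGE (a)-[r:RELATES_TO {type: $rel_type}]->(b)
            """, entity1=entity1.strip(), rel_type=rel_type.strip(),
                 entity2=entity2.strip())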

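The ingestion function above issues one query per entity and per relationship. For larger documents, a common alternative is to send rows in batches and let Cypher iterate over them with `UNWIND`. The sketch below prepares such batches; the `to_rows` and `chunked` helpers and the batch size are assumptions for illustration, and the query mirrors the relationship `MERGE` used in `ingest_document`.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Cypher that processes a whole batch of relationships in one round trip.
BATCH_MERGE_RELATIONSHIPS = """
UNWIND $rows AS row
MATCH (a:Entity {name: row.entity1})
MATCH (b:Entity {name: row.entity2})
MERGE (a)-[:RELATES_TO {type: row.rel_type}]->(b)
"""

def to_rows(triples: Sequence[Tuple[str, str, str]]) -> List[Dict[str, str]]:
    """Convert (entity1, relationship, entity2) triples into query parameter rows."""
    return [{"entity1": s, "rel_type": r, "entity2": t} for s, r, t in triples]

def chunked(rows: List[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

triples = [
    ("Alice", "WORKS_AT", "Acme"),
    ("Alice", "EXPERT_IN", "Spring Boot"),
    ("Spring Boot", "RUNS_ON", "Java"),
]
batches = list(chunked(to_rows(triples), size=2))

# With an open session, each batch would be sent as:
#     session.run(BATCH_MERGE_RELATIONSHIPS, rows=batch)
```

Batching reduces the number of network round trips; a suitable batch size depends on row size and server memory, so it is worth measuring.
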
Query Processing

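async def process_query(query: str) -> str:
    # create_workflow() is assumed to build the StateGraph and return it compiled
    workflow = create_workflow()

    # Every state key gets a starting value; the nodes fill them in as they run
    initial_state = {
        "query": query,
        "entities": [],
        "graph_context": "",
        "response": ""
    }

    result = await workflow.ainvoke(initial_state)
    return result["response"]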

Use Cases

1. Enterprise Knowledge Base

  • Employee expertise mapping
  • Project relationships
  • Technology stack tracking

2. Research Assistant

  • Paper citations and relationships
  • Author collaborations
  • Topic clustering

3. Customer Support

  • Product relationships
  • Issue resolution paths
  • Knowledge article connections

Performance Optimization

1. Graph Indexing

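CREATE INDEX entity_name IF NOT EXISTS FOR (e:Entity) ON (e.name);

// Vector dimensions must match the embedding model (1536 is assumed here)
CREATE VECTOR INDEX document_embeddings IF NOT EXISTS
FOR (d:Document) ON (d.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
}};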

2. Query Caching

  • Cache frequent graph traversals
  • Reuse entity embeddings
  • Implement TTL-based invalidation
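
As a rough illustration of TTL-based invalidation, the sketch below caches values in a dictionary together with an expiry time. The `TTLCache` class is hypothetical and written for this post; a production system might use an external cache instead. The clock is injectable so that expiry can be demonstrated without waiting.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple

class TTLCache:
    """Minimal in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop the entry and report a miss
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

# Demonstration with a fake clock that we advance by hand
now = [0.0]
cache = TTLCache(ttl_seconds=10, clock=lambda: now[0])
cache.set(("Alice", 2), "Alice -[WORKS_AT]-> Acme")
fresh = cache.get(("Alice", 2))
now[0] = 10.0
stale = cache.get(("Alice", 2))
```

Here the cache key is an (entity, hop count) pair, which is one plausible way to key graph traversals; the key design is an assumption and depends on which queries repeat in practice.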

3. Batch Processing

  • Process multiple queries in parallel
  • Batch Neo4j operations
  • Use connection pooling
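
Processing several queries in parallel can be sketched with `asyncio.gather` and a semaphore that caps concurrency. The `run_all` helper and the stand-in `fake_process_query` coroutine are assumptions for illustration; in a real system the handler would be an async function such as `process_query`.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

async def run_all(
    queries: Sequence[str],
    handler: Callable[[str], Awaitable[str]],
    max_concurrency: int = 4,
) -> List[str]:
    """Run handler over all queries, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(query: str) -> str:
        async with semaphore:
            return await handler(query)

    # gather returns results in the same order as the input queries
    return await asyncio.gather(*(guarded(q) for q in queries))

async def fake_process_query(query: str) -> str:
    """Stand-in for a real query handler."""
    await asyncio.sleep(0)
    return f"answer to: {query}"

answers = asyncio.run(
    run_all(["who works at Acme?", "who knows Java?"], fake_process_query, max_concurrency=2)
)
```

The concurrency cap matters because each in-flight query holds a database connection and an LLM request; an unbounded `gather` can exhaust a connection pool or hit API rate limits.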

Results

In our implementation for enterprise knowledge management:

  • Query Accuracy: 92% vs 78% with traditional RAG
  • Response Time: < 2 seconds for complex queries
  • Graph Size: 1M+ nodes, 5M+ relationships
  • Multi-hop Reasoning: Up to 3 hops efficiently

Challenges & Solutions

Challenge 1: Graph Complexity

Solution: Implement graph pruning and relevance scoring
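
The post does not describe how pruning or relevance scoring was implemented, so the following is only one simple possibility: score each retrieved fact by word overlap with the query and keep the highest-scoring ones. The `relevance` and `prune` helpers and the scoring rule are assumptions; embedding similarity or graph-based measures are other options.

```python
from typing import List, Tuple

Fact = Tuple[str, str, str]  # (source, relationship, target)

def relevance(query: str, fact: Fact) -> float:
    """Fraction of query words that also appear in the fact."""
    query_words = set(query.lower().split())
    fact_words = set(" ".join(fact).lower().replace("_", " ").split())
    if not query_words:
        return 0.0
    return len(query_words & fact_words) / len(query_words)

def prune(query: str, facts: List[Fact], keep: int = 2) -> List[Fact]:
    """Keep the `keep` most relevant facts; ties keep their original order."""
    ranked = sorted(facts, key=lambda fact: relevance(query, fact), reverse=True)
    return ranked[:keep]

facts = [
    ("Alice", "WORKS_AT", "Acme"),
    ("Alice", "EXPERT_IN", "Spring Boot"),
    ("Bob", "LIKES", "Tea"),
]
kept = prune("spring boot expert", facts, keep=1)
```

Pruning before prompt construction keeps the context focused, which matters more as the number of hops grows and the retrieved neighborhood expands.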

Challenge 2: Scaling

Solution: Use Neo4j read replicas to scale reads, and shard the graph across databases when it outgrows a single instance

Challenge 3: Consistency

Solution: Implement versioning and transaction management

Tech Stack

  • LangGraph: Workflow orchestration
  • Neo4j: Knowledge graph database
  • LangChain: LLM integration
  • OpenAI GPT-4: Language model
  • FastAPI: REST API
  • Docker: Containerization

Conclusion

GraphRAG represents the next evolution in knowledge management systems. By combining the structured knowledge of graphs with the flexibility of LLMs, we can build systems that truly understand and reason about complex information.

Check out the full implementation on my GitHub or connect with me on LinkedIn.

Shahariar Hossen

Senior Full Stack Engineer with 6+ years of experience in building scalable systems. Specialist in Spring Boot, Microservices, and AI/ML.
