Principle:Explodinggradients Ragas Knowledge Graph Construction
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| explodinggradients/ragas | LLM Evaluation, Test Data Generation, Knowledge Graphs | 2026-02-10 |
Overview
Description
Knowledge Graph Construction is the foundational principle behind Ragas test data generation. It defines how source documents are decomposed into a structured graph of typed nodes (DOCUMENT, CHUNK) connected by directed relationships, enabling downstream processes such as multi-hop reasoning, diverse query synthesis, and thematic clustering.
Rather than treating documents as flat text, Ragas converts each document into a graph representation where individual content units become nodes with properties (page content, metadata, summaries, embeddings) and their structural and semantic connections become edges. This graph-centric approach enables the system to reason about relationships between pieces of information across an entire corpus, which is essential for generating realistic and diverse evaluation questions.
Usage
Knowledge graph construction is the first step in the Ragas test data generation pipeline. When a user provides source documents (via LangChain, LlamaIndex, or raw text), the system:
- Converts each document into a Node of type DOCUMENT or CHUNK.
- Assigns a UUID and stores the document content and metadata as node properties.
- Applies a series of transforms (chunking, summarization, embedding, relationship extraction) to enrich the graph.
- Connects nodes via typed Relationship objects (e.g., child for parent-child hierarchy, or custom semantic relationships).
The resulting KnowledgeGraph serves as the single source of truth for all subsequent generation steps: persona generation, scenario creation, and query synthesis.
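The steps above can be sketched with a small standard-library stand-in. The class names mirror the ones mentioned on this page, but the definitions below are simplified assumptions and not the Ragas implementation. The enum string values, the `build_graph` helper, the blank-line chunking rule, and the direction of the child relationship (document as source, chunk as target) are all hypothetical choices made for illustration.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class NodeType(str, Enum):
    # Member names come from this page; the string values are assumed.
    UNKNOWN = ""
    DOCUMENT = "document"
    CHUNK = "chunk"


@dataclass
class Node:
    type: NodeType = NodeType.UNKNOWN
    properties: dict = field(default_factory=dict)
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class Relationship:
    source: Node
    target: Node
    type: str
    bidirectional: bool = False
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class KnowledgeGraph:
    nodes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)


def build_graph(documents):
    """Build a graph from (text, metadata) pairs (hypothetical helper)."""
    kg = KnowledgeGraph()
    for text, metadata in documents:
        doc = Node(
            NodeType.DOCUMENT,
            {"page_content": text, "document_metadata": metadata},
        )
        kg.nodes.append(doc)
        # Naive chunking transform: one CHUNK node per blank-line-separated block.
        for part in text.split("\n\n"):
            if not part.strip():
                continue
            chunk = Node(NodeType.CHUNK, {"page_content": part.strip()})
            kg.nodes.append(chunk)
            # Directed parent-child edge from the document to its chunk.
            kg.relationships.append(Relationship(doc, chunk, "child"))
    return kg


kg = build_graph([("Intro paragraph.\n\nDetails paragraph.", {"source": "a.md"})])
```

With one two-paragraph input, the sketch yields one DOCUMENT node, two CHUNK nodes, and two child relationships.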
Theoretical Basis
The knowledge graph construction approach in Ragas draws from several theoretical foundations:
Graph-Based Document Representation: Documents are naturally hierarchical. A document contains sections, which contain paragraphs, which contain sentences. By representing this hierarchy as a directed graph with typed nodes and relationships, the system preserves structural information that flat-text approaches lose. The NodeType enumeration (DOCUMENT, CHUNK, UNKNOWN) captures the levels the graph distinguishes: whole documents and the chunks derived from them, with UNKNOWN available for nodes whose level is not specified.
Typed Node Properties: Each node carries a property dictionary that can hold arbitrary key-value pairs. This design allows the graph to be progressively enriched. Initially, a node may only carry page_content and document_metadata. Through transforms, it acquires summary, summary_embedding, and other derived properties. This layered enrichment is critical for enabling diverse query types.
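Progressive enrichment can be illustrated as a sequence of functions that each add one property to a node. This is a minimal sketch, not how Ragas implements its transforms. The `summarize` and `embed_summary` functions are hypothetical stand-ins: a real pipeline would call an LLM and an embedding model where this code takes the first sentence and computes a toy two-number vector.

```python
def summarize(node):
    # Stand-in for an LLM summarizer: keep only the first sentence.
    text = node["properties"]["page_content"]
    node["properties"]["summary"] = text.split(". ")[0].rstrip(".") + "."


def embed_summary(node):
    # Stand-in for an embedding model: [word count, character count].
    summary = node["properties"]["summary"]
    node["properties"]["summary_embedding"] = [
        float(len(summary.split())),
        float(len(summary)),
    ]


node = {
    "type": "document",
    "properties": {
        "page_content": "Ragas builds a graph. It then generates queries.",
        "document_metadata": {"source": "intro.md"},
    },
}

before = sorted(node["properties"])  # properties present at load time
for transform in (summarize, embed_summary):
    transform(node)
after = sorted(node["properties"])  # properties after enrichment
```

Because `embed_summary` reads the `summary` property, the order of transforms matters: a later transform can depend on properties written by an earlier one.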
Directed Relationships with Bidirectionality: Relationships between nodes are directional by default (source to target), supporting parent-child hierarchies and causal links. The optional bidirectional flag enables symmetric connections such as "related-to" or "similar-to" relationships, which are essential for community detection algorithms used in multi-hop scenario generation.
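The effect of the bidirectional flag on traversal can be sketched as follows. This is an illustrative model only: nodes are plain strings, the `neighbors` helper is hypothetical, and `similar_to` is an assumed relationship name rather than one taken from Ragas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    type: str
    bidirectional: bool = False


def neighbors(node, relationships):
    """Return the nodes reachable from `node` in one hop."""
    reachable = set()
    for rel in relationships:
        if rel.source == node:
            # Every relationship can be followed from source to target.
            reachable.add(rel.target)
        elif rel.bidirectional and rel.target == node:
            # Only bidirectional relationships can be followed backwards.
            reachable.add(rel.source)
    return reachable


rels = [
    Relationship("doc", "chunk_a", "child"),
    Relationship("doc", "chunk_b", "child"),
    Relationship("chunk_a", "chunk_b", "similar_to", bidirectional=True),
]

from_doc = neighbors("doc", rels)
from_chunk_a = neighbors("chunk_a", rels)
from_chunk_b = neighbors("chunk_b", rels)
```

In this sketch `chunk_b` reaches `chunk_a` through the symmetric edge, while neither chunk reaches `doc`, because the child edges are directed.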
UUID-Based Identity: Each node and relationship is assigned a UUID, ensuring globally unique identification. This enables reliable serialization, deserialization, and cross-referencing within the graph even when nodes have identical content.
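A short sketch of why identity is kept separate from content: two nodes with identical properties still receive distinct identifiers, so either one can be looked up unambiguously, including after its id has been written out as a string. The `Node` class here is a simplified stand-in defined for this example, not the Ragas class.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Node:
    properties: dict
    # Each instance gets a fresh random UUID unless one is supplied.
    id: uuid.UUID = field(default_factory=uuid.uuid4)


a = Node({"page_content": "Same text."})
b = Node({"page_content": "Same text."})

# Index nodes by id; identical content does not cause a collision.
index = {node.id: node for node in (a, b)}

# An id survives a round trip through its string form.
restored_id = uuid.UUID(str(a.id))
found = index[restored_id]
```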
Serialization for Persistence: The knowledge graph supports JSON serialization and deserialization, allowing graphs to be saved, shared, and reloaded. This enables iterative workflows where a user constructs and enriches a graph once, then generates multiple test sets from the same graph.
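One way such a round trip can work is sketched below, under stated assumptions: nodes and relationships are plain dictionaries, UUIDs are written as strings, and relationships store the ids of their endpoints so that links can be rebuilt on load. The `dump_graph` and `load_graph` helpers and the JSON layout are hypothetical and do not describe the file format Ragas uses.

```python
import json
import uuid


def dump_graph(nodes, relationships):
    payload = {
        "nodes": [
            {"id": str(n["id"]), "type": n["type"], "properties": n["properties"]}
            for n in nodes
        ],
        "relationships": [
            {
                "id": str(r["id"]),
                "type": r["type"],
                # Store endpoint ids, not nested node copies.
                "source": str(r["source"]["id"]),
                "target": str(r["target"]["id"]),
                "bidirectional": r["bidirectional"],
            }
            for r in relationships
        ],
    }
    return json.dumps(payload, indent=2)


def load_graph(text):
    payload = json.loads(text)
    nodes = [
        {"id": uuid.UUID(n["id"]), "type": n["type"], "properties": n["properties"]}
        for n in payload["nodes"]
    ]
    by_id = {n["id"]: n for n in nodes}
    relationships = [
        {
            "id": uuid.UUID(r["id"]),
            "type": r["type"],
            # Re-link each endpoint to the loaded node object with that id.
            "source": by_id[uuid.UUID(r["source"])],
            "target": by_id[uuid.UUID(r["target"])],
            "bidirectional": r["bidirectional"],
        }
        for r in payload["relationships"]
    ]
    return nodes, relationships


doc = {"id": uuid.uuid4(), "type": "document", "properties": {"page_content": "Full text."}}
chunk = {"id": uuid.uuid4(), "type": "chunk", "properties": {"page_content": "Full"}}
link = {"id": uuid.uuid4(), "type": "child", "source": doc, "target": chunk, "bidirectional": False}

saved = dump_graph([doc, chunk], [link])
nodes, relationships = load_graph(saved)
```

Storing endpoint ids instead of embedding node copies keeps each node in the file once and lets the loader restore shared references, which is the property that makes "build once, generate many test sets" workable.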
Related Pages
- Implementation:Explodinggradients_Ragas_KnowledgeGraph_Class
- Principle:Explodinggradients_Ragas_Knowledge_Graph_Enrichment -- enrichment of the constructed graph with summaries, embeddings, and clusters
- Principle:Explodinggradients_Ragas_Document_Loading -- loading source documents into the format consumed by graph construction
- Principle:Explodinggradients_Ragas_Test_Query_Synthesis -- synthesizing queries from the constructed knowledge graph