| Knowledge Sources | Domains | Last Updated |
| explodinggradients/ragas | LLM Evaluation, Test Data Generation, Knowledge Graphs | 2026-02-10 |
Overview
Description
The KnowledgeGraph class, together with its supporting Node, Relationship, and NodeType types, provides the core data structure for representing documents as a structured graph in the Ragas test data generation pipeline. The KnowledgeGraph is a Python dataclass that holds lists of nodes and relationships. Nodes are Pydantic models with UUID-based identity, typed classification (DOCUMENT, CHUNK, UNKNOWN), and an extensible property dictionary. Relationships are Pydantic models connecting a source node to a target node with a type label, optional bidirectionality, and their own property dictionary.
Usage
The KnowledgeGraph is created automatically when using TestsetGenerator.generate_with_langchain_docs() or TestsetGenerator.generate_with_llamaindex_docs(). It can also be constructed manually for custom workflows. After construction, the graph is enriched via transforms and then consumed by query synthesizers to generate test questions.
Code Reference
Source Location
| Component | File | Lines |
| NodeType | src/ragas/testset/graph.py | L23-32 |
| Node | src/ragas/testset/graph.py | L35-89 |
| Relationship | src/ragas/testset/graph.py | L92-142 |
| KnowledgeGraph | src/ragas/testset/graph.py | L145-738 |
Signature
class NodeType(str, Enum):
    UNKNOWN = ""
    DOCUMENT = "document"
    CHUNK = "chunk"

class Node(BaseModel):
    id: uuid.UUID = Field(default_factory=uuid.uuid4)
    properties: dict = Field(default_factory=dict)
    type: NodeType = NodeType.UNKNOWN

class Relationship(BaseModel):
    id: uuid.UUID = Field(default_factory=uuid.uuid4)
    type: str
    source: Node
    target: Node
    bidirectional: bool = False
    properties: dict = Field(default_factory=dict)

@dataclass
class KnowledgeGraph:
    nodes: List[Node] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)
Import
from ragas.testset.graph import KnowledgeGraph, Node, Relationship, NodeType
Key Methods
| Method | Signature | Description |
| add | add(item: Union[Node, Relationship]) -> None | Adds a node or relationship to the graph. Raises ValueError for invalid types. |
| save | save(path: Union[str, Path]) -> None | Serializes the graph to a JSON file at the given path using UTF-8 encoding. |
| load | load(path: Union[str, Path]) -> KnowledgeGraph | Class method that deserializes a graph from a JSON file, reconstructing node references in relationships. |
| get_node_by_id | get_node_by_id(node_id: Union[UUID, str]) -> Optional[Node] | Retrieves a node by its UUID. |
| find_indirect_clusters | find_indirect_clusters(relationship_condition=lambda _: True, depth_limit=3) -> List[Set[Node]] | Finds clusters of indirectly connected nodes using the Leiden community detection algorithm. |
| find_n_indirect_clusters | find_n_indirect_clusters(n, relationship_condition=lambda _: True, depth_limit=3) -> List[Set[Node]] | Returns up to n indirect clusters using DFS-based path exploration with diversity optimization. |
| remove_node | remove_node(node: Node, inplace: bool = True) -> Optional[KnowledgeGraph] | Removes a node and its associated relationships from the graph. |
| find_two_nodes_single_rel | find_two_nodes_single_rel(relationship_condition=lambda _: True) -> List[Tuple[Node, Relationship, Node]] | Finds (NodeA, Relationship, NodeB) triples based on a relationship condition. |
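The behaviour described for remove_node and find_two_nodes_single_rel can be illustrated with a small standard-library sketch. MiniNode, MiniRel, MiniGraph, and find_triples are hypothetical stand-ins written for this page, not the Ragas classes, and the sketch covers only the in-place removal case.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class MiniNode:
    """Hypothetical stand-in for a graph node; identity is the UUID."""
    label: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass(frozen=True)
class MiniRel:
    """Hypothetical stand-in for a typed edge between two nodes."""
    type: str
    source: MiniNode
    target: MiniNode


@dataclass
class MiniGraph:
    nodes: List[MiniNode] = field(default_factory=list)
    relationships: List[MiniRel] = field(default_factory=list)

    def remove_node(self, node: MiniNode) -> None:
        # Drop the node and every relationship that touches it.
        self.nodes = [n for n in self.nodes if n.id != node.id]
        self.relationships = [
            r for r in self.relationships
            if r.source.id != node.id and r.target.id != node.id
        ]

    def find_triples(
        self, condition: Callable[[MiniRel], bool] = lambda _: True
    ) -> List[Tuple[MiniNode, MiniRel, MiniNode]]:
        # Return (source, relationship, target) for each matching edge.
        return [(r.source, r, r.target) for r in self.relationships if condition(r)]


doc, a, b = MiniNode("doc"), MiniNode("chunk-a"), MiniNode("chunk-b")
g = MiniGraph(
    nodes=[doc, a, b],
    relationships=[
        MiniRel("child", doc, a),
        MiniRel("child", doc, b),
        MiniRel("similar", a, b),
    ],
)

# Select only the "similar" edges, then remove one of their endpoints.
similar = g.find_triples(lambda r: r.type == "similar")
g.remove_node(a)
```

After the removal, only the doc-to-chunk-b "child" edge remains, because both other edges referenced the removed node.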
I/O Contract
Node
| Parameter | Type | Default | Description |
| id | uuid.UUID | auto-generated | Unique identifier for the node |
| properties | dict | {} | Extensible key-value property store (keys are case-insensitive) |
| type | NodeType | NodeType.UNKNOWN | Classification of the node (DOCUMENT, CHUNK, UNKNOWN) |
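One way a property store can treat keys as case-insensitive is to normalize them to lowercase on both write and read. The sketch below assumes that approach; PropertyBag is a hypothetical helper for illustration, and its add_property method and duplicate-key check are assumptions rather than documented Ragas behaviour.

```python
from typing import Any, Dict, Optional


class PropertyBag:
    """Hypothetical helper: a property store with case-insensitive keys."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def add_property(self, key: str, value: Any) -> None:
        # Normalize the key so "Page_Content" and "page_content" collide.
        normalized = key.lower()
        if normalized in self._data:
            raise ValueError(f"Property {key!r} already exists")
        self._data[normalized] = value

    def get_property(self, key: str) -> Optional[Any]:
        # Missing keys yield None instead of raising KeyError.
        return self._data.get(key.lower())


bag = PropertyBag()
bag.add_property("Page_Content", "Supervised learning uses labeled data.")
content = bag.get_property("page_content")
missing = bag.get_property("summary")
```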
Relationship
| Parameter | Type | Default | Description |
| id | uuid.UUID | auto-generated | Unique identifier for the relationship |
| type | str | (required) | The type label of the relationship (e.g., "child", "similar") |
| source | Node | (required) | The source node |
| target | Node | (required) | The target node |
| bidirectional | bool | False | Whether the relationship is symmetric |
| properties | dict | {} | Extensible key-value property store |
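To make the effect of the bidirectional flag concrete, the following sketch shows one plausible reading of "symmetric": a bidirectional edge can be followed from either endpoint, while a one-way edge can only be followed from source to target. Edge and neighbors are hypothetical names, and nodes are plain strings here; this is not the traversal code used by Ragas.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Edge:
    """Hypothetical stand-in for a relationship between two named nodes."""
    source: str
    target: str
    type: str
    bidirectional: bool = False


def neighbors(node: str, edges: List[Edge]) -> Set[str]:
    """Return the nodes reachable from `node` in a single hop."""
    reachable: Set[str] = set()
    for edge in edges:
        if edge.source == node:
            reachable.add(edge.target)
        elif edge.bidirectional and edge.target == node:
            # A symmetric edge may also be walked target -> source.
            reachable.add(edge.source)
    return reachable


edges = [
    Edge("doc", "chunk-a", "child"),
    Edge("chunk-a", "chunk-b", "similar", bidirectional=True),
]

from_b = neighbors("chunk-b", edges)  # reaches chunk-a via the symmetric edge
from_a = neighbors("chunk-a", edges)  # reaches chunk-b, but not doc
```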
KnowledgeGraph.save / KnowledgeGraph.load
| Direction | Type | Description |
| Input (save) | Union[str, Path] | File system path for the output JSON file |
| Output (save) | JSON file | Serialized graph with nodes and relationships arrays |
| Input (load) | Union[str, Path] | File system path to an existing JSON file |
| Output (load) | KnowledgeGraph | Reconstructed graph with fully resolved node references |
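The save/load contract above can be sketched with plain dictionaries and the standard json module. The sketch assumes that each serialized relationship stores only the IDs of its endpoints and that loading resolves those IDs back to node objects; save_graph and load_graph are hypothetical helpers, and the exact on-disk layout used by Ragas may differ.

```python
import json
import tempfile
import uuid
from pathlib import Path
from typing import Any, Dict, List, Tuple

NodeDict = Dict[str, Any]
RelDict = Dict[str, Any]


def save_graph(path: Path, nodes: List[NodeDict], rels: List[RelDict]) -> None:
    # Assumption: relationships are written with endpoint IDs, not nested nodes.
    payload = {
        "nodes": nodes,
        "relationships": [
            {**r, "source": r["source"]["id"], "target": r["target"]["id"]}
            for r in rels
        ],
    }
    path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")


def load_graph(path: Path) -> Tuple[List[NodeDict], List[RelDict]]:
    payload = json.loads(path.read_text(encoding="utf-8"))
    # Index nodes by ID so each relationship can point at the loaded node object.
    by_id = {n["id"]: n for n in payload["nodes"]}
    rels = [
        {**r, "source": by_id[r["source"]], "target": by_id[r["target"]]}
        for r in payload["relationships"]
    ]
    return payload["nodes"], rels


doc = {"id": str(uuid.uuid4()), "type": "document", "properties": {}}
chunk = {"id": str(uuid.uuid4()), "type": "chunk", "properties": {}}
rel = {
    "id": str(uuid.uuid4()),
    "type": "child",
    "source": doc,
    "target": chunk,
    "bidirectional": False,
    "properties": {},
}

with tempfile.TemporaryDirectory() as tmp:
    graph_path = Path(tmp) / "graph.json"
    save_graph(graph_path, [doc, chunk], [rel])
    loaded_nodes, loaded_rels = load_graph(graph_path)
```

Resolving IDs through a single index means every loaded relationship shares the same node objects as the node list, rather than holding separate copies.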
Usage Examples
Creating a Knowledge Graph Manually
from ragas.testset.graph import KnowledgeGraph, Node, Relationship, NodeType

# Create nodes
doc_node = Node(
    type=NodeType.DOCUMENT,
    properties={
        "page_content": "Machine learning is a subset of artificial intelligence.",
        "document_metadata": {"source": "ml_intro.pdf"},
    },
)
chunk_node = Node(
    type=NodeType.CHUNK,
    properties={
        "page_content": "Supervised learning uses labeled data.",
        "document_metadata": {"source": "ml_intro.pdf", "chunk_id": 0},
    },
)

# Create a relationship
rel = Relationship(
    type="child",
    source=doc_node,
    target=chunk_node,
)

# Build the graph
kg = KnowledgeGraph()
kg.add(doc_node)
kg.add(chunk_node)
kg.add(rel)
print(kg)
# KnowledgeGraph(nodes: 2, relationships: 1)
Saving and Loading a Knowledge Graph
from ragas.testset.graph import KnowledgeGraph
# Save to disk
kg.save("my_knowledge_graph.json")
# Load from disk
loaded_kg = KnowledgeGraph.load("my_knowledge_graph.json")
print(loaded_kg)
# KnowledgeGraph(nodes: 2, relationships: 1)
Retrieving a Node by ID
node = kg.get_node_by_id(doc_node.id)
# get_node_by_id returns Optional[Node], so guard against None before use
if node is not None:
    print(node.get_property("page_content"))
# Machine learning is a subset of artificial intelligence.
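The signature of get_node_by_id accepts either a UUID or a string and returns an Optional result. A minimal sketch of a lookup with that shape follows; it assumes string IDs are parsed into UUID objects before comparison. find_by_id and the dictionary index are hypothetical and stand in for the graph's node list.

```python
import uuid
from typing import Dict, Optional, Union


def find_by_id(
    index: Dict[uuid.UUID, str], node_id: Union[uuid.UUID, str]
) -> Optional[str]:
    # Accept the ID as a UUID or as its string form.
    if isinstance(node_id, str):
        node_id = uuid.UUID(node_id)
    # Unknown IDs return None rather than raising.
    return index.get(node_id)


doc_id = uuid.uuid4()
index = {doc_id: "document node"}

by_uuid = find_by_id(index, doc_id)
by_str = find_by_id(index, str(doc_id))
absent = find_by_id(index, uuid.uuid4())
```

Callers should check for None before using the result, as in the example above.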