Implementation:Spcl_Graph_of_thoughts_GroundTruth_Operation
| Knowledge Sources | |
|---|---|
| Domains | Graph_Reasoning, Evaluation |
| Last Updated | 2026-02-14 |
| Implements | Principle:Spcl_Graph_of_thoughts_Ground_Truth_Evaluation |
Overview
Implementation of the ground truth evaluation pattern that checks whether thoughts correctly solve the problem by comparing against a ground truth evaluator function.
Description
The GroundTruth class is a concrete operation in the Graph of Thoughts framework that evaluates whether thought states represent correct solutions to the problem. It is implemented as a subclass of Operation with operation type OperationType.ground_truth_evaluator.
The execution flow is:
- Assert that at least one predecessor exists.
- Retrieve all predecessor thoughts via `get_previous_thoughts()`.
- For each predecessor thought:
  - Clone the thought via `Thought.from_thought()`.
  - Call `self.ground_truth_evaluator(new_thought.state)` within a try/except block.
  - Set `new_thought.solved` to the evaluator's result (or `False` if an exception occurs).
  - Append the new thought to the output list.
- Log the number of evaluated thoughts and how many were solved.
This is a pure evaluation operation that does not interact with the language model. The evaluator function is domain-specific and provided at initialization.
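The flow above can be sketched without the framework. Note that `SimpleThought` and `evaluate_all` here are hypothetical stand-ins for illustration, not the framework's actual API:

```python
from typing import Callable, Dict, List

class SimpleThought:
    """Hypothetical stand-in for the framework's Thought class."""
    def __init__(self, state: Dict):
        self.state = dict(state)  # copy the state, mimicking Thought.from_thought
        self.solved = False

def evaluate_all(thoughts: List[SimpleThought],
                 evaluator: Callable[[Dict], bool]) -> List[SimpleThought]:
    """Clone each thought, run the evaluator, and set the solved flag."""
    results = []
    for thought in thoughts:
        new_thought = SimpleThought(thought.state)
        try:
            new_thought.solved = evaluator(new_thought.state)
        except Exception:
            new_thought.solved = False  # evaluator errors count as unsolved
        results.append(new_thought)
    return results

thoughts = [SimpleThought({"current": [1, 2, 3], "ground_truth": [1, 2, 3]}),
            SimpleThought({"current": [2, 1, 3], "ground_truth": [1, 2, 3]})]
evaluated = evaluate_all(thoughts, lambda s: s["current"] == s["ground_truth"])
print([t.solved for t in evaluated])  # → [True, False]
```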
Usage
```python
from graph_of_thoughts.operations import GroundTruth

# Create a GroundTruth operation with a domain-specific evaluator
def check_sorted(state):
    """Check if the sorted output matches ground truth."""
    return state.get("current") == state.get("ground_truth")

gt = GroundTruth(ground_truth_evaluator=check_sorted)

# Wire as the final operation in the pipeline
gt.add_predecessor(keep_best_op)
```
Code Reference
Source Location
- File: `graph_of_thoughts/operations/operations.py`, lines 776-837
- Import: `from graph_of_thoughts.operations import GroundTruth`
Class Signature
```python
class GroundTruth(Operation):
    operation_type: OperationType = OperationType.ground_truth_evaluator

    def __init__(self, ground_truth_evaluator: Callable[[Dict], bool]) -> None:
        """
        Initializes a new GroundTruth operation.

        :param ground_truth_evaluator: A function to evaluate if a thought solves the problem.
        :type ground_truth_evaluator: A function that takes a thought state and returns a boolean.
        """
```
Key Methods
- `__init__(self, ground_truth_evaluator: Callable[[Dict], bool]) -> None` -- Initializes the operation with the evaluator function and an empty `thoughts` list.
- `get_thoughts(self) -> List[Thought]` -- Returns the list of evaluated thoughts (with the `solved` flag set) after execution.
- `_execute(self, lm, prompter, parser, **kwargs) -> None` -- Core execution logic: clones predecessor thoughts, evaluates each against ground truth, and sets the `solved` flag.
Internal State
- `self.ground_truth_evaluator: Callable[[Dict], bool]` -- The evaluator function provided at initialization.
- `self.thoughts: List[Thought]` -- Stores the evaluated thoughts after execution.
I/O Contract
| Input | Output | Side Effects |
|---|---|---|
| Predecessor thoughts from one or more predecessor operations; each thought carries a state dictionary to be evaluated against ground truth. | New cloned `Thought` objects with `solved` set to `True` or `False` based on the evaluator function. The `compared_to_ground_truth` flag is automatically set to `True` by the `Thought.solved` property setter. | No language model interaction. Logs the number of evaluated and solved thoughts at INFO level. |
Evaluation logic with exception handling:
```python
for thought in previous_thoughts:
    new_thought = Thought.from_thought(thought)
    try:
        new_thought.solved = self.ground_truth_evaluator(new_thought.state)
    except:
        new_thought.solved = False
    self.thoughts.append(new_thought)
```
Assertions:
- At least one predecessor must exist (`len(self.predecessors) >= 1`).
Thought flags set:
- `thought.solved` -- `True` if the evaluator returns `True`, `False` otherwise (including on exception).
- `thought.compared_to_ground_truth` -- Always set to `True` (automatically by the `solved` setter).
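The coupling between the two flags can be illustrated with a property setter. This `ThoughtSketch` class is a hypothetical sketch of the behavior described above, not the framework's actual `Thought` code:

```python
class ThoughtSketch:
    """Hypothetical sketch: setting solved also records the comparison."""
    def __init__(self):
        self._solved = False
        self.compared_to_ground_truth = False

    @property
    def solved(self) -> bool:
        return self._solved

    @solved.setter
    def solved(self, value: bool) -> None:
        # Any assignment to solved marks the thought as having been
        # compared against ground truth, even if the comparison failed.
        self._solved = value
        self.compared_to_ground_truth = True

t = ThoughtSketch()
t.solved = False  # a failed comparison still counts as compared
print(t.compared_to_ground_truth)  # → True
```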
Usage Examples
Sorting: Evaluate Final Sorted List
```python
from graph_of_thoughts.operations import KeepBestN, GroundTruth

def sorting_ground_truth(state):
    """Check if current sorted list matches the expected ground truth."""
    current = state.get("current", [])
    ground_truth = state.get("ground_truth", [])
    return current == ground_truth

keep_best = KeepBestN(n=1, higher_is_better=False)
gt = GroundTruth(ground_truth_evaluator=sorting_ground_truth)
gt.add_predecessor(keep_best)
```
Set Intersection: Verify Result
```python
from graph_of_thoughts.operations import GroundTruth

def intersection_evaluator(state):
    """Check if computed intersection matches expected."""
    computed = set(state.get("result", []))
    expected = set(state.get("ground_truth", []))
    return computed == expected

gt = GroundTruth(ground_truth_evaluator=intersection_evaluator)
gt.add_predecessor(final_keep_best)
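Note that comparing as sets makes this evaluator insensitive to element order and duplicates, which is usually the right choice for set intersection. A standalone sanity check (sample states are illustrative):

```python
def intersection_evaluator(state):
    """Check if computed intersection matches expected, ignoring order."""
    computed = set(state.get("result", []))
    expected = set(state.get("ground_truth", []))
    return computed == expected

ok = {"result": [3, 1, 2], "ground_truth": [1, 2, 3]}   # order differs, still correct
bad = {"result": [1, 2], "ground_truth": [1, 2, 3]}     # missing element
print(intersection_evaluator(ok))   # → True
print(intersection_evaluator(bad))  # → False
```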
Batch Evaluation and Reporting
```python
# After execution, check how many problems were solved
gt_thoughts = gt.get_thoughts()
solved_count = sum(1 for t in gt_thoughts if t.solved)
total_count = len(gt_thoughts)
print(f"Solved: {solved_count}/{total_count}")
```
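The same counting logic can be exercised without the framework by substituting stub objects for the evaluated thoughts (the `SimpleNamespace` stand-ins below are illustrative, not what `gt.get_thoughts()` actually returns):

```python
from types import SimpleNamespace

# Hypothetical stand-ins for thoughts returned by gt.get_thoughts()
gt_thoughts = [SimpleNamespace(solved=True),
               SimpleNamespace(solved=False),
               SimpleNamespace(solved=True)]

solved_count = sum(1 for t in gt_thoughts if t.solved)
total_count = len(gt_thoughts)
print(f"Solved: {solved_count}/{total_count}")  # → Solved: 2/3
```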
Related Pages
- Principle:Spcl_Graph_of_thoughts_Ground_Truth_Evaluation - The principle this implementation realizes
- Implementation:Spcl_Graph_of_thoughts_KeepBestN_Operation - KeepBestN is typically the last operation before GroundTruth
- Workflow:Spcl_Graph_of_thoughts_GoT_Sorting_Pipeline - Sorting benchmark pipeline with GroundTruth evaluation
- Workflow:Spcl_Graph_of_thoughts_GoT_Keyword_Counting_Pipeline - Keyword counting pipeline with GroundTruth evaluation