Implementation:Spcl_Graph_of_thoughts_GroundTruth_Operation
| Knowledge Sources | |
|---|---|
| Domains | Graph_Reasoning, Evaluation |
| Last Updated | 2026-02-14 |
| Implements | Principle:Spcl_Graph_of_thoughts_Ground_Truth_Evaluation |
Overview
Implementation of the ground truth evaluation pattern that checks whether thoughts correctly solve the problem by comparing against a ground truth evaluator function.
Description
The GroundTruth class is a concrete operation in the Graph of Thoughts framework that evaluates whether thought states represent correct solutions to the problem. It is implemented as a subclass of Operation with operation type OperationType.ground_truth_evaluator.
The execution flow is:
- Assert that at least one predecessor exists.
- Retrieve all predecessor thoughts via `get_previous_thoughts()`.
- For each predecessor thought:
  - Clone the thought via `Thought.from_thought()`.
  - Call `self.ground_truth_evaluator(new_thought.state)` within a try/except block.
  - Set `new_thought.solved` to the evaluator's result (or `False` if an exception occurs).
  - Append the new thought to the output list.
- Log the number of evaluated thoughts and how many were solved.
This is a pure evaluation operation that does not interact with the language model. The evaluator function is domain-specific and provided at initialization.
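The flow above can be sketched without the framework. Note that `SimpleThought` and `evaluate_all` here are hypothetical stand-ins for illustration, not the framework's actual API:

```python
from typing import Callable, Dict, List

class SimpleThought:
    """Hypothetical stand-in for the framework's Thought class."""
    def __init__(self, state: Dict):
        self.state = dict(state)  # copy the state, mimicking Thought.from_thought
        self.solved = False

def evaluate_all(thoughts: List[SimpleThought],
                 evaluator: Callable[[Dict], bool]) -> List[SimpleThought]:
    """Clone each thought, run the evaluator, and set the solved flag."""
    results = []
    for thought in thoughts:
        new_thought = SimpleThought(thought.state)
        try:
            new_thought.solved = evaluator(new_thought.state)
        except Exception:
            new_thought.solved = False  # evaluator errors count as unsolved
        results.append(new_thought)
    return results

thoughts = [SimpleThought({"current": [1, 2, 3], "ground_truth": [1, 2, 3]}),
            SimpleThought({"current": [2, 1, 3], "ground_truth": [1, 2, 3]})]
evaluated = evaluate_all(thoughts, lambda s: s["current"] == s["ground_truth"])
print([t.solved for t in evaluated])  # → [True, False]
```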
Usage
```python
from graph_of_thoughts.operations import GroundTruth

# Create a GroundTruth operation with a domain-specific evaluator
def check_sorted(state):
    """Check if the sorted output matches ground truth."""
    return state.get("current") == state.get("ground_truth")

gt = GroundTruth(ground_truth_evaluator=check_sorted)

# Wire as the final operation in the pipeline
gt.add_predecessor(keep_best_op)
```
Code Reference
Source Location
- File: `graph_of_thoughts/operations/operations.py`, lines 776-837
- Import: `from graph_of_thoughts.operations import GroundTruth`
Class Signature
```python
class GroundTruth(Operation):
    operation_type: OperationType = OperationType.ground_truth_evaluator

    def __init__(self, ground_truth_evaluator: Callable[[Dict], bool]) -> None:
        """
        Initializes a new GroundTruth operation.

        :param ground_truth_evaluator: A function to evaluate if a thought solves the problem.
        :type ground_truth_evaluator: A function that takes a thought state and returns a boolean.
        """
```
Key Methods
- `__init__(self, ground_truth_evaluator: Callable[[Dict], bool]) -> None` -- Initializes the operation with the evaluator function and an empty `thoughts` list.
- `get_thoughts(self) -> List[Thought]` -- Returns the list of evaluated thoughts (with the `solved` flag set) after execution.
- `_execute(self, lm, prompter, parser, **kwargs) -> None` -- Core execution logic: clones predecessor thoughts, evaluates each against ground truth, and sets the `solved` flag.
Internal State
- `self.ground_truth_evaluator: Callable[[Dict], bool]` -- The evaluator function provided at initialization.
- `self.thoughts: List[Thought]` -- Stores the evaluated thoughts after execution.
I/O Contract
| Input | Output | Side Effects |
|---|---|---|
| Predecessor thoughts from one or more predecessor operations; each thought carries a state dictionary to be evaluated against ground truth. | New cloned `Thought` objects with `solved` set to `True` or `False` based on the evaluator function. The `compared_to_ground_truth` flag is automatically set to `True` by the `Thought.solved` property setter. | No language model interaction. Logs the number of evaluated and solved thoughts at INFO level. |
Evaluation logic with exception handling:
```python
for thought in previous_thoughts:
    new_thought = Thought.from_thought(thought)
    try:
        new_thought.solved = self.ground_truth_evaluator(new_thought.state)
    except:
        new_thought.solved = False
    self.thoughts.append(new_thought)
```
Assertions:
- At least one predecessor must exist (`len(self.predecessors) >= 1`).
Thought flags set:
- `thought.solved` -- `True` if the evaluator returns `True`, `False` otherwise (including on exception).
- `thought.compared_to_ground_truth` -- Always set to `True` (automatically by the `solved` setter).
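The coupling between the two flags can be illustrated with a property setter. This `ThoughtSketch` class is a hypothetical sketch of the behavior described above, not the framework's actual `Thought` code:

```python
class ThoughtSketch:
    """Hypothetical sketch: setting solved also records the comparison."""
    def __init__(self):
        self._solved = False
        self.compared_to_ground_truth = False

    @property
    def solved(self) -> bool:
        return self._solved

    @solved.setter
    def solved(self, value: bool) -> None:
        # Any assignment to solved marks the thought as having been
        # compared against ground truth, even if the comparison failed.
        self._solved = value
        self.compared_to_ground_truth = True

t = ThoughtSketch()
t.solved = False  # a failed comparison still counts as compared
print(t.compared_to_ground_truth)  # → True
```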
Usage Examples
Sorting: Evaluate Final Sorted List
```python
from graph_of_thoughts.operations import KeepBestN, GroundTruth

def sorting_ground_truth(state):
    """Check if current sorted list matches the expected ground truth."""
    current = state.get("current", [])
    ground_truth = state.get("ground_truth", [])
    return current == ground_truth

keep_best = KeepBestN(n=1, higher_is_better=False)
gt = GroundTruth(ground_truth_evaluator=sorting_ground_truth)
gt.add_predecessor(keep_best)
```
Set Intersection: Verify Result
```python
from graph_of_thoughts.operations import GroundTruth

def intersection_evaluator(state):
    """Check if computed intersection matches expected."""
    computed = set(state.get("result", []))
    expected = set(state.get("ground_truth", []))
    return computed == expected

gt = GroundTruth(ground_truth_evaluator=intersection_evaluator)
gt.add_predecessor(final_keep_best)
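Note that comparing as sets makes this evaluator insensitive to element order and duplicates, which is usually the right choice for set intersection. A standalone sanity check (sample states are illustrative):

```python
def intersection_evaluator(state):
    """Check if computed intersection matches expected, ignoring order."""
    computed = set(state.get("result", []))
    expected = set(state.get("ground_truth", []))
    return computed == expected

ok = {"result": [3, 1, 2], "ground_truth": [1, 2, 3]}   # order differs, still correct
bad = {"result": [1, 2], "ground_truth": [1, 2, 3]}     # missing element
print(intersection_evaluator(ok))   # → True
print(intersection_evaluator(bad))  # → False
```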
Batch Evaluation and Reporting
```python
# After execution, check how many problems were solved
gt_thoughts = gt.get_thoughts()
solved_count = sum(1 for t in gt_thoughts if t.solved)
total_count = len(gt_thoughts)
print(f"Solved: {solved_count}/{total_count}")
```
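The same counting logic can be exercised without the framework by substituting stub objects for the evaluated thoughts (the `SimpleNamespace` stand-ins below are illustrative, not what `gt.get_thoughts()` actually returns):

```python
from types import SimpleNamespace

# Hypothetical stand-ins for thoughts returned by gt.get_thoughts()
gt_thoughts = [SimpleNamespace(solved=True),
               SimpleNamespace(solved=False),
               SimpleNamespace(solved=True)]

solved_count = sum(1 for t in gt_thoughts if t.solved)
total_count = len(gt_thoughts)
print(f"Solved: {solved_count}/{total_count}")  # → Solved: 2/3
```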
Related Pages
- Principle:Spcl_Graph_of_thoughts_Ground_Truth_Evaluation - The principle this implementation realizes
- Implementation:Spcl_Graph_of_thoughts_KeepBestN_Operation - KeepBestN is typically the last operation before GroundTruth
- Workflow:Spcl_Graph_of_thoughts_GoT_Sorting_Pipeline - Sorting benchmark pipeline with GroundTruth evaluation
- Workflow:Spcl_Graph_of_thoughts_GoT_Keyword_Counting_Pipeline - Keyword counting pipeline with GroundTruth evaluation