Implementation:Spcl Graph of thoughts GroundTruth Operation

From Leeroopedia
Knowledge Sources
Domains Graph_Reasoning, Evaluation
Last Updated 2026-02-14
Implements Principle:Spcl_Graph_of_thoughts_Ground_Truth_Evaluation

Overview

Implementation of the ground truth evaluation pattern: each thought's state is passed to a user-supplied ground truth evaluator function, which reports whether the thought correctly solves the problem.

Description

The GroundTruth class is a concrete operation in the Graph of Thoughts framework that evaluates whether thought states represent correct solutions to the problem. It is implemented as a subclass of Operation with operation type OperationType.ground_truth_evaluator.

The execution flow is:

  1. Assert at least one predecessor exists
  2. Retrieve all predecessor thoughts via get_previous_thoughts()
  3. For each predecessor thought:
    1. Clone the thought via Thought.from_thought()
    2. Call self.ground_truth_evaluator(new_thought.state) within a try/except block
    3. Set new_thought.solved to the evaluator's result (or False if an exception occurs)
    4. Append the new thought to the output list
  4. Log the number of evaluated thoughts and how many were solved

This is a pure evaluation operation that does not interact with the language model. The evaluator function is domain-specific and provided at initialization.
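
The flow above can be sketched without the framework. The following is a minimal illustration, not the library's code: SketchThought and evaluate_all are hypothetical stand-ins for Thought and GroundTruth._execute, the clone step is approximated with a shallow dictionary copy, and the sketch catches Exception rather than using a bare except.

```python
from typing import Callable, Dict, List


class SketchThought:
    """Hypothetical stand-in for the framework's Thought class."""

    def __init__(self, state: Dict) -> None:
        self.state = state
        self.solved = False


def evaluate_all(
    previous: List[SketchThought],
    evaluator: Callable[[Dict], bool],
) -> List[SketchThought]:
    """Clone each thought, evaluate it, and record the result."""
    evaluated = []
    for thought in previous:
        clone = SketchThought(dict(thought.state))  # step 3.1: clone
        try:
            clone.solved = evaluator(clone.state)  # steps 3.2 and 3.3
        except Exception:
            clone.solved = False  # an evaluator failure counts as unsolved
        evaluated.append(clone)  # step 3.4
    return evaluated


thoughts = [
    SketchThought({"current": [1, 2, 3], "ground_truth": [1, 2, 3]}),
    SketchThought({"current": [3, 1, 2], "ground_truth": [1, 2, 3]}),
    SketchThought({}),  # missing keys: the evaluator below raises KeyError
]
results = evaluate_all(thoughts, lambda s: s["current"] == s["ground_truth"])
print([t.solved for t in results])  # [True, False, False]
```

The input thoughts are left untouched; only the clones carry the evaluation result.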

Usage

from graph_of_thoughts.operations import GroundTruth

# Create a GroundTruth operation with a domain-specific evaluator
def check_sorted(state):
    """Check if the sorted output matches ground truth."""
    return state.get("current") == state.get("ground_truth")

gt = GroundTruth(ground_truth_evaluator=check_sorted)

# Wire as the final operation in the pipeline
# (keep_best_op is assumed to be an operation created earlier, e.g. a KeepBestN instance)
gt.add_predecessor(keep_best_op)
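
Because the evaluator is an ordinary function of the state dictionary, it can be checked in isolation before it is wired into a graph. A small sketch follows; the sample states are invented for illustration only.

```python
def check_sorted(state):
    """Check if the sorted output matches ground truth."""
    return state.get("current") == state.get("ground_truth")


# Sample states are made up for this sketch.
matching = {"current": [1, 2, 3], "ground_truth": [1, 2, 3]}
mismatch = {"current": [1, 3, 2], "ground_truth": [1, 2, 3]}
empty = {}

print(check_sorted(matching))  # True
print(check_sorted(mismatch))  # False
print(check_sorted(empty))     # True: both .get() calls return None
```

The last case shows a pitfall of .get() without defaults: a state that lacks both keys compares None to None and looks solved. If missing keys should count as failures, index with state["current"] and state["ground_truth"] instead; the resulting KeyError is caught by the operation and the thought is recorded as unsolved.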

Code Reference

Source Location

  • File: graph_of_thoughts/operations/operations.py, Lines 776-837
  • Import: from graph_of_thoughts.operations import GroundTruth

Class Signature

class GroundTruth(Operation):
    operation_type: OperationType = OperationType.ground_truth_evaluator

    def __init__(self, ground_truth_evaluator: Callable[[Dict], bool]) -> None:
        """
        Initializes a new GroundTruth operation.

        :param ground_truth_evaluator: A function to evaluate if a thought solves the problem.
        :type ground_truth_evaluator: A function that takes a thought state and returns a boolean.
        """

Key Methods

  • __init__(self, ground_truth_evaluator: Callable[[Dict], bool]) -> None -- Initializes the operation with the evaluator function and an empty thoughts list.
  • get_thoughts(self) -> List[Thought] -- Returns the list of evaluated thoughts (with solved flag set) after execution.
  • _execute(self, lm, prompter, parser, **kwargs) -> None -- Core execution logic: clones predecessor thoughts, evaluates each against ground truth, sets the solved flag.

Internal State

  • self.ground_truth_evaluator: Callable[[Dict], bool] -- The evaluator function provided at initialization.
  • self.thoughts: List[Thought] -- Stores the evaluated thoughts after execution.

I/O Contract

  • Input: Thoughts from one or more predecessor operations. Each thought carries a state dictionary to be evaluated against ground truth.
  • Output: One new cloned Thought object per input thought, with solved set according to the evaluator function (False if the evaluator raises). The compared_to_ground_truth flag is automatically set to True by the Thought.solved property setter.
  • Side Effects: No language model interaction. Logs the number of evaluated and solved thoughts at INFO level.

Evaluation logic with exception handling:

for thought in previous_thoughts:
    new_thought = Thought.from_thought(thought)
    try:
        new_thought.solved = self.ground_truth_evaluator(new_thought.state)
    except:
        new_thought.solved = False
    self.thoughts.append(new_thought)
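
Because the except clause swallows the exception, a bug inside the evaluator is indistinguishable from a wrong answer. One possible mitigation, sketched here rather than taken from the library, is to wrap the evaluator so the failure is logged before it is re-raised. The names logged_evaluator, strict_check, and the logger name are hypothetical.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("ground_truth_sketch")  # hypothetical logger name


def logged_evaluator(evaluator: Callable[[Dict], bool]) -> Callable[[Dict], bool]:
    """Wrap an evaluator so exceptions are logged before propagating."""

    def wrapper(state: Dict) -> bool:
        try:
            return evaluator(state)
        except Exception:
            logger.exception("Ground truth evaluator failed for state %r", state)
            raise  # the caller still sees the error and can mark the thought unsolved

    return wrapper


def strict_check(state: Dict) -> bool:
    # Indexing raises KeyError when a key is missing.
    return state["current"] == state["ground_truth"]


safe_check = logged_evaluator(strict_check)
```

The wrapped function can then be passed as GroundTruth(ground_truth_evaluator=safe_check). The operation's behaviour is unchanged, but the traceback appears in the log.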

Assertions:

  • At least one predecessor must exist (len(self.predecessors) >= 1)

Thought flags set:

  • thought.solved -- Set to the evaluator's return value; False if the evaluator raises an exception
  • thought.compared_to_ground_truth -- Always set to True (automatically by the solved setter)
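
The coupling between the two flags can be pictured with a property setter. This is a simplified sketch rather than the library's code: FlagSketch is a hypothetical class, and its private attribute names are assumptions.

```python
class FlagSketch:
    """Hypothetical stand-in showing how assigning solved records a comparison."""

    def __init__(self) -> None:
        self._solved = False
        self._compared_to_ground_truth = False

    @property
    def solved(self) -> bool:
        return self._solved

    @solved.setter
    def solved(self, value: bool) -> None:
        self._solved = value
        self._compared_to_ground_truth = True  # any assignment marks the comparison

    @property
    def compared_to_ground_truth(self) -> bool:
        return self._compared_to_ground_truth


t = FlagSketch()
print(t.compared_to_ground_truth)            # False
t.solved = False
print(t.solved, t.compared_to_ground_truth)  # False True
```

Under this design, solved alone is ambiguous: False can mean "evaluated and wrong" or "never evaluated". Reading compared_to_ground_truth alongside it distinguishes the two cases.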

Usage Examples

Sorting: Evaluate Final Sorted List

from graph_of_thoughts.operations import KeepBestN, GroundTruth

def sorting_ground_truth(state):
    """Check if current sorted list matches the expected ground truth."""
    current = state.get("current", [])
    ground_truth = state.get("ground_truth", [])
    return current == ground_truth

keep_best = KeepBestN(n=1, higher_is_better=False)
gt = GroundTruth(ground_truth_evaluator=sorting_ground_truth)
gt.add_predecessor(keep_best)

Set Intersection: Verify Result

from graph_of_thoughts.operations import GroundTruth

def intersection_evaluator(state):
    """Check if computed intersection matches expected."""
    computed = set(state.get("result", []))
    expected = set(state.get("ground_truth", []))
    return computed == expected

gt = GroundTruth(ground_truth_evaluator=intersection_evaluator)
# final_keep_best is assumed to be an operation created earlier in the graph
gt.add_predecessor(final_keep_best)

Batch Evaluation and Reporting

# After execution, check how many problems were solved
gt_thoughts = gt.get_thoughts()
solved_count = sum(1 for t in gt_thoughts if t.solved)
total_count = len(gt_thoughts)
print(f"Solved: {solved_count}/{total_count}")
