
Implementation:Spcl Graph of thoughts Score Operation

From Leeroopedia
Knowledge Sources
Domains: Graph_Reasoning, Thought_Operations
Principles: Principle:Spcl_Graph_of_thoughts_Thought_Scoring
Source File: graph_of_thoughts/operations/operations.py, Lines 154-267
Last Updated: 2026-02-14

Overview

The Score class is an operation that assigns numerical quality scores to thoughts from predecessor operations. It supports two scoring modes: programmatic (via a callable scoring function) and LLM-based (via the prompter and parser). It also supports two evaluation strategies: individual scoring (each thought scored independently) and combined scoring (all thoughts evaluated together).

Import

from graph_of_thoughts.operations import Score

Class Signature

class Score(Operation):
    operation_type = OperationType.score

    def __init__(
        self,
        num_samples: int = 1,
        combined_scoring: bool = False,
        scoring_function: Callable[
            [Union[List[Dict], Dict]], Union[List[float], float]
        ] = None,
    ) -> None: ...

    def get_thoughts(self) -> List[Thought]: ...

    def _execute(
        self, lm: AbstractLanguageModel, prompter: Prompter, parser: Parser, **kwargs
    ) -> None: ...

Constructor Parameters

  • num_samples (int, default 1) -- Number of LLM queries per scoring evaluation; only used for LLM-based scoring
  • combined_scoring (bool, default False) -- If True, all thoughts are scored together; if False, each thought is scored individually
  • scoring_function (Callable, default None) -- Optional programmatic scoring function; if None, LLM-based scoring is used

Scoring Function Signature

The scoring_function callable has different signatures depending on the scoring mode:

  • Individual scoring: (Dict) -> float -- receives a single thought state, returns a single score
  • Combined scoring: (List[Dict]) -> List[float] -- receives a list of all thought states, returns a list of scores (one per thought)
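The two signatures can be illustrated with a pair of toy scoring functions (illustrative only, not from the library; the state key "current" is an assumption borrowed from the sorting example later on this page):

```python
# Illustrative sketch: two scoring functions matching the signatures
# Score expects in each mode.

def score_individual(state: dict) -> float:
    """Individual mode: one thought state in, one score out."""
    return float(len(state.get("current", [])))

def score_combined(states: list[dict]) -> list[float]:
    """Combined mode: all thought states in, one score per state out."""
    return [float(len(s.get("current", []))) for s in states]

states = [{"current": [1, 2]}, {"current": [3]}]
print(score_individual(states[0]))  # 2.0
print(score_combined(states))       # [2.0, 1.0]
```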

I/O Behavior

Input:

  • Predecessor thoughts (list of Thought objects from upstream operations). At least one predecessor is required.

Output:

  • New Thought objects cloned from the predecessors, with the .score property set. Stored in self.thoughts.

The original predecessor thoughts are not mutated. The Score operation always creates new Thought instances via Thought.from_thought().

Execution Flow

Combined Scoring Path (combined_scoring=True)

  1. Collect all predecessor thoughts
  2. Extract all thought states into a list: [thought.state for thought in previous_thoughts]
  3. If scoring_function is provided:
    • Call scoring_function(previous_thoughts_states) to get a list of scores
  4. If scoring_function is None (LLM-based):
    • Construct a prompt via prompter.score_prompt(previous_thoughts_states)
    • Query the LLM via lm.query(prompt, num_responses=num_samples)
    • Parse scores via parser.parse_score_answer(previous_thoughts_states, responses)
  5. Create new Thought objects from each predecessor, assigning the corresponding score
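The programmatic branch of the combined path can be sketched standalone (FakeThought is a hypothetical stand-in for the library's Thought class; the real implementation clones via Thought.from_thought):

```python
from dataclasses import dataclass

# Minimal stand-in for the library's Thought class (illustrative only).
@dataclass
class FakeThought:
    state: dict
    score: float = 0.0

def score_combined_path(previous_thoughts, scoring_function):
    # Steps 1-2: collect predecessors and extract their states.
    states = [t.state for t in previous_thoughts]
    # Step 3: one call to the scoring function returns a score per thought.
    scores = scoring_function(states)
    # Step 5: clone each thought and attach its score; originals untouched.
    return [FakeThought(state=t.state, score=s)
            for t, s in zip(previous_thoughts, scores)]

thoughts = [FakeThought({"current": [2, 1]}), FakeThought({"current": [1, 2]})]
# Score = number of out-of-order adjacent pairs (fewer is better).
scored = score_combined_path(
    thoughts,
    lambda states: [sum(1 for i in range(len(s["current"]) - 1)
                        if s["current"][i] > s["current"][i + 1])
                    for s in states],
)
```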

Individual Scoring Path (combined_scoring=False)

  1. Collect all predecessor thoughts
  2. For each thought:
    1. If scoring_function is provided:
      • Call scoring_function(thought.state) to get a single score
    2. If scoring_function is None (LLM-based):
      • Construct a prompt via prompter.score_prompt([thought.state])
      • Query the LLM via lm.query(prompt, num_responses=num_samples)
      • Parse scores via parser.parse_score_answer([thought.state], responses) and take the first element
    3. Create a new Thought cloned from the original, with the score set
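The individual path's programmatic branch differs only in looping per thought; a standalone sketch (again using a hypothetical FakeThought stand-in, not the library's Thought):

```python
from dataclasses import dataclass

@dataclass
class FakeThought:
    state: dict
    score: float = 0.0

def score_individual_path(previous_thoughts, scoring_function):
    scored = []
    for t in previous_thoughts:
        # Programmatic branch: one state in, one score out.
        s = scoring_function(t.state)
        # Clone with the score set; the original thought is untouched.
        scored.append(FakeThought(state=t.state, score=s))
    return scored

originals = [FakeThought({"current": [3, 1, 2]}), FakeThought({"current": [1, 2, 3]})]
result = score_individual_path(
    originals,
    lambda st: sum(1 for i in range(len(st["current"]) - 1)
                   if st["current"][i] > st["current"][i + 1]),
)
```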

Key Implementation Details

Thought Cloning

Every scored thought is a clone of the original:

new_thought = Thought.from_thought(thought)
new_thought.score = score
self.thoughts.append(new_thought)

Setting .score automatically sets the .scored flag to True on the thought, which is required by downstream KeepBestN operations that assert all thoughts have been scored.
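A minimal sketch of how a score setter can flip a scored flag, mirroring the behavior described above (class and attribute names here are illustrative, not the library's actual implementation):

```python
class ThoughtSketch:
    """Illustrative only: assigning .score also sets a .scored flag."""

    def __init__(self, state):
        self.state = state
        self._score = 0.0
        self.scored = False

    @property
    def score(self):
        return self._score

    @score.setter
    def score(self, value):
        self._score = value
        self.scored = True  # downstream operations can assert this flag

t = ThoughtSketch({"current": [1]})
t.score = 0.5  # t.scored is now True
```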

LLM Scoring Prompt Construction

For LLM-based scoring, the prompt is always constructed with a list of thought states, even in individual mode:

# Individual mode -- wraps single state in a list
prompt = prompter.score_prompt([thought.state])

# Combined mode -- passes all states
prompt = prompter.score_prompt(previous_thoughts_states)

This ensures a consistent interface for the Prompter regardless of scoring strategy.

Assertion on Predecessors

The operation asserts that it has at least one predecessor before executing:

assert len(self.predecessors) > 0, "Score operation needs at least one predecessor"

A Score operation with no predecessors has no thoughts to score and represents a graph construction error.

Instance Attributes

  • operation_type (OperationType) -- Always OperationType.score
  • num_samples (int) -- Number of LLM queries per scoring evaluation
  • combined_scoring (bool) -- Whether to score all thoughts together
  • scoring_function (Callable) -- Programmatic scoring function (or None for LLM-based)
  • thoughts (List[Thought]) -- Scored thoughts (populated after execution)
  • id (int) -- Unique operation identifier (inherited from Operation)
  • predecessors (List[Operation]) -- Upstream operations (inherited from Operation)
  • successors (List[Operation]) -- Downstream operations (inherited from Operation)
  • executed (bool) -- Whether the operation has been executed (inherited from Operation)

Usage Example

from graph_of_thoughts.operations import Generate, Score, KeepBestN

# Programmatic scoring: count sorting errors
def count_errors(state):
    lst = state["current"]
    return sum(1 for i in range(len(lst) - 1) if lst[i] > lst[i + 1])

# Build a generate-score-keep pipeline
gen = Generate(num_branches_prompt=5, num_branches_response=1)
score = Score(scoring_function=count_errors)
keep = KeepBestN(n=1, higher_is_better=False)  # fewer errors = better

score.add_predecessor(gen)
keep.add_predecessor(score)

# LLM-based scoring with combined evaluation
score_llm = Score(num_samples=3, combined_scoring=True)
score_llm.add_predecessor(gen)
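The count_errors scorer from the example above can be exercised standalone to see how KeepBestN with higher_is_better=False would select among candidates:

```python
# count_errors as defined in the usage example: number of out-of-order
# adjacent pairs in the candidate list (0 means fully sorted).
def count_errors(state):
    lst = state["current"]
    return sum(1 for i in range(len(lst) - 1) if lst[i] > lst[i + 1])

candidates = [
    {"current": [1, 3, 2, 4]},  # one out-of-order pair
    {"current": [1, 2, 3, 4]},  # fully sorted
    {"current": [4, 3, 2, 1]},  # reverse sorted
]
scores = [count_errors(s) for s in candidates]
# Lower score is better, mirroring KeepBestN(n=1, higher_is_better=False).
best = min(range(len(candidates)), key=lambda i: scores[i])
print(scores, best)  # [1, 0, 3] 1
```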

GitHub URL

graph_of_thoughts/operations/operations.py (Lines 154-267)
