
Implementation:Spcl Graph of thoughts Score Operation

From Leeroopedia
Knowledge Sources
Domains: Graph_Reasoning, Thought_Operations
Principles: Principle:Spcl_Graph_of_thoughts_Thought_Scoring
Source File: graph_of_thoughts/operations/operations.py, Lines 154-267
Last Updated: 2026-02-14

Overview

The Score class is an operation that assigns numerical quality scores to thoughts from predecessor operations. It supports two scoring modes: programmatic (via a callable scoring function) and LLM-based (via the prompter and parser). It also supports two evaluation strategies: individual scoring (each thought scored independently) and combined scoring (all thoughts evaluated together).

Import

from graph_of_thoughts.operations import Score

Class Signature

class Score(Operation):
    operation_type = OperationType.score

    def __init__(
        self,
        num_samples: int = 1,
        combined_scoring: bool = False,
        scoring_function: Callable[
            [Union[List[Dict], Dict]], Union[List[float], float]
        ] = None,
    ) -> None: ...

    def get_thoughts(self) -> List[Thought]: ...

    def _execute(
        self, lm: AbstractLanguageModel, prompter: Prompter, parser: Parser, **kwargs
    ) -> None: ...

Constructor Parameters

  • num_samples (int, default 1) -- Number of LLM queries per scoring evaluation; only used for LLM-based scoring
  • combined_scoring (bool, default False) -- If True, all thoughts are scored together; if False, each thought is scored individually
  • scoring_function (Callable, default None) -- Optional programmatic scoring function; if None, LLM-based scoring is used

Scoring Function Signature

The scoring_function callable has different signatures depending on the scoring mode:

  • Individual scoring: (Dict) -> float -- receives a single thought state, returns a single score
  • Combined scoring: (List[Dict]) -> List[float] -- receives a list of all thought states, returns a list of scores (one per thought)
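The two signatures can be illustrated with a pair of toy scoring functions (illustrative only, not from the library; the state key "current" is an assumption borrowed from the sorting example later on this page):

```python
# Illustrative sketch: two scoring functions matching the signatures
# Score expects in each mode.

def score_individual(state: dict) -> float:
    """Individual mode: one thought state in, one score out."""
    return float(len(state.get("current", [])))

def score_combined(states: list[dict]) -> list[float]:
    """Combined mode: all thought states in, one score per state out."""
    return [float(len(s.get("current", []))) for s in states]

states = [{"current": [1, 2]}, {"current": [3]}]
print(score_individual(states[0]))  # 2.0
print(score_combined(states))       # [2.0, 1.0]
```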

I/O Behavior

Input:

  • Predecessor thoughts (list of Thought objects from upstream operations). At least one predecessor is required.

Output:

  • New Thought objects cloned from the predecessors, with the .score property set. Stored in self.thoughts.

The original predecessor thoughts are not mutated. The Score operation always creates new Thought instances via Thought.from_thought().

Execution Flow

Combined Scoring Path (combined_scoring=True)

  1. Collect all predecessor thoughts
  2. Extract all thought states into a list: [thought.state for thought in previous_thoughts]
  3. If scoring_function is provided:
    • Call scoring_function(previous_thoughts_states) to get a list of scores
  4. If scoring_function is None (LLM-based):
    • Construct a prompt via prompter.score_prompt(previous_thoughts_states)
    • Query the LLM via lm.query(prompt, num_responses=num_samples)
    • Parse scores via parser.parse_score_answer(previous_thoughts_states, responses)
  5. Create new Thought objects from each predecessor, assigning the corresponding score
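The programmatic branch of the combined path can be sketched standalone (FakeThought is a hypothetical stand-in for the library's Thought class; the real implementation clones via Thought.from_thought):

```python
from dataclasses import dataclass

# Minimal stand-in for the library's Thought class (illustrative only).
@dataclass
class FakeThought:
    state: dict
    score: float = 0.0

def score_combined_path(previous_thoughts, scoring_function):
    # Steps 1-2: collect predecessors and extract their states.
    states = [t.state for t in previous_thoughts]
    # Step 3: one call to the scoring function returns a score per thought.
    scores = scoring_function(states)
    # Step 5: clone each thought and attach its score; originals untouched.
    return [FakeThought(state=t.state, score=s)
            for t, s in zip(previous_thoughts, scores)]

thoughts = [FakeThought({"current": [2, 1]}), FakeThought({"current": [1, 2]})]
# Score = number of out-of-order adjacent pairs (fewer is better).
scored = score_combined_path(
    thoughts,
    lambda states: [sum(1 for i in range(len(s["current"]) - 1)
                        if s["current"][i] > s["current"][i + 1])
                    for s in states],
)
```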

Individual Scoring Path (combined_scoring=False)

  1. Collect all predecessor thoughts
  2. For each thought:
    1. If scoring_function is provided:
      • Call scoring_function(thought.state) to get a single score
    2. If scoring_function is None (LLM-based):
      • Construct a prompt via prompter.score_prompt([thought.state])
      • Query the LLM via lm.query(prompt, num_responses=num_samples)
      • Parse scores via parser.parse_score_answer([thought.state], responses) and take the first element
    3. Create a new Thought cloned from the original, with the score set
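The individual path's programmatic branch differs only in looping per thought; a standalone sketch (again using a hypothetical FakeThought stand-in, not the library's Thought):

```python
from dataclasses import dataclass

@dataclass
class FakeThought:
    state: dict
    score: float = 0.0

def score_individual_path(previous_thoughts, scoring_function):
    scored = []
    for t in previous_thoughts:
        # Programmatic branch: one state in, one score out.
        s = scoring_function(t.state)
        # Clone with the score set; the original thought is untouched.
        scored.append(FakeThought(state=t.state, score=s))
    return scored

originals = [FakeThought({"current": [3, 1, 2]}), FakeThought({"current": [1, 2, 3]})]
result = score_individual_path(
    originals,
    lambda st: sum(1 for i in range(len(st["current"]) - 1)
                   if st["current"][i] > st["current"][i + 1]),
)
```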

Key Implementation Details

Thought Cloning

Every scored thought is a clone of the original:

new_thought = Thought.from_thought(thought)
new_thought.score = score
self.thoughts.append(new_thought)

Setting .score automatically sets the .scored flag to True on the thought, which is required by downstream KeepBestN operations that assert all thoughts have been scored.
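A minimal sketch of how a score setter can flip a scored flag, mirroring the behavior described above (class and attribute names here are illustrative, not the library's actual implementation):

```python
class ThoughtSketch:
    """Illustrative only: assigning .score also sets a .scored flag."""

    def __init__(self, state):
        self.state = state
        self._score = 0.0
        self.scored = False

    @property
    def score(self):
        return self._score

    @score.setter
    def score(self, value):
        self._score = value
        self.scored = True  # downstream operations can assert this flag

t = ThoughtSketch({"current": [1]})
t.score = 0.5  # t.scored is now True
```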

LLM Scoring Prompt Construction

For LLM-based scoring, the prompt is always constructed with a list of thought states, even in individual mode:

# Individual mode -- wraps single state in a list
prompt = prompter.score_prompt([thought.state])

# Combined mode -- passes all states
prompt = prompter.score_prompt(previous_thoughts_states)

This ensures a consistent interface for the Prompter regardless of scoring strategy.

Assertion on Predecessors

The operation asserts that it has at least one predecessor before executing:

assert len(self.predecessors) > 0, "Score operation needs at least one predecessor"

A Score operation with no predecessors has no thoughts to score and represents a graph construction error.

Instance Attributes

  • operation_type (OperationType) -- Always OperationType.score
  • num_samples (int) -- Number of LLM queries per scoring evaluation
  • combined_scoring (bool) -- Whether to score all thoughts together
  • scoring_function (Callable) -- Programmatic scoring function (or None for LLM-based)
  • thoughts (List[Thought]) -- Scored thoughts (populated after execution)
  • id (int) -- Unique operation identifier (inherited from Operation)
  • predecessors (List[Operation]) -- Upstream operations (inherited from Operation)
  • successors (List[Operation]) -- Downstream operations (inherited from Operation)
  • executed (bool) -- Whether the operation has been executed (inherited from Operation)

Usage Example

from graph_of_thoughts.operations import Generate, Score, KeepBestN

# Programmatic scoring: count sorting errors
def count_errors(state):
    lst = state["current"]
    return sum(1 for i in range(len(lst) - 1) if lst[i] > lst[i + 1])

# Build a generate-score-keep pipeline
gen = Generate(num_branches_prompt=5, num_branches_response=1)
score = Score(scoring_function=count_errors)
keep = KeepBestN(n=1, higher_is_better=False)  # fewer errors = better

score.add_predecessor(gen)
keep.add_predecessor(score)

# LLM-based scoring with combined evaluation
score_llm = Score(num_samples=3, combined_scoring=True)
score_llm.add_predecessor(gen)
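The count_errors scorer from the example above can be exercised standalone to see how KeepBestN with higher_is_better=False would select among candidates:

```python
# count_errors as defined in the usage example: number of out-of-order
# adjacent pairs in the candidate list (0 means fully sorted).
def count_errors(state):
    lst = state["current"]
    return sum(1 for i in range(len(lst) - 1) if lst[i] > lst[i + 1])

candidates = [
    {"current": [1, 3, 2, 4]},  # one out-of-order pair
    {"current": [1, 2, 3, 4]},  # fully sorted
    {"current": [4, 3, 2, 1]},  # reverse sorted
]
scores = [count_errors(s) for s in candidates]
# Lower score is better, mirroring KeepBestN(n=1, higher_is_better=False).
best = min(range(len(candidates)), key=lambda i: scores[i])
print(scores, best)  # [1, 0, 3] 1
```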

GitHub URL

graph_of_thoughts/operations/operations.py (Lines 154-267)
