Implementation:Spcl Graph of thoughts Score Operation
| Knowledge Sources | |
|---|---|
| Domains | Graph_Reasoning, Thought_Operations |
| Principles | Principle:Spcl_Graph_of_thoughts_Thought_Scoring |
| Source File | graph_of_thoughts/operations/operations.py, Lines 154-267 |
| Last Updated | 2026-02-14 |
Overview
The `Score` class is an operation that assigns numerical quality scores to thoughts from predecessor operations. It supports two scoring modes: programmatic (via a callable scoring function) and LLM-based (via the prompter and parser). It also supports two evaluation strategies: individual scoring (each thought scored independently) and combined scoring (all thoughts evaluated together).
Import
```python
from graph_of_thoughts.operations import Score
```
Class Signature
```python
class Score(Operation):
    operation_type = OperationType.score

    def __init__(
        self,
        num_samples: int = 1,
        combined_scoring: bool = False,
        scoring_function: Callable[
            [Union[List[Dict], Dict]], Union[List[float], float]
        ] = None,
    ) -> None: ...

    def get_thoughts(self) -> List[Thought]: ...

    def _execute(
        self, lm: AbstractLanguageModel, prompter: Prompter, parser: Parser, **kwargs
    ) -> None: ...
```
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| num_samples | int | 1 | Number of LLM queries per scoring evaluation (only used for LLM-based scoring) |
| combined_scoring | bool | False | If True, all thoughts are scored together; if False, each thought is scored individually |
| scoring_function | Callable | None | Optional programmatic scoring function; if None, uses LLM-based scoring |
Scoring Function Signature
The `scoring_function` callable has a different signature depending on the scoring mode:
- Individual scoring: `(Dict) -> float` -- receives a single thought state, returns a single score
- Combined scoring: `(List[Dict]) -> List[float]` -- receives a list of all thought states, returns a list of scores (one per thought)
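The two shapes can be illustrated with plain dictionaries. Note that the `"current"` key holding a partially sorted list is a hypothetical state layout chosen for this sketch; the `Score` class places no requirements on the contents of a state dict.

```python
from typing import Dict, List

# Individual scoring: one state dict in, one float out. Here the score is
# the number of adjacent out-of-order pairs in a (hypothetical) list state.
def score_individual(state: Dict) -> float:
    lst = state["current"]
    return float(sum(1 for i in range(len(lst) - 1) if lst[i] > lst[i + 1]))

# Combined scoring: all state dicts in, one score per state out.
def score_combined(states: List[Dict]) -> List[float]:
    return [score_individual(s) for s in states]

print(score_individual({"current": [3, 1, 2]}))                       # 1.0
print(score_combined([{"current": [1, 2, 3]}, {"current": [2, 1]}]))  # [0.0, 1.0]
```

Either function could be passed as `scoring_function`, provided it matches the mode selected via `combined_scoring`.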
I/O Behavior
Input:
- Predecessor thoughts (a list of `Thought` objects from upstream operations). At least one predecessor is required.
Output:
- New `Thought` objects cloned from the predecessors, with the `.score` property set. Stored in `self.thoughts`.
The original predecessor thoughts are not mutated; the Score operation always creates new `Thought` instances via `Thought.from_thought()`.
Execution Flow
Combined Scoring Path (`combined_scoring=True`)
- Collect all predecessor thoughts
- Extract all thought states into a list: `[thought.state for thought in previous_thoughts]`
- If `scoring_function` is provided:
  - Call `scoring_function(previous_thoughts_states)` to get a list of scores
- If `scoring_function` is None (LLM-based):
  - Construct a prompt via `prompter.score_prompt(previous_thoughts_states)`
  - Query the LLM via `lm.query(prompt, num_responses=num_samples)`
  - Parse scores via `parser.parse_score_answer(previous_thoughts_states, responses)`
- Create new `Thought` objects from each predecessor, assigning the corresponding score
Individual Scoring Path (`combined_scoring=False`)
- Collect all predecessor thoughts
- For each thought:
  - If `scoring_function` is provided:
    - Call `scoring_function(thought.state)` to get a single score
  - If `scoring_function` is None (LLM-based):
    - Construct a prompt via `prompter.score_prompt([thought.state])`
    - Query the LLM via `lm.query(prompt, num_responses=num_samples)`
    - Parse scores via `parser.parse_score_answer([thought.state], responses)` and take the first element
  - Create a new `Thought` cloned from the original, with the score set
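The programmatic-scoring branch of both paths can be sketched in a few lines. The `Thought` stub below is a minimal stand-in for the library class, kept only to make the control flow runnable; it is not the real implementation.

```python
from typing import Callable, Dict, List, Optional

class Thought:
    """Minimal stand-in for the library's Thought class (illustration only)."""
    def __init__(self, state: Dict, score: Optional[float] = None):
        self.state = state
        self.score = score

    @classmethod
    def from_thought(cls, other: "Thought") -> "Thought":
        # Clone the state so scoring never mutates the original thought
        return cls(dict(other.state), other.score)

def score_thoughts(
    previous: List[Thought],
    scoring_function: Callable,
    combined_scoring: bool,
) -> List[Thought]:
    """Sketch of the scoring_function branch of Score's execution flow."""
    results: List[Thought] = []
    if combined_scoring:
        # One call over all states, yielding one score per thought
        scores = scoring_function([t.state for t in previous])
        for thought, score in zip(previous, scores):
            clone = Thought.from_thought(thought)
            clone.score = score
            results.append(clone)
    else:
        # One call per thought state
        for thought in previous:
            clone = Thought.from_thought(thought)
            clone.score = scoring_function(thought.state)
            results.append(clone)
    return results
```

In the LLM-based branch the per-thought call is replaced by the prompt/query/parse sequence described above, but the cloning and score assignment are the same.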
Key Implementation Details
Thought Cloning
Every scored thought is a clone of the original:
```python
new_thought = Thought.from_thought(thought)
new_thought.score = score
self.thoughts.append(new_thought)
```
Setting `.score` automatically sets the `.scored` flag to True on the thought, which is required by downstream `KeepBestN` operations that assert all thoughts have been scored.
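The coupling between `.score` and `.scored` can be sketched with a property setter. This is a simplification for illustration, not the library's actual `Thought` class:

```python
class Thought:
    """Sketch of the score/scored coupling (not the real library class)."""
    def __init__(self) -> None:
        self._score: float = 0.0
        self.scored: bool = False

    @property
    def score(self) -> float:
        return self._score

    @score.setter
    def score(self, value: float) -> None:
        self._score = value
        self.scored = True  # flag later checked by KeepBestN
```

Because the flag flips as a side effect of assignment, an operation can never produce a thought with a score but an unset flag.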
LLM Scoring Prompt Construction
For LLM-based scoring, the prompt is always constructed with a list of thought states, even in individual mode:
```python
# Individual mode -- wraps a single state in a list
prompt = prompter.score_prompt([thought.state])

# Combined mode -- passes all states
prompt = prompter.score_prompt(previous_thoughts_states)
```
This ensures a consistent interface for the `Prompter` regardless of scoring strategy.
Assertion on Predecessors
The operation asserts that it has at least one predecessor before executing:
```python
assert len(self.predecessors) > 0, "Score operation needs at least one predecessor"
```
A Score operation with no predecessors has no thoughts to score and represents a graph construction error.
Instance Attributes
| Attribute | Type | Description |
|---|---|---|
| operation_type | OperationType | Always OperationType.score |
| num_samples | int | Number of LLM queries per scoring evaluation |
| combined_scoring | bool | Whether to score all thoughts together |
| scoring_function | Callable | Programmatic scoring function (or None for LLM-based) |
| thoughts | List[Thought] | Scored thoughts (populated after execution) |
| id | int | Unique operation identifier (inherited from Operation) |
| predecessors | List[Operation] | Upstream operations (inherited from Operation) |
| successors | List[Operation] | Downstream operations (inherited from Operation) |
| executed | bool | Whether the operation has been executed (inherited from Operation) |
Usage Example
```python
from graph_of_thoughts.operations import Generate, Score, KeepBestN

# Programmatic scoring: count sorting errors
def count_errors(state):
    lst = state["current"]
    return sum(1 for i in range(len(lst) - 1) if lst[i] > lst[i + 1])

# Build a generate-score-keep pipeline
gen = Generate(num_branches_prompt=5, num_branches_response=1)
score = Score(scoring_function=count_errors)
keep = KeepBestN(n=1, higher_is_better=False)  # fewer errors = better

score.add_predecessor(gen)
keep.add_predecessor(score)

# LLM-based scoring with combined evaluation
score_llm = Score(num_samples=3, combined_scoring=True)
score_llm.add_predecessor(gen)
```
Related Pages
- Principle:Spcl_Graph_of_thoughts_Thought_Scoring
- Environment:Spcl_Graph_of_thoughts_Python_3_8_Runtime
- Heuristic:Spcl_Graph_of_thoughts_Scoring_With_Error_Counting
GitHub URL
graph_of_thoughts/operations/operations.py (Lines 154-267)