

Principle:Princeton nlp Tree of thought llm Thought Evaluation

From Leeroopedia
Knowledge Sources
Domains LLM_Reasoning, Search_Algorithms, NLP
Last Updated 2026-02-14 03:30 GMT

Overview

The mechanism by which an LLM assigns quality scores to candidate thoughts, enabling informed selection of the most promising reasoning paths.

Description

Thought Evaluation is the second phase of each BFS iteration in the Tree of Thoughts framework. After candidate thoughts are generated, the LLM evaluates their quality to guide the selection process. The framework supports two evaluation strategies:

  • Value: The LLM independently assesses each candidate by reasoning about whether it leads toward the goal, producing a categorical judgment (e.g., sure/likely/impossible) that is mapped to a numeric score. Values are cached to avoid redundant LLM calls for duplicate candidates.
  • Vote: The LLM sees all candidates simultaneously and votes for the best one. Multiple voting rounds accumulate a vote count per candidate. This is suited for tasks where relative comparison is more reliable than absolute assessment (e.g., creative writing quality).
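Either strategy yields a per-candidate score that then drives selection. As a minimal illustration of how those scores feed the BFS step, here is a greedy top-b selection sketch (the function name `select_top_b` and the beam width `b` are illustrative, not named on this page):

```python
def select_top_b(candidates, scores, b):
    """Keep the b highest-scoring candidates for the next BFS level."""
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in ranked[:b]]

# Scores as produced by value evaluation; candidate 'b' ranks first.
kept = select_top_b(['a', 'b', 'c'], [1.0, 20.0, 0.001], b=2)  # ['b', 'a']
```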

Usage

Use the value strategy when individual candidates can be meaningfully assessed in isolation (e.g., "Can these numbers reach 24?"). Use the vote strategy when quality is best judged by comparison (e.g., "Which passage is more coherent?").

Theoretical Basis

Value evaluation:

For each candidate y_i, the LLM produces a categorical assessment that is mapped to a numeric score:

# Abstract: value-based scoring
values = []
for candidate in candidates:
    prompt = task.value_prompt_wrap(input, candidate)
    outputs = llm(prompt, n=k)  # k evaluation samples
    score = task.value_outputs_unwrap(input, candidate, outputs)
    # Maps categories to numbers, e.g.:
    # {'impossible': 0.001, 'likely': 1, 'sure': 20}
    values.append(score)
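To make the unwrap step concrete, here is a self-contained sketch of how categorical judgments across k samples could be mapped and summed into one score. The regex, the sample evaluation texts, and the exact summing behavior are assumptions for illustration, not the repository's code:

```python
import re

# Category-to-score mapping, mirroring the example above.
CATEGORY_SCORES = {'impossible': 0.001, 'likely': 1.0, 'sure': 20.0}

def value_outputs_unwrap(outputs):
    """Sum the scores of the final categorical judgment in each sample."""
    total = 0.0
    for out in outputs:
        # Take the last recognized category mentioned in the evaluation text.
        matches = re.findall(r'impossible|likely|sure', out.lower())
        if matches:
            total += CATEGORY_SCORES[matches[-1]]
    return total

# Three evaluation samples for one candidate: two 'sure', one 'likely'.
samples = [
    "10 and 14 can reach 24 via 10 + 14 = 24. sure",
    "(4 * 6) = 24. sure",
    "Possible with multiplication. likely",
]
score = value_outputs_unwrap(samples)  # 20 + 20 + 1 = 41.0
```

Sampling k evaluations and aggregating them smooths out noise in any single judgment, which is why the snippet above calls `llm(prompt, n=k)` rather than evaluating once.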

Vote evaluation:

All candidates are presented together and the LLM selects the best:

# Abstract: vote-based scoring
prompt = task.vote_prompt_wrap(input, all_candidates)
outputs = llm(prompt, n=k)  # k voting rounds
votes = task.vote_outputs_unwrap(outputs, len(all_candidates))
# votes[i] = number of times candidate i was selected
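A self-contained sketch of the vote-counting step: it assumes, purely for illustration, that each voting round ends with a line like "The best choice is 2" using 1-based candidate numbers; the parsing pattern is not the repository's exact code:

```python
import re

def vote_outputs_unwrap(outputs, n_candidates):
    """Count how often each candidate index is chosen across k voting rounds."""
    votes = [0] * n_candidates
    for out in outputs:
        m = re.search(r'best choice is (\d+)', out, re.IGNORECASE)
        if m:
            idx = int(m.group(1)) - 1  # convert to 0-based index
            if 0 <= idx < n_candidates:
                votes[idx] += 1
    return votes

# Three voting rounds over three candidate passages.
outputs = [
    "Passage 2 flows best. The best choice is 2",
    "The best choice is 2",
    "Passage 1 is more coherent. The best choice is 1",
]
votes = vote_outputs_unwrap(outputs, 3)  # [1, 2, 0]
```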

The value strategy enables caching (same prompt → same value), while the vote strategy provides comparative assessment at the cost of requiring all candidates in a single context window.
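The caching property can be sketched as a thin memoizing wrapper around the value call; the names `make_cached_value_fn` and `unwrap`, and the stand-in `llm`, are hypothetical:

```python
def make_cached_value_fn(llm, unwrap):
    """Wrap an LLM value call so duplicate candidates reuse one result."""
    cache = {}
    calls = {'n': 0}  # tracks how many real LLM calls were made

    def value(candidate):
        if candidate not in cache:
            calls['n'] += 1
            cache[candidate] = unwrap(llm(candidate))
        return cache[candidate]

    return value, calls

# Stand-in LLM that always answers 'sure'; unwrap maps it to 20.0.
value, calls = make_cached_value_fn(lambda prompt: ['sure'], lambda outs: 20.0)
value('4 8 -> 32')
value('4 8 -> 32')  # duplicate candidate: served from cache
```

This works because value prompts are a pure function of the candidate, so identical candidates deterministically map to the same cache key; vote prompts cannot be cached this way, since each vote depends on the full candidate set in context.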

Related Pages

Implemented By

Uses Heuristic
