Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Princeton nlp Tree of thought llm DFS Tree Search

From Leeroopedia
Revision as of 18:10, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Princeton_nlp_Tree_of_thought_llm_DFS_Tree_Search.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Knowledge Sources
Domains Search_Algorithms, LLM_Reasoning, NLP
Last Updated 2026-02-14 14:00 GMT

Overview

A tree search algorithm that explores LLM-generated reasoning steps using depth-first recursion with backtracking, evaluating candidate actions by aggregated confidence scores and optionally pruning impossible branches.

Description

DFS Tree Search in the Tree of Thoughts framework is the depth-first counterpart to BFS Tree Search. While BFS maintains a frontier of candidates at each depth level and selects the top-k to expand uniformly, DFS fully explores one branch of the reasoning tree before backtracking to try alternatives. This makes DFS particularly suited for constraint satisfaction problems like crossword puzzles, where early detection of dead ends (via pruning) can save significant computation.

At each board state, the algorithm:

  1. Generates candidates: Queries the LLM (GPT-4) multiple times and parses proposals with confidence levels (certain, high, medium, low) into numeric scores.
  2. Ranks candidates: Sorts by aggregated confidence score in descending order.
  3. Recurses: For each candidate (up to a per-state limit), applies the action to the environment, checks constraint satisfaction, and recurses deeper.
  4. Backtracks: Restores the environment state after exploring a branch and tries the next candidate.

The optional pruning mechanism cuts branches where any constraint becomes impossible (status code 2), reducing the search space at the cost of potentially missing solutions reachable through seemingly impossible intermediate states.

Usage

Use this principle when the problem involves sequential constraint satisfaction where partial solutions can be validated incrementally. DFS is preferred over BFS when the branching factor is manageable and early pruning of invalid paths provides significant speedup. In the Tree of Thoughts paper, DFS is applied specifically to the mini crosswords task where 5-letter words must satisfy overlapping letter constraints across horizontal and vertical slots.

Theoretical Basis

DFS explores the search tree by depth-first recursion. Let s be a board state, C(s) the set of candidate actions with scores, and T the global step budget:

# Abstract DFS algorithm for Tree of Thoughts
def dfs(state, path, history, budget, prune, max_per_state):
    candidates = rank_by_score(generate_candidates(state))
    count = 0
    for action in candidates:
        if len(history) >= budget:
            return
        next_state = apply(state, action)
        if violates_constraints(next_state):
            continue
        count += 1
        if count > max_per_state:
            break
        path.append(action)
        history.append(record(next_state, path))
        if not prune or not has_impossible(next_state):
            dfs(next_state, path, history, budget, prune, max_per_state)
        path.pop()
        restore(state)

Key differences from BFS Tree Search:

  • Exploration order: DFS fully explores one branch before trying alternatives; BFS expands all candidates at each depth uniformly.
  • Memory: DFS uses O(d) memory for depth d; BFS uses O(kd) for beam width k.
  • Pruning: DFS naturally supports branch pruning via constraint checking at each node; BFS prunes at the selection phase.
  • Completeness: DFS with a time limit is incomplete (may miss solutions in unexplored branches); BFS with sufficient beam width explores more uniformly.

Confidence scoring maps LLM-parsed confidence labels to numeric values:

Confidence Label Numeric Score
certain 1.0
high 0.5
medium 0.2
low 0.1

Scores are aggregated across multiple LLM responses (n=8 queries) to rank candidates robustly.

Related Pages

Implemented By

Heuristics

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment