Implementation:Princeton nlp Tree of thought llm DFS Crosswords Search

Knowledge Sources	Princeton_nlp_Tree_of_thought_llm Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Domains	Search_Algorithms, LLM_Reasoning, Crosswords
Last Updated	2026-02-14 14:00 GMT

Overview

Concrete tool for depth-first tree search over LLM-generated crossword puzzle actions, producing JSON trajectory logs with board state and confidence tracking.

Description

The dfs() function implements the DFS variant of Tree of Thoughts for the mini crosswords task. Defined in the Jupyter notebook search_crosswords-dfs.ipynb, it recursively explores candidate word placements ranked by LLM-assigned confidence scores. At each board state, it calls get_candidates_to_scores() to obtain GPT-4 proposals with parsed confidence levels (certain, high, medium, low), then tries each candidate in descending score order up to a per-state limit. The function records every step as a structured info dict and appends it to an infos list that is serialized to logs/crosswords/infoss_dfs_no_prune.json (without pruning) or logs/crosswords/infoss_dfs_prune.json (with pruning).

Usage

Use this implementation when running DFS-based Tree of Thoughts experiments on mini crossword puzzles. The notebook executes DFS for puzzle indices 0 to 100 (step 5) with a time limit of 100 steps and max 3 candidates per state. The resulting JSON log is the primary experimental artifact for the crosswords DFS results reported in the paper.

Code Reference

Source Location

Repository: Princeton_nlp_Tree_of_thought_llm
File: scripts/crosswords/search_crosswords-dfs.ipynb
Output artifact: logs/crosswords/infoss_dfs_no_prune.json (44,484 lines)

Signature

def dfs(env, actions, infos, time_limit, prune, max_per_state):
    """
    Recursive depth-first search over crossword puzzle actions.

    Args:
        env: MiniCrosswordsEnv instance with current board state.
        actions: List[str] accumulator of actions taken on current path.
        infos: List[dict] accumulator of step info dicts (mutated in place).
        time_limit: int maximum total steps across all branches.
        prune: bool whether to prune branches with impossible constraints.
        max_per_state: int maximum candidates to try per board state.
    """

def get_candidates_to_scores(env):
    """
    Query GPT-4 for candidate word placements and aggregate confidence scores.

    Args:
        env: MiniCrosswordsEnv instance.

    Returns:
        dict mapping action strings (e.g., 'h2. motor') to float scores.
    """

Import

# Notebook-only: not importable as a module.
# Functions are defined inline in scripts/crosswords/search_crosswords-dfs.ipynb
from tot.tasks.crosswords import MiniCrosswordsEnv
from tot.prompts.crosswords import propose_prompt, value_prompt
from tot.models import gpt

I/O Contract

Inputs

Name	Type	Required	Description
env	MiniCrosswordsEnv	Yes	Crossword environment with board state, step, and cache
actions	List[str]	Yes	Accumulator list for current DFS path actions
infos	List[dict]	Yes	Accumulator list for all step info records (mutated)
time_limit	int	Yes	Maximum total steps across all DFS branches (default: 100)
prune	bool	Yes	If True, prune branches where any constraint is impossible
max_per_state	int	Yes	Maximum candidates to expand per board state (default: 3)

Outputs

Name	Type	Description
infos (mutated)	List[dict]	Each dict contains: total_step, env_step, actions, info (r_letter, r_word), count (sure, maybe, impossible)
JSON log file	File	Written to logs/crosswords/infoss_dfs_no_prune.json or infoss_dfs_prune.json

Output JSON Schema

[
    [
        {
            "total_step": 0,
            "env_step": 1,
            "actions": ["h2. motor"],
            "info": {"r_letter": 0.2, "r_word": 0.1},
            "count": {"sure": 1, "maybe": 0, "impossible": 0}
        }
    ]
]

total_step: Global step counter across all branches for this puzzle
env_step: Number of words placed on the current board
actions: Ordered list of placements (e.g., "h2. motor" = horizontal slot 2, word "motor")
info.r_letter: Fraction of correctly placed letters
info.r_word: Fraction of correctly completed words
count.sure/maybe/impossible: Constraint satisfaction counts from env.prompt_status()

Usage Examples

Running DFS Without Pruning

import json
from tot.tasks.crosswords import MiniCrosswordsEnv

env = MiniCrosswordsEnv()

# Run DFS on puzzles 0, 5, 10, ..., 95
infoss = []
for i in range(0, 100, 5):
    env.reset(i)
    infos = []
    actions = []
    dfs(env, actions, infos, time_limit=100, prune=False, max_per_state=3)
    infoss.append(infos)

# Save results
with open('logs/crosswords/infoss_dfs_no_prune.json', 'w') as fout:
    json.dump(infoss, fout)

Running DFS With Pruning

# With pruning: branches with impossible constraints are cut
infoss = []
for i in range(0, 100, 5):
    env.reset(i)
    infos = []
    actions = []
    dfs(env, actions, infos, time_limit=100, prune=True, max_per_state=3)
    infoss.append(infos)

with open('logs/crosswords/infoss_dfs_prune.json', 'w') as fout:
    json.dump(infoss, fout)

Analyzing DFS Results

import json

with open('logs/crosswords/infoss_dfs_no_prune.json') as f:
    infoss = json.load(f)

# Each element is a list of DFS step records for one puzzle
for puzzle_idx, infos in enumerate(infoss):
    if infos:
        best = max(infos, key=lambda x: x['info']['r_word'])
        print(f"Puzzle {puzzle_idx * 5}: best r_word={best['info']['r_word']}, "
              f"steps={len(infos)}, words placed={best['env_step']}")

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment