Implementation:Princeton nlp Tree of thought llm DFS Crosswords Search
| Knowledge Sources | |
|---|---|
| Domains | Search_Algorithms, LLM_Reasoning, Crosswords |
| Last Updated | 2026-02-14 14:00 GMT |
Overview
Concrete tool for depth-first tree search over LLM-generated crossword puzzle actions, producing JSON trajectory logs with board state and confidence tracking.
Description
The dfs() function implements the DFS variant of Tree of Thoughts for the mini crosswords task. Defined in the Jupyter notebook search_crosswords-dfs.ipynb, it recursively explores candidate word placements ranked by LLM-assigned confidence scores. At each board state, it calls get_candidates_to_scores() to obtain GPT-4 proposals with parsed confidence levels (certain, high, medium, low), then tries each candidate in descending score order up to a per-state limit. The function records every step as a structured info dict and appends it to an infos list that is serialized to logs/crosswords/infoss_dfs_no_prune.json (without pruning) or logs/crosswords/infoss_dfs_prune.json (with pruning).
Usage
Use this implementation when running DFS-based Tree of Thoughts experiments on mini crossword puzzles. The notebook executes DFS for puzzle indices 0 to 100 (step 5) with a time limit of 100 steps and max 3 candidates per state. The resulting JSON log is the primary experimental artifact for the crosswords DFS results reported in the paper.
Code Reference
Source Location
- Repository: Princeton_nlp_Tree_of_thought_llm
- File: scripts/crosswords/search_crosswords-dfs.ipynb
- Output artifact: logs/crosswords/infoss_dfs_no_prune.json (44,484 lines)
Signature
def dfs(env, actions, infos, time_limit, prune, max_per_state):
"""
Recursive depth-first search over crossword puzzle actions.
Args:
env: MiniCrosswordsEnv instance with current board state.
actions: List[str] accumulator of actions taken on current path.
infos: List[dict] accumulator of step info dicts (mutated in place).
time_limit: int maximum total steps across all branches.
prune: bool whether to prune branches with impossible constraints.
max_per_state: int maximum candidates to try per board state.
"""
def get_candidates_to_scores(env):
"""
Query GPT-4 for candidate word placements and aggregate confidence scores.
Args:
env: MiniCrosswordsEnv instance.
Returns:
dict mapping action strings (e.g., 'h2. motor') to float scores.
"""
Import
# Notebook-only: not importable as a module.
# Functions are defined inline in scripts/crosswords/search_crosswords-dfs.ipynb
from tot.tasks.crosswords import MiniCrosswordsEnv
from tot.prompts.crosswords import propose_prompt, value_prompt
from tot.models import gpt
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| env | MiniCrosswordsEnv | Yes | Crossword environment with board state, step, and cache |
| actions | List[str] | Yes | Accumulator list for current DFS path actions |
| infos | List[dict] | Yes | Accumulator list for all step info records (mutated) |
| time_limit | int | Yes | Maximum total steps across all DFS branches (default: 100) |
| prune | bool | Yes | If True, prune branches where any constraint is impossible |
| max_per_state | int | Yes | Maximum candidates to expand per board state (default: 3) |
Outputs
| Name | Type | Description |
|---|---|---|
| infos (mutated) | List[dict] | Each dict contains: total_step, env_step, actions, info (r_letter, r_word), count (sure, maybe, impossible) |
| JSON log file | File | Written to logs/crosswords/infoss_dfs_no_prune.json or infoss_dfs_prune.json |
Output JSON Schema
[
[
{
"total_step": 0,
"env_step": 1,
"actions": ["h2. motor"],
"info": {"r_letter": 0.2, "r_word": 0.1},
"count": {"sure": 1, "maybe": 0, "impossible": 0}
}
]
]
- total_step: Global step counter across all branches for this puzzle
- env_step: Number of words placed on the current board
- actions: Ordered list of placements (e.g., "h2. motor" = horizontal slot 2, word "motor")
- info.r_letter: Fraction of correctly placed letters
- info.r_word: Fraction of correctly completed words
- count.sure/maybe/impossible: Constraint satisfaction counts from env.prompt_status()
Usage Examples
Running DFS Without Pruning
import json
from tot.tasks.crosswords import MiniCrosswordsEnv
env = MiniCrosswordsEnv()
# Run DFS on puzzles 0, 5, 10, ..., 95
infoss = []
for i in range(0, 100, 5):
env.reset(i)
infos = []
actions = []
dfs(env, actions, infos, time_limit=100, prune=False, max_per_state=3)
infoss.append(infos)
# Save results
with open('logs/crosswords/infoss_dfs_no_prune.json', 'w') as fout:
json.dump(infoss, fout)
Running DFS With Pruning
# With pruning: branches with impossible constraints are cut
infoss = []
for i in range(0, 100, 5):
env.reset(i)
infos = []
actions = []
dfs(env, actions, infos, time_limit=100, prune=True, max_per_state=3)
infoss.append(infos)
with open('logs/crosswords/infoss_dfs_prune.json', 'w') as fout:
json.dump(infoss, fout)
Analyzing DFS Results
import json
with open('logs/crosswords/infoss_dfs_no_prune.json') as f:
infoss = json.load(f)
# Each element is a list of DFS step records for one puzzle
for puzzle_idx, infos in enumerate(infoss):
if infos:
best = max(infos, key=lambda x: x['info']['r_word'])
print(f"Puzzle {puzzle_idx * 5}: best r_word={best['info']['r_word']}, "
f"steps={len(infos)}, words placed={best['env_step']}")