Heuristic: Princeton NLP Tree of Thoughts LLM API Request Batching
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Optimization |
| Last Updated | 2026-02-14 04:00 GMT |
Overview
Splits large OpenAI API requests into sequential batches of at most 20 completions to stay within practical per-request limits while fulfilling arbitrarily large sample counts.
Description
The OpenAI ChatCompletion API has a practical limit on the n parameter (number of completions per request). The chatgpt() function in the framework handles this transparently by splitting any request with n > 20 into sequential batches of at most 20. The results are accumulated and returned as a single flat list, making the batching invisible to callers.
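Because the batching happens inside `chatgpt()`, a caller never observes it. A minimal illustration of the caller-side view (the prompt text and sample count are hypothetical; the call signature follows the code evidence below):

```python
from tot.models import chatgpt  # assumes the package layout under src/tot/

messages = [{"role": "user", "content": "Propose one candidate next step."}]

# Requesting 50 completions triggers three underlying API calls (20 + 20 + 10),
# but the caller receives a single flat list of 50 strings.
outputs = chatgpt(messages, n=50)
assert len(outputs) == 50
```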
Usage
This heuristic is automatically applied whenever gpt() or chatgpt() is called with n > 20. This occurs in experiments that require many samples per step, such as baseline IO/CoT sampling with --n_generate_sample 100 or evaluation with high --n_evaluate_sample.
The Insight (Rule of Thumb)
- Action: In the LLM call loop, cap each individual API request at 20 completions with `cnt = min(n, 20)` (see the sketch after this list).
- Value: 20 is a safe batch size that avoids API errors and timeouts for most OpenAI models.
- Trade-off: Sequential batching increases total wall-clock time compared to a single large request (if it were allowed). For n=100, this means 5 sequential API calls instead of 1.
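The same cap-and-loop pattern applies to any sampler with a per-request limit. A minimal, provider-agnostic sketch (the `sample_batch` callable and the default cap of 20 are illustrative, not part of the repository):

```python
from typing import Callable, List

def sample_in_batches(sample_batch: Callable[[int], List[str]], n: int, cap: int = 20) -> List[str]:
    """Collect n samples by repeatedly requesting at most `cap` at a time."""
    outputs: List[str] = []
    while n > 0:
        cnt = min(n, cap)   # cap each individual request
        n -= cnt
        outputs.extend(sample_batch(cnt))
    return outputs

# For n=100 and cap=20 this issues 5 sequential calls of 20 samples each.
```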
Reasoning
The OpenAI API can return errors or timeouts for very large n values in a single request. The batch size of 20 was chosen as a practical ceiling that balances throughput against reliability. Combined with the backoff retry decorator, this ensures that even large sampling experiments complete without manual intervention. The batching is transparent — callers simply pass the desired n and receive a flat list.
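The `completions_with_backoff` helper referenced here and in the code evidence is the retry layer. Its implementation is not quoted in this entry; a plausible sketch using the `backoff` library and the openai>=1.0 client (an assumption, the repository may pin an older SDK) looks like this:

```python
import backoff
import openai

client = openai.OpenAI()  # reads OPENAI_API_KEY from the environment

# Retry with exponential backoff on any OpenAI error (rate limits, timeouts, transient failures).
@backoff.on_exception(backoff.expo, openai.OpenAIError)
def completions_with_backoff(**kwargs):
    return client.chat.completions.create(**kwargs)
```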
Code Evidence
Batching logic from `src/tot/models.py:26-37`:
```python
def chatgpt(messages, model="gpt-4", temperature=0.7, max_tokens=1000, n=1, stop=None) -> list:
    global completion_tokens, prompt_tokens
    outputs = []
    while n > 0:
        cnt = min(n, 20)
        n -= cnt
        res = completions_with_backoff(model=model, messages=messages, temperature=temperature, max_tokens=max_tokens, n=cnt, stop=stop)
        outputs.extend([choice.message.content for choice in res.choices])
        # log completion tokens
        completion_tokens += res.usage.completion_tokens
        prompt_tokens += res.usage.prompt_tokens
    return outputs
```
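For context, the `gpt()` wrapper mentioned under Usage wraps the prompt into a single user message and delegates to `chatgpt()`, so it inherits the batching. A sketch of that wrapper, reconstructed from the same module (exact defaults and signature may differ):

```python
def gpt(prompt, model="gpt-4", temperature=0.7, max_tokens=1000, n=1, stop=None) -> list:
    # Wrap the plain prompt as a single user message and reuse the batched chatgpt() path.
    messages = [{"role": "user", "content": prompt}]
    return chatgpt(messages, model=model, temperature=temperature, max_tokens=max_tokens, n=n, stop=stop)
```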