Heuristic: SPCL Graph of Thoughts Budget-Gated Benchmark Execution
| Knowledge Sources | |
|---|---|
| Domains | LLM_Reasoning, Optimization |
| Last Updated | 2026-02-14 03:30 GMT |
Overview
A cost-control pattern that checks the remaining API budget before each sample and each method execution, stopping early once the budget is depleted.
Description
The benchmark execution pattern (`run` function) implements a budget gate: before processing each data sample and each method within that sample, it checks whether the remaining dollar budget is positive. If the budget is depleted, execution stops with an error log, preventing runaway API costs. The budget is tracked by accumulating the `lm.cost` property from each ChatGPT instance, which computes cost from prompt and completion token counts multiplied by configured per-thousand-token prices.
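A minimal, self-contained sketch of this loop structure (`FakeLM`, `run`, and `io_method` below are hypothetical stand-ins, not the repository's actual objects; the real implementation is the `run` function in `examples/sorting/sorting_032.py`):

```python
# Sketch of a budget-gated benchmark loop: gate before each sample and
# before each method, deduct actual cost after each run.

class FakeLM:
    """Stub LM that charges a fixed dollar cost per query."""
    def __init__(self, cost_per_run):
        self.cost = 0.0
        self._cost_per_run = cost_per_run

    def query(self, prompt):
        self.cost += self._cost_per_run  # pretend each query costs money
        return "response"

def io_method(sample, lm):
    """Stand-in for a benchmark method (IO, CoT, ToT, GoT...)."""
    lm.query(f"sort {sample}")

def run(data, methods, budget, cost_per_run=1.0):
    """Run each method on each sample until the dollar budget is depleted."""
    completed = []
    for sample in data:
        if budget <= 0.0:
            break  # gate before starting a new sample
        for method in methods:
            if budget <= 0.0:
                break  # gate before each method within the sample
            lm = FakeLM(cost_per_run)  # fresh LM: per-run cost starts at 0
            method(sample, lm)
            budget -= lm.cost          # deduct the actual cost of this run
            completed.append((sample, method.__name__))
    return completed, budget
```

Note that the gate checks *before* each unit but deducts *after*, so the last run may push the budget slightly negative; the limit is a soft cap, not a hard ceiling.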
Usage
Use this heuristic when running multi-sample benchmark experiments with paid LLM APIs. It is essential for:
- Running 100+ samples across multiple methods (IO, CoT, ToT, GoT)
- Experiments where total cost is uncertain ahead of time
- Preventing accidental overspending during development and testing
The Insight (Rule of Thumb)
- Action: Set a dollar `budget` limit and check it before each execution unit. Deduct actual cost after each run.
- Value: Default budget is $30 for the sorting benchmark (100 samples x 5 methods).
- Trade-off: Some samples/methods may not be executed if the budget runs out. Results will be incomplete but costs are controlled.
- Pattern: Instantiate a fresh LM per method-sample pair to get accurate per-run cost tracking.
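The fresh-instance pattern matters because ChatGPT-style wrappers accumulate token counts, making `lm.cost` a running total rather than a per-run figure. A sketch of the difference (`CumulativeLM` is a hypothetical stand-in for such a wrapper):

```python
# Reusing one LM instance would deduct the cumulative total after every
# run, over-charging the budget; a fresh instance isolates each run's cost.

class CumulativeLM:
    """Stub wrapper whose .cost accumulates across queries, like ChatGPT's."""
    def __init__(self, cost_per_query=0.5):
        self.cost = 0.0
        self._per_query = cost_per_query

    def query(self, prompt):
        self.cost += self._per_query
        return "response"

# Wrong: one shared instance, so each deduction includes all prior runs.
shared = CumulativeLM()
wrong_deductions = []
for _ in range(3):
    shared.query("...")
    wrong_deductions.append(shared.cost)   # grows: 0.5, 1.0, 1.5

# Right: fresh instance per run, so each deduction is that run's cost only.
right_deductions = []
for _ in range(3):
    lm = CumulativeLM()
    lm.query("...")
    right_deductions.append(lm.cost)       # flat: 0.5 each time
```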
Reasoning
LLM API costs scale linearly with the number of tokens processed. GoT methods use significantly more tokens than IO/CoT approaches (multiple generations, scores, aggregations per sample). Without budget gating:
- A GoT benchmark on 100 samples could cost $50-100+
- A bug in prompt design could cause a tight loop generating unlimited tokens
- Debugging runs during development would accumulate unexpected costs
The budget gate ensures experiments are self-limiting and costs are predictable.
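A back-of-envelope estimate illustrates the scale gap. The per-1K prices and token counts below are illustrative assumptions, not values taken from the repository's configuration:

```python
# Rough per-run cost model: dollars = tokens/1000 * price-per-1K-tokens.

def run_cost(prompt_tokens, completion_tokens,
             prompt_price_per_1k, completion_price_per_1k):
    return (prompt_tokens / 1000.0) * prompt_price_per_1k \
         + (completion_tokens / 1000.0) * completion_price_per_1k

# One IO run: a single prompt/response pair (assumed token counts).
io_run = run_cost(500, 200, 0.03, 0.06)

# One GoT run: many generate/score/aggregate calls per sample, so
# an order of magnitude more tokens (assumed counts).
got_run = run_cost(20_000, 8_000, 0.03, 0.06)

print(f"IO per sample:      ${io_run:.4f}")
print(f"GoT per sample:     ${got_run:.4f}")
print(f"GoT x 100 samples:  ${got_run * 100:.2f}")
```

Under these assumed numbers a 100-sample GoT run lands around the $50-100+ range cited above, well past a $30 budget, which is exactly why the gate exists.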
Code Evidence
Budget check before each sample from `examples/sorting/sorting_032.py:667-670`:
```python
if budget <= 0.0:
    logging.error(
        f"Budget has been depleted, stopping. Data {data[0]} has not been run."
    )
    break
```
Budget check before each method from `examples/sorting/sorting_032.py:675-679`:
```python
if budget <= 0.0:
    logging.error(
        f"Budget has been depleted, stopping. Method {method.__name__} has not been run."
    )
    break
```
Cost deduction after each run from `examples/sorting/sorting_032.py:711`:
```python
budget -= lm.cost
```
Cost tracking in ChatGPT from `graph_of_thoughts/language_models/chatgpt.py:126-133`:
```python
self.prompt_tokens += response.usage.prompt_tokens
self.completion_tokens += response.usage.completion_tokens
prompt_tokens_k = float(self.prompt_tokens) / 1000.0
completion_tokens_k = float(self.completion_tokens) / 1000.0
self.cost = (
    self.prompt_token_cost * prompt_tokens_k
    + self.response_token_cost * completion_tokens_k
)
```
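That accumulation logic can be exercised in isolation with a mock response object standing in for the OpenAI client (class and prices below are illustrative assumptions, not the repository's code):

```python
# Replica of the cost-accumulation mechanics: token counts accumulate
# across responses, and .cost is recomputed as a cumulative total.

from types import SimpleNamespace

class CostTracker:
    def __init__(self, prompt_token_cost, response_token_cost):
        self.prompt_tokens = 0
        self.completion_tokens = 0
        self.prompt_token_cost = prompt_token_cost      # $ per 1K prompt tokens
        self.response_token_cost = response_token_cost  # $ per 1K completion tokens
        self.cost = 0.0

    def record(self, response):
        self.prompt_tokens += response.usage.prompt_tokens
        self.completion_tokens += response.usage.completion_tokens
        self.cost = (
            self.prompt_token_cost * self.prompt_tokens / 1000.0
            + self.response_token_cost * self.completion_tokens / 1000.0
        )

tracker = CostTracker(prompt_token_cost=0.03, response_token_cost=0.06)
fake = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=1000,
                                             completion_tokens=500))
tracker.record(fake)   # cumulative total now ~ $0.06
tracker.record(fake)   # doubles to ~ $0.12: .cost is cumulative, not per-call
```

The cumulative behavior is what makes the fresh-instance-per-run pattern above necessary for accurate per-run deductions.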
Default budget from `examples/sorting/sorting_032.py:726`:
```python
budget = 30
```