Heuristic: spcl/graph-of-thoughts Backoff Retry on API Errors
| Knowledge Sources | |
|---|---|
| Domains | LLM_Reasoning, Optimization |
| Last Updated | 2026-02-14 03:30 GMT |
Overview
Resilience strategy using exponential backoff and adaptive sample reduction to handle OpenAI API errors and rate limits.
Description
The ChatGPT language-model implementation uses a two-layer retry strategy for handling API failures. At the low level, the `chat` method uses the `backoff` library to automatically retry on any `OpenAIError` with exponential backoff, capped at 10 seconds of total elapsed time (`max_time=10`) and 6 attempts (`max_tries=6`). At the higher level, the `query` method implements an adaptive sample-reduction strategy: when multiple responses are requested (`num_responses > 1`) and an error occurs, it halves the requested sample count and retries, sleeping a random 1-3 seconds between attempts. This dual-layer approach handles both transient API errors and rate-limit-induced failures.
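The higher-level layer can be sketched with a stdlib-only mock (`query_with_reduction` and the `chat` callable are hypothetical names for illustration, not the library's API; the real implementation also sleeps 1-3 seconds between retries, omitted here for brevity):

```python
def query_with_reduction(chat, prompt, num_responses):
    """Sketch of the adaptive sample-reduction loop.

    `chat(prompt, n)` is assumed to return a list of n completions or
    raise on failure. On each failure the batch is halved, mirroring
    `next_try = (next_try + 1) // 2` in the original code.
    """
    responses = []
    next_try = num_responses
    total_num_attempts = num_responses  # give up after this many failures
    while num_responses > 0 and total_num_attempts > 0:
        try:
            responses.extend(chat(prompt, next_try))
            num_responses -= next_try
            next_try = min(num_responses, next_try)
        except Exception:
            # Shrink the batch and retry with fewer samples.
            next_try = (next_try + 1) // 2
            total_num_attempts -= 1
    return responses
```

With a backend that rejects batches larger than 2, a request for 8 samples still completes: the loop fails at batch sizes 8 and 4, then collects the samples two at a time.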
Usage
This heuristic is automatically applied whenever the `ChatGPT` language model is used. It is particularly relevant when running benchmark experiments that make many API calls in sequence (e.g., 100 samples across 5 methods), as API rate limits are more likely to be hit during sustained usage.
The Insight (Rule of Thumb)
- Action 1: Apply `@backoff.on_exception(backoff.expo, OpenAIError, max_time=10, max_tries=6)` on the low-level API call method.
- Value: Exponential backoff, capped at 10 seconds of total elapsed retry time and 6 attempts per call.
- Action 2: When requesting multiple responses, halve `num_responses` on each failure: `next_try = (next_try + 1) // 2`.
- Value: Binary reduction of request size, with random 1-3 second sleep between retries.
- Trade-off: Increased latency during errors, but prevents complete pipeline failure from transient API issues.
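The halving rule shrinks a failing batch quickly: starting from 20 requested samples, five consecutive failures reduce the batch to a single sample:

```python
# Trace the batch sizes produced by next_try = (next_try + 1) // 2.
next_try, sizes = 20, [20]
for _ in range(5):
    next_try = (next_try + 1) // 2
    sizes.append(next_try)
print(sizes)  # [20, 10, 5, 3, 2, 1]
```

Because of the `+ 1`, the batch size never reaches zero, so the loop's `assert next_try > 0` always holds.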
Reasoning
OpenAI API calls can fail due to rate limits (HTTP 429), server overloads (HTTP 500/503), or temporary network issues. Without retry logic, a single API failure would crash an entire benchmark run that may have already consumed significant budget. The exponential backoff handles transient errors, while the adaptive sample reduction addresses the common pattern where requesting many completions simultaneously (high `n` parameter) triggers rate limits that would not occur with smaller batch sizes.
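A rough stdlib-only sketch of what the `backoff` decorator provides (the real library also applies jitter to each delay by default; `retry_with_expo_backoff` is a hypothetical name for illustration):

```python
import time


def retry_with_expo_backoff(fn, max_tries=6, max_time=10.0):
    """Retry fn() with exponential delays (1, 2, 4, ... seconds),
    giving up after max_tries attempts or max_time total elapsed time,
    analogous to backoff.on_exception(backoff.expo, ...)."""
    start = time.monotonic()
    for attempt in range(max_tries):
        try:
            return fn()
        except Exception:
            if attempt == max_tries - 1:
                raise  # out of attempts; re-raise the last error
            delay = min(2 ** attempt, max_time - (time.monotonic() - start))
            if delay <= 0:
                raise  # out of time budget
            time.sleep(delay)
```

Exponential growth means a persistent outage exhausts the time budget after only a handful of attempts, while a one-off transient error costs just a second of delay.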
Code Evidence
Exponential backoff decorator from `graph_of_thoughts/language_models/chatgpt.py:104`:
```python
@backoff.on_exception(backoff.expo, OpenAIError, max_time=10, max_tries=6)
def chat(self, messages: List[Dict], num_responses: int = 1) -> ChatCompletion:
```
Adaptive sample reduction from `graph_of_thoughts/language_models/chatgpt.py:85-98`:
```python
while num_responses > 0 and total_num_attempts > 0:
    try:
        assert next_try > 0
        res = self.chat([{"role": "user", "content": query}], next_try)
        response.append(res)
        num_responses -= next_try
        next_try = min(num_responses, next_try)
    except Exception as e:
        next_try = (next_try + 1) // 2
        self.logger.warning(
            f"Error in chatgpt: {e}, trying again with {next_try} samples"
        )
        time.sleep(random.randint(1, 3))
        total_num_attempts -= 1
```