Heuristic: OpenBMB UltraFeedback API Retry Strategy
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Debugging, Infrastructure |
| Last Updated | 2026-02-08 06:00 GMT |
Overview
Retry-with-sleep pattern for OpenAI API calls, using 10-20 retry attempts to handle transient rate limiting and network failures.
Description
All OpenAI API callers in the UltraFeedback codebase implement a retry loop that catches generic exceptions, prints the error message, optionally sleeps for 1 second, and retries the API call. The retry count varies by file: `main.py` uses 20 retries for completion generation, while the annotation files use `MAX_API_RETRY = 10`. The pattern handles rate limits, timeouts, and transient network failures without distinguishing between error types. If all retries are exhausted, `annotate_critique.py` raises an explicit exception, while the other callers fall through to `return content`, which raises `UnboundLocalError` if no attempt ever succeeded.
Usage
Use this heuristic when building long-running LLM API pipelines that process thousands of examples. Transient API failures are inevitable at scale (64k+ prompts), and a simple retry loop with backoff is the minimum viable resilience pattern. The current implementation uses a flat 1-second sleep; exponential backoff would be more robust for rate limit errors.
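The flat sleep can be upgraded to exponential backoff with jitter, as suggested above. A minimal sketch, assuming a generic callable; the helper name `call_with_backoff` and its parameters are illustrative and not part of the UltraFeedback code:

```python
import random
import time


def call_with_backoff(fn, *args, max_retries=10, base_delay=1.0,
                      max_delay=60.0, **kwargs):
    """Retry `fn` with exponential backoff and jitter.

    Illustrative sketch, not the UltraFeedback implementation
    (which sleeps a flat 1 second between attempts).
    """
    for attempt in range(max_retries):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == max_retries - 1:
                raise  # retries exhausted: surface the last error
            # Double the delay each attempt, capped at max_delay, with
            # jitter to avoid synchronized retries across workers.
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay * random.uniform(0.5, 1.5))
```

For rate-limit errors this backs off progressively instead of hammering the API once per second, at the cost of longer worst-case latency per example.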
The Insight (Rule of Thumb)
- Action: Wrap all API calls in a retry loop with configurable maximum attempts.
- Value: `MAX_API_RETRY = 10` for annotation; 20 retries for completion generation. Sleep 1 second between retries.
- Trade-off: Simple retry catches all transient errors but does not distinguish between rate limits (which warrant backoff) and authentication errors (which should fail fast). Note that `annotate_critique.py` explicitly re-raises `KeyboardInterrupt`, which is redundant in Python 3: `except Exception` never catches `KeyboardInterrupt`, since it derives from `BaseException`.
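The error-type distinction missing from the codebase can be sketched as follows. The exception classes here are stand-ins defined locally so the example is self-contained; with the legacy OpenAI SDK they would correspond to the `openai.error` exception types. The helper name `retry_classified` is illustrative:

```python
import time


# Stand-in exception types for illustration only.
class RateLimitError(Exception): ...
class AuthenticationError(Exception): ...


def retry_classified(fn, max_retries=10, base_delay=1.0):
    """Retry transient errors; back off on rate limits; fail fast on auth."""
    delay = base_delay
    for attempt in range(max_retries):
        try:
            return fn()
        except AuthenticationError:
            raise  # a bad key will never succeed on retry
        except RateLimitError:
            time.sleep(delay)  # back off progressively on 429s
            delay *= 2
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            print(e)
            time.sleep(base_delay)  # flat sleep for other transient errors
    raise RuntimeError("API error: retries exhausted")
```

Exception clauses are matched in order, so the specific types are handled before the generic `except Exception` fallback.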
Reasoning
Processing 64k prompts with 4 completions each through GPT-4 involves hundreds of thousands of API calls. At this scale, OpenAI rate limits and transient network errors are certain to occur. The retry pattern ensures the pipeline can recover from individual failures without losing progress on the entire dataset. The 10-20 retry count is empirically chosen to handle burst rate limits while not hanging indefinitely on persistent errors.
Code Evidence
20-retry loop from `main.py:100-118` (API_Caller):

```python
def __call__(self, system_prompt, user_prompt):
    for _ in range(20):
        try:
            response = openai.ChatCompletion.create(**{
                "model": "gpt-4",
                # ...
            })
            content = response["choices"][0]["message"]["content"]
        except Exception as e:
            print(e)
            time.sleep(1)
        else:
            break
    return content
```
`MAX_API_RETRY=10` loop from `annotate_preference.py:13,51-71`:

```python
MAX_API_RETRY = 10
# ...
def get_eval(sys_prompt, user_prompt: str, max_tokens: int = 500):
    for _ in range(MAX_API_RETRY):
        try:
            response = openai.ChatCompletion.create(**{...})
            content = response["choices"][0]["message"]["content"]
        except Exception as e:
            print(e)
            time.sleep(1)
        else:
            break
    return content
```
10-retry with explicit exception from `annotate_critique.py:46-67`:

```python
def get_eval(model, sys_prompt, user_prompt):
    try_num = 0
    while try_num < 10:
        try:
            response = openai.ChatCompletion.create(**{...})
            return response["choices"][0]["message"]["content"].strip()
        except KeyboardInterrupt as e:
            raise e
        except Exception as e:
            print(e)
            try_num += 1  # counter increment elided in the original excerpt
    raise Exception("API Error")
```