Heuristic: OpenBMB UltraFeedback API Retry Strategy
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Debugging, Infrastructure |
| Last Updated | 2026-02-08 06:00 GMT |
Overview
Retry-with-sleep pattern for OpenAI API calls, using 10-20 retry attempts to handle transient rate limiting and network failures.
Description
All OpenAI API callers in the UltraFeedback codebase implement a retry loop that catches generic exceptions, prints the error message, optionally sleeps for 1 second, and retries the API call. The retry count varies by file: `main.py` uses 20 retries for completion generation, while the annotation files use `MAX_API_RETRY = 10`. The pattern handles rate limits, timeouts, and transient network failures without distinguishing between error types. If all retries are exhausted, `annotate_critique.py` raises an explicit exception, while the other callers fall through to `return content`, which raises `UnboundLocalError` if no attempt ever succeeded.
Usage
Use this heuristic when building long-running LLM API pipelines that process thousands of examples. Transient API failures are inevitable at scale (64k+ prompts), and a simple retry loop with backoff is the minimum viable resilience pattern. The current implementation uses a flat 1-second sleep; exponential backoff would be more robust for rate limit errors.
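The flat sleep can be upgraded to exponential backoff with jitter, as suggested above. A minimal sketch, assuming a generic callable; the helper name `call_with_backoff` and its parameters are illustrative and not part of the UltraFeedback code:

```python
import random
import time


def call_with_backoff(fn, *args, max_retries=10, base_delay=1.0,
                      max_delay=60.0, **kwargs):
    """Retry `fn` with exponential backoff and jitter.

    Illustrative sketch, not the UltraFeedback implementation
    (which sleeps a flat 1 second between attempts).
    """
    for attempt in range(max_retries):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == max_retries - 1:
                raise  # retries exhausted: surface the last error
            # Double the delay each attempt, capped at max_delay, with
            # jitter to avoid synchronized retries across workers.
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay * random.uniform(0.5, 1.5))
```

For rate-limit errors this backs off progressively instead of hammering the API once per second, at the cost of longer worst-case latency per example.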
The Insight (Rule of Thumb)
- Action: Wrap all API calls in a retry loop with configurable maximum attempts.
- Value: `MAX_API_RETRY = 10` for annotation; 20 retries for completion generation. Sleep 1 second between retries.
- Trade-off: Simple retry catches all transient errors but does not distinguish between rate limits (which warrant backoff) and authentication errors (which should fail fast). Note that `annotate_critique.py` explicitly re-raises `KeyboardInterrupt`, which is redundant in Python 3: `except Exception` never catches `KeyboardInterrupt`, since it derives from `BaseException`.
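The error-type distinction missing from the codebase can be sketched as follows. The exception classes here are stand-ins defined locally so the example is self-contained; with the legacy OpenAI SDK they would correspond to the `openai.error` exception types. The helper name `retry_classified` is illustrative:

```python
import time


# Stand-in exception types for illustration only.
class RateLimitError(Exception): ...
class AuthenticationError(Exception): ...


def retry_classified(fn, max_retries=10, base_delay=1.0):
    """Retry transient errors; back off on rate limits; fail fast on auth."""
    delay = base_delay
    for attempt in range(max_retries):
        try:
            return fn()
        except AuthenticationError:
            raise  # a bad key will never succeed on retry
        except RateLimitError:
            time.sleep(delay)  # back off progressively on 429s
            delay *= 2
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            print(e)
            time.sleep(base_delay)  # flat sleep for other transient errors
    raise RuntimeError("API error: retries exhausted")
```

Exception clauses are matched in order, so the specific types are handled before the generic `except Exception` fallback.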
Reasoning
Processing 64k prompts with 4 completions each through GPT-4 involves hundreds of thousands of API calls. At this scale, OpenAI rate limits and transient network errors are certain to occur. The retry pattern ensures the pipeline can recover from individual failures without losing progress on the entire dataset. The 10-20 retry count is empirically chosen to handle burst rate limits while not hanging indefinitely on persistent errors.
Code Evidence
20-retry loop from `main.py:100-118` (API_Caller):

```python
def __call__(self, system_prompt, user_prompt):
    for _ in range(20):
        try:
            response = openai.ChatCompletion.create(**{
                "model": "gpt-4",
                # ...
            })
            content = response["choices"][0]["message"]["content"]
        except Exception as e:
            print(e)
            time.sleep(1)
        else:
            break
    return content
```
`MAX_API_RETRY=10` loop from `annotate_preference.py:13,51-71`:

```python
MAX_API_RETRY = 10
# ...
def get_eval(sys_prompt, user_prompt: str, max_tokens: int = 500):
    for _ in range(MAX_API_RETRY):
        try:
            response = openai.ChatCompletion.create(**{...})
            content = response["choices"][0]["message"]["content"]
        except Exception as e:
            print(e)
            time.sleep(1)
        else:
            break
    return content
```
10-retry with explicit exception from `annotate_critique.py:46-67`:

```python
def get_eval(model, sys_prompt, user_prompt):
    try_num = 0
    while try_num < 10:
        try:
            response = openai.ChatCompletion.create(**{...})
            return response["choices"][0]["message"]["content"].strip()
        except KeyboardInterrupt as e:
            raise e
        except Exception as e:
            print(e)
            try_num += 1  # counter increment elided in the original excerpt
    raise Exception("API Error")
```