Heuristic:Microsoft BIPIA OpenAI Rate Limit Retry

Knowledge Sources	Microsoft BIPIA OpenAI Rate Limits
Domains	Optimization, Debugging
Last Updated	2026-02-14 15:00 GMT

Overview

Robust retry strategy for OpenAI API calls that parses the retry delay from rate limit error messages and handles multiple API error types gracefully.

Description

The BIPIA codebase wraps its OpenAI API calls in a retry loop that distinguishes three classes of error. For rate limit errors, the code parses the "after N seconds" phrase from the error message to determine how long to wait before retrying, and falls back to 1 second if the phrase is absent. For transient errors (timeout, connection errors, API errors, service unavailable), it retries after a fixed 1-second delay. For permanent errors (invalid request, unknown exceptions), it logs a warning and returns empty results rather than crashing. This pattern is duplicated across the GPT model wrapper (`bipia/model/gpt.py`) and the model evaluator (`bipia/metrics/eval/model.py`).
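
The three-way split described above can be sketched as a single function that maps an exception to a wait time. This is a minimal illustration rather than the BIPIA code: the exception classes are hypothetical stand-ins for the OpenAI client's error types, and `retry_delay` is a name introduced here.

```python
import re
from typing import Optional


# Hypothetical stand-ins for the OpenAI client's exception types,
# defined locally so the sketch runs without the openai package.
class RateLimitError(Exception): ...
class Timeout(Exception): ...
class APIConnectionError(Exception): ...
class InvalidRequestError(Exception): ...


# Errors assumed to be worth retrying after a short fixed pause.
TRANSIENT = (Timeout, APIConnectionError)


def retry_delay(exc: Exception) -> Optional[int]:
    """Return the seconds to wait before retrying, or None to give up."""
    if isinstance(exc, RateLimitError):
        # Use the delay named in the message, or 1 second if none is found.
        match = re.search(r"after (\d+) seconds", str(exc))
        return int(match.group(1)) if match else 1
    if isinstance(exc, TRANSIENT):
        return 1
    # Anything else is treated as permanent: do not retry.
    return None
```

A caller would sleep for the returned value and retry, or stop and return an empty result when the value is `None`.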

Usage

This heuristic applies whenever running GPT-based inference or model-based ASR evaluation. Understanding this pattern is important because: (1) the retry loop runs indefinitely for transient errors, so the process will never exit on its own if the API is permanently down; (2) rate limit errors wait for the server-specified duration when the message contains one, and for 1 second otherwise; (3) invalid request errors (e.g., prompt too long) return empty results with only a logged warning, so they may later surface as evaluation failures.
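
Because a rejected request comes back as an empty result rather than an exception, it can help to count such samples before interpreting metrics. The following is a minimal sketch, assuming each response is a dict shaped like the `{"choices": []}` placeholder shown under Code Evidence; `find_empty_responses` is a hypothetical helper, not part of BIPIA.

```python
from typing import Dict, List


def find_empty_responses(responses: List[Dict]) -> List[int]:
    """Return the indices of responses that carry no choices."""
    return [i for i, r in enumerate(responses) if not r.get("choices")]


# Illustrative data: one normal response and one empty placeholder.
responses = [
    {"choices": [{"message": {"content": "ok"}}]},
    {"choices": []},  # placeholder left by a rejected request
]
empty = find_empty_responses(responses)
print(f"{len(empty)} of {len(responses)} responses are empty: {empty}")
```

Reporting this count alongside ASR numbers makes it easier to tell rejected requests apart from genuine model failures.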

The Insight (Rule of Thumb)

Action: Parse the retry delay from `RateLimitError` messages using regex `r"after (\d+) seconds"`. Fall back to 1 second if parsing fails.
Value: Default retry delay is 1 second. Rate limit delay is extracted dynamically from the error message.
Trade-off: The infinite retry loop ensures that no sample is dropped because of a transient error, but the process can hang indefinitely if the API is permanently unreachable. `InvalidRequestError` returns empty results with only a logged warning, which affects ASR metrics (evaluated as -1).
Deduplication note: This pattern is duplicated in `bipia/model/gpt.py` and `bipia/metrics/eval/model.py`.
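
If hanging forever is not acceptable, one possible variation is to cap the number of attempts. The sketch below is an assumption-laden alternative and not what BIPIA does: `call_with_retry_cap`, `max_attempts`, and the injectable `sleep` parameter are all names introduced here for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry_cap(
    fn: Callable[[], T],
    transient: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on transient errors, and re-raise after max_attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except transient:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to do
            sleep(delay_seconds)
    raise AssertionError("unreachable")
```

The trade-off reverses: the process always terminates, but a long outage now costs samples unless the caller records the failure and reruns them later.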

Reasoning

OpenAI API rate limits vary by tier and model. Rather than using a fixed or exponential backoff schedule that may be too aggressive or too conservative, the code extracts the wait time from the error message itself. This minimizes unnecessary waiting while respecting the server's rate limit window. The infinite retry approach suits batch evaluation workloads, where losing a single sample's response would compromise the benchmark result.

Code Evidence

Rate limit retry time parsing from `bipia/model/gpt.py:28-32`:

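import re

def get_retry_time(err_info):
    # Look for a phrase such as "after 20 seconds" in the error text.
    z = re.search(r"after (\d+) seconds", err_info)
    if z:
        return int(z.group(1))
    # No delay found in the message: fall back to 1 second.
    return 1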
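
To see the fallback in action, the function can be exercised on a few message strings. The strings below are invented for illustration and are not real OpenAI error text; the function body is repeated only so the snippet runs on its own.

```python
import re


def get_retry_time(err_info):
    # Same logic as the excerpt above.
    z = re.search(r"after (\d+) seconds", err_info)
    if z:
        return int(z.group(1))
    return 1


print(get_retry_time("Rate limit reached. Please try again after 20 seconds."))  # 20
print(get_retry_time("Rate limit reached. Please slow down."))  # 1 (fallback)
print(get_retry_time("Please try again after 1.5 seconds."))  # 1 (fallback)
```

The last case shows a limitation of the pattern: `\d+` matches whole numbers only, so a fractional delay is not recognized and the function returns the 1-second default.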

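Error handling loop from `bipia/model/gpt.py:48-88` (abridged):

success = False
while not success:
    try:
        response = openai.ChatCompletion.create(...)
        success = True
    except RateLimitError as e:
        # Wait for the delay named in the error message (1 second if absent).
        retry_time = get_retry_time(str(e))
        time.sleep(retry_time)
    except Timeout as e:
        # Transient failure: retry after a fixed 1-second pause.
        time.sleep(1)
    except InvalidRequestError as e:
        # Permanent failure: log it and return an empty result instead of retrying.
        logger.warning(e, exc_info=True)
        success = True
        response = {"choices": []}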
