
Heuristic:Marker Inc Korea AutoRAG OpenAI Rate Limit Mitigation

From Leeroopedia
Knowledge Sources
Domains Optimization, RAG, LLMs
Last Updated 2026-02-12 00:00 GMT

Overview

Practical mitigations for OpenAI API rate limiting in AutoRAG, including batch size reduction, separate client instances for concurrent evaluation, and reasoning model handling.

Description

AutoRAG's production experience has surfaced several OpenAI API pitfalls. Rate limit errors occur when batch sizes exceed the user's tier limits; in AutoRAG's testing, errors appeared consistently at batch=4 on standard tiers. The faithfulness evaluation metric must create a separate OpenAI client instance for each concurrent API call to avoid httpx event loop errors. Reasoning models (o1, o3, o4, gpt-5) require a different API calling pattern than standard chat models, and parameters such as `logprobs` and `n` are silently overridden to fixed values.

Usage

Apply this heuristic when using OpenAI models in AutoRAG pipelines, particularly during optimization trials that issue many concurrent API calls. Reduce batch sizes proactively and account for the differences in how reasoning models are called.

The Insight (Rule of Thumb)

  • Action 1: Set `batch` to 3 or lower for OpenAI LLM generation.
  • Action 2: When evaluating faithfulness metrics, the system automatically creates separate OpenAI LLM instances per call to avoid httpx connection errors.
  • Action 3: Reasoning models (o1/o3/o4/gpt-5) use a different API path (`get_result_reasoning`) than standard models (`get_result`).
  • Value: `batch <= 3` for OpenAI; `logprobs` is always True; `n` is always 1.
  • Trade-off: Very small batch sizes significantly increase total pipeline execution time for large datasets. Consider upgrading your OpenAI API tier for higher rate limits.
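The batching advice above can be sketched as a chunked dispatch loop: only `batch` requests are ever in flight at once. This is a minimal, self-contained illustration, not AutoRAG's implementation; `call_model` is a hypothetical stand-in for a real OpenAI request.

```python
import asyncio
from typing import List

# Hypothetical stand-in for a real OpenAI call; swap in your client's request.
async def call_model(prompt: str) -> str:
    await asyncio.sleep(0)  # simulate network I/O
    return f"response to {prompt}"

async def generate_batched(prompts: List[str], batch: int = 3) -> List[str]:
    """Send prompts in groups of at most `batch` concurrent requests."""
    results: List[str] = []
    for i in range(0, len(prompts), batch):
        chunk = prompts[i : i + batch]
        # Wait for each chunk to finish before starting the next one,
        # keeping concurrent request count at or below `batch`.
        results.extend(await asyncio.gather(*(call_model(p) for p in chunk)))
    return results

if __name__ == "__main__":
    answers = asyncio.run(generate_batched([f"q{i}" for i in range(7)]))
    print(len(answers))
```

Lowering `batch` trades throughput for reliability; the same loop structure lets you raise the value again after an API tier upgrade.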

Reasoning

The AutoRAG team discovered through production testing that OpenAI rate limits are more restrictive than expected for standard tier accounts. With batch=4, rate limit errors were consistently observed. The httpx event loop error in faithfulness evaluation is a known issue with the OpenAI Python client when making many concurrent requests from the same client instance; creating separate instances avoids the shared connection pool that triggers the error.
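The per-call client pattern described above can be illustrated as follows. `FreshClient` is a hypothetical stand-in for `openai.AsyncOpenAI`, and the instance counter exists only to make the pattern visible; the point is that each concurrent call gets its own client, so no httpx connection pool is shared across tasks.

```python
import asyncio
from typing import List

class FreshClient:
    """Hypothetical stand-in for openai.AsyncOpenAI; counts instantiations."""
    created = 0

    def __init__(self) -> None:
        FreshClient.created += 1

    async def complete(self, prompt: str) -> str:
        await asyncio.sleep(0)  # simulate network I/O
        return f"judged: {prompt}"

async def judge_all(prompts: List[str]) -> List[str]:
    """Run one evaluation call per prompt, each with its own client."""
    async def judge_one(prompt: str) -> str:
        # A fresh client per call avoids the shared connection pool
        # that triggers the httpx event loop error.
        client = FreshClient()
        return await client.complete(prompt)

    return await asyncio.gather(*(judge_one(p) for p in prompts))
```

Creating a client per call costs a little setup time, but it trades that overhead for stability under many repetitive concurrent requests.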

Code Evidence

Reasoning model detection from `autorag/nodes/generator/openai_llm.py:146-160`:

if (
    self.llm.startswith("o1")
    or self.llm.startswith("o3")
    or self.llm.startswith("o4")
    or self.llm.startswith("gpt-5")
):
    tasks = [
        self.get_result_reasoning(prompt, **openai_chat_params)
        for prompt in prompts
    ]

Forced parameter overrides from `autorag/nodes/generator/openai_llm.py:127-132`:

if kwargs.get("logprobs") is not None:
    kwargs.pop("logprobs")
    logger.warning(
        "parameter logprob does not effective. It always set to True."
    )
if kwargs.get("n") is not None:
    kwargs.pop("n")
    logger.warning("parameter n does not effective. It always set to 1.")

httpx workaround in faithfulness metric from `autorag/evaluation/metric/generation.py:145-147`:

if isinstance(generator, OpenAILLM):  # Because of the event loop error at the httpx
    # TODO: Fix the httpx APIConnectionError at the many repetitive request
    truth_responses: List[Truth] = generator.structured_output(truth_prompts, Truth)

Troubleshooting guidance from `docs/source/troubleshooting.md:137-141`:

We recommend setting batch under 3 when you are using openai model.
In our experiment, it occurred rate limit error when the batch size was 4.
