Heuristic:OpenCompass VLMEvalKit API Retry With Random Delay

From Leeroopedia
Knowledge Sources
Domains: API_Integration, Optimization
Last Updated: 2026-02-14 01:30 GMT

Overview

An API resilience pattern that uses a randomized delay between retries and an initial random jitter before the first call, preventing thundering-herd problems when calling VLM APIs in parallel.

Description

The `BaseAPI.generate()` method implements a retry loop with two layers of randomized delay: (1) an initial random delay of 0-0.5 seconds before the first attempt, and (2) a random delay of `0` to `2 * wait` seconds between each retry attempt. This jitter prevents multiple parallel API workers from hitting the API simultaneously after a rate limit or transient failure. The default retry count is 10 with a 1-second base wait time.

Usage

This heuristic is automatically applied by the `BaseAPI` class for all API model evaluations. Adjust the `retry` and `wait` parameters when initializing API models to tune resilience. Increase `retry` for unreliable APIs; increase `wait` for rate-limited APIs.

The Insight (Rule of Thumb)

  • Action: Add random jitter before API calls and between retries. Never use fixed delays.
  • Value: Initial jitter: `random() * 0.5` seconds. Retry delay: `random() * wait * 2` seconds. Default: `retry=10`, `wait=1`.
  • Trade-off: Slightly longer total time per failed request, but prevents cascading failures from synchronized retries across parallel workers.
  • Compatibility: Applied to all API wrappers that extend `BaseAPI`.
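The two delay formulas in the rule of thumb can be sketched as a pair of helper functions (function names are illustrative, not taken from the VLMEvalKit source):

```python
import random

def initial_jitter():
    # Uniform delay in [0, 0.5) seconds before the first API call,
    # desynchronizing parallel workers at startup.
    return random.random() * 0.5

def retry_delay(wait=1.0):
    # Uniform delay in [0, 2 * wait) seconds between retries, so
    # workers that failed together do not retry together.
    return random.random() * wait * 2
```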

Reasoning

When running `api_nproc` parallel workers (default 4, configurable via `--api-nproc`), a rate-limit response would, without jitter, cause all workers to retry at the same instant, creating a "thundering herd" that triggers further rate limiting. The randomized delay spreads retries across time, increasing the chance that each call succeeds. The 0.5-second initial jitter further desynchronizes workers at the start of parallel processing.
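A small simulation makes the contrast concrete (a sketch with illustrative names; it models only the retry timing, not the API calls themselves):

```python
import random

def retry_times(n_workers=4, wait=1.0, jitter=True, seed=0):
    # Time at which each worker fires its next retry after a
    # shared rate-limit response at t = 0.
    rng = random.Random(seed)
    if jitter:
        # Randomized delay: retries spread over [0, 2 * wait).
        return [rng.random() * wait * 2 for _ in range(n_workers)]
    # Fixed delay: every worker retries at exactly t = wait,
    # recreating the original collision.
    return [wait] * n_workers

fixed = retry_times(jitter=False)      # all retries collide
jittered = retry_times(jitter=True)    # retries spread over time
```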

Code Evidence

From `vlmeval/api/base.py:239-268` (the `generate()` method):

answer = None
# a very small random delay [0s - 0.5s]
T = rd.random() * 0.5
time.sleep(T)

for i in range(self.retry):
    try:
        ret_code, answer, log = self.generate_inner(message, **kwargs)
        if ret_code == 0 and self.fail_msg not in answer and answer != '':
            if self.verbose:
                print(answer)
            return answer
        elif self.verbose:
            ...
    except Exception as err:
        if self.verbose:
            self.logger.error(f'An error occured during try {i}: ')
            self.logger.error(f'{type(err)}: {err}')
    # delay before each retry
    T = rd.random() * self.wait * 2
    time.sleep(T)

return self.fail_msg if answer in ['', None] else answer
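A self-contained version of the same pattern can be sketched as follows; `call` is a stand-in for `generate_inner`, and the function name is illustrative rather than part of the VLMEvalKit API:

```python
import random
import time

def generate_with_retry(call, retry=10, wait=1.0,
                        fail_msg='Failed to obtain answer via API.'):
    """Retry `call` with an initial jitter and randomized backoff.

    `call` is any zero-argument function returning (ret_code, answer);
    it stands in for BaseAPI.generate_inner in this sketch.
    """
    answer = None
    time.sleep(random.random() * 0.5)  # initial jitter, [0, 0.5) s
    for _ in range(retry):
        try:
            ret_code, answer = call()
            if ret_code == 0 and answer not in ('', None) \
                    and fail_msg not in answer:
                return answer
        except Exception:
            pass  # a real implementation would log the error here
        time.sleep(random.random() * wait * 2)  # randomized retry delay
    return fail_msg if answer in ('', None) else answer
```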

Default parameters from `vlmeval/api/base.py:15-21`:

def __init__(self,
             retry=10,
             wait=1,
             system_prompt=None,
             verbose=True,
             fail_msg='Failed to obtain answer via API.',
             **kwargs):

Working check with reduced timeout from `vlmeval/api/base.py:59-81`:

def working(self):
    self.old_timeout = None
    if hasattr(self, 'timeout'):
        self.old_timeout = self.timeout
        self.timeout = 120
    retry = 5
    while retry > 0:
        ret = self.generate('hello')
        if ret is not None and ret != '' and self.fail_msg not in ret:
            ...
            return True
        retry -= 1
    ...
    return False
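The health check above follows a save/restore pattern around a temporarily tightened timeout. A generic sketch (standalone function rather than a method; names are illustrative) might look like:

```python
def working(api, probe='hello', retries=5, probe_timeout=120):
    # Temporarily tighten the timeout so a dead endpoint fails fast,
    # then restore the original value. `api` is any object with a
    # generate() method, a fail_msg attribute, and an optional
    # timeout attribute -- a sketch, not the VLMEvalKit interface.
    old_timeout = getattr(api, 'timeout', None)
    if old_timeout is not None:
        api.timeout = probe_timeout
    try:
        for _ in range(retries):
            ret = api.generate(probe)
            if ret and api.fail_msg not in ret:
                return True
        return False
    finally:
        if old_timeout is not None:
            api.timeout = old_timeout
```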
