Heuristic:Open compass VLMEvalKit API Retry With Random Delay
| Knowledge Sources | |
|---|---|
| Domains | API_Integration, Optimization |
| Last Updated | 2026-02-14 01:30 GMT |
Overview
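API resilience pattern that combines an initial random jitter with a randomized delay between retries to prevent thundering-herd problems when calling VLM APIs in parallel. The delay is uniformly random and bounded by `2 * wait`; it does not grow exponentially across attempts.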
Description
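The `BaseAPI.generate()` method implements a retry loop with two layers of randomized delay: (1) an initial random delay of 0-0.5 seconds before the first attempt, and (2) a uniformly random delay of 0 to `2 * wait` seconds before each retry. An attempt succeeds only if `generate_inner()` returns code 0 with a non-empty answer that does not contain `fail_msg`; once all attempts are used up, `generate()` returns the last answer, or `fail_msg` if there is none. The jitter keeps multiple parallel API workers from hitting the API at the same instant after a rate limit or transient failure. The default retry count is 10 with a 1-second base wait time.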
Usage
This heuristic is automatically applied by the `BaseAPI` class for all API model evaluations. Adjust the `retry` and `wait` parameters when initializing API models to tune resilience. Increase `retry` for unreliable APIs; increase `wait` for rate-limited APIs.
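To illustrate how `retry` affects the outcome, here is a minimal, self-contained sketch of the same retry pattern. It is not VLMEvalKit code: `generate_with_retry`, `make_flaky`, and the injectable `sleep` and `rng` arguments are hypothetical names introduced only for this example, and only the parameter names `retry` and `wait` and the failure message mirror the constructor shown under Code Evidence.

```python
import random
import time

FAIL_MSG = "Failed to obtain answer via API."


def generate_with_retry(call, retry=10, wait=1, sleep=time.sleep, rng=random.random):
    """Hypothetical stand-in for a jittered retry loop (not the library's code)."""
    sleep(rng() * 0.5)  # initial jitter in [0, 0.5) seconds
    for attempt in range(retry):
        try:
            answer = call()
            if answer:  # non-empty answer counts as success
                return answer
        except Exception as err:
            print(f"attempt {attempt} failed: {type(err).__name__}: {err}")
        sleep(rng() * wait * 2)  # jittered delay in [0, 2 * wait) seconds
    return FAIL_MSG


def make_flaky(failures):
    """Return a callable that raises `failures` times, then returns 'ok'."""
    state = {"left": failures}

    def call():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("simulated transient failure")
        return "ok"

    return call


def no_sleep(seconds):
    """Replacement for time.sleep so the demo finishes instantly."""


# The simulated API fails three times before succeeding.
few = generate_with_retry(make_flaky(3), retry=2, sleep=no_sleep)
many = generate_with_retry(make_flaky(3), retry=5, sleep=no_sleep)
print(few)   # the failure message: two attempts are not enough
print(many)  # "ok": the fourth attempt succeeds
```

In this sketch a larger `retry` lets the call ride out a longer run of transient failures, while `wait` only changes how far apart the attempts are spaced.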
The Insight (Rule of Thumb)
- Action: Add random jitter before API calls and between retries. Never use fixed delays.
- Value: Initial jitter: `random() * 0.5` seconds. Retry delay: `random() * wait * 2` seconds. Default: `retry=10`, `wait=1`.
- Trade-off: Each failed attempt adds a delay averaging `wait` seconds, so a failing request takes longer to resolve, but synchronized retries across parallel workers no longer cascade into further failures.
- Compatibility: Applied to all API wrappers that extend `BaseAPI`.
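The values above imply a bounded sleep budget per request. The sketch below works through the arithmetic, assuming every attempt fails and one jittered sleep follows each failed attempt; `sleep_budget` is a hypothetical helper, and the figures exclude time spent inside the API call itself.

```python
def sleep_budget(retry=10, wait=1.0, initial_jitter=0.5):
    """Expected and worst-case seconds spent sleeping if every attempt fails."""
    # A uniform draw on [0, x) has mean x / 2, so the initial jitter averages
    # initial_jitter / 2 and each retry delay on [0, 2 * wait) averages wait.
    expected = initial_jitter / 2 + retry * wait
    # Upper bound: every draw lands at the top of its range.
    worst = initial_jitter + retry * 2 * wait
    return expected, worst


expected, worst = sleep_budget()
print(expected, worst)  # 10.25 20.5 with the defaults retry=10, wait=1
```

Under these assumptions, raising `wait` from 1 to 3 with `retry=10` moves the expected sleep from 10.25 to 30.25 seconds for a request that never succeeds.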
Reasoning
When running `api_nproc` parallel workers (default 4, configurable via `--api-nproc`), a rate-limit response tends to hit all workers at about the same time. Without jitter, they would all retry simultaneously, creating a "thundering herd" that triggers further rate limiting. The randomized delay spreads the retries over time, which increases the chance that each call succeeds. The 0.5-second initial jitter further desynchronizes workers at the start of parallel processing.
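The effect can be seen in a small simulation. This is a sketch only: `retry_times` is a hypothetical helper, the four workers are assumed to receive the rate-limit response at the same instant, and the random generator is seeded so the run is repeatable.

```python
import random


def retry_times(n_workers, wait, jitter, rng):
    """Seconds after a shared rate-limit response at which each worker retries."""
    if jitter:
        # Same formula as the retry delay: uniform on [0, 2 * wait)
        return [rng.random() * wait * 2 for _ in range(n_workers)]
    # Fixed delay: every worker wakes up at exactly the same moment
    return [float(wait)] * n_workers


rng = random.Random(0)
fixed = retry_times(4, wait=1, jitter=False, rng=rng)
jittered = retry_times(4, wait=1, jitter=True, rng=rng)

fixed_spread = max(fixed) - min(fixed)
jittered_spread = max(jittered) - min(jittered)
print(f"fixed delay:    {fixed} (spread {fixed_spread:.2f}s)")
print(f"jittered delay: {[round(t, 2) for t in jittered]} (spread {jittered_spread:.2f}s)")
```

With a fixed delay the spread is zero, so all four retries arrive together; with jitter they are scattered across the `[0, 2 * wait)` window.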
Code Evidence
From `vlmeval/api/base.py:239-268` (the `generate()` method):
    answer = None
    # a very small random delay [0s - 0.5s]
    T = rd.random() * 0.5
    time.sleep(T)
    for i in range(self.retry):
        try:
            ret_code, answer, log = self.generate_inner(message, **kwargs)
            if ret_code == 0 and self.fail_msg not in answer and answer != '':
                if self.verbose:
                    print(answer)
                return answer
            elif self.verbose:
                ...
        except Exception as err:
            if self.verbose:
                self.logger.error(f'An error occured during try {i}: ')
                self.logger.error(f'{type(err)}: {err}')
        # delay before each retry
        T = rd.random() * self.wait * 2
        time.sleep(T)
    return self.fail_msg if answer in ['', None] else answer
Default parameters from `vlmeval/api/base.py:15-21`:
    def __init__(self,
                 retry=10,
                 wait=1,
                 system_prompt=None,
                 verbose=True,
                 fail_msg='Failed to obtain answer via API.',
                 **kwargs):
Working check with reduced timeout from `vlmeval/api/base.py:59-81`:
    def working(self):
        self.old_timeout = None
        if hasattr(self, 'timeout'):
            self.old_timeout = self.timeout
            self.timeout = 120
        retry = 5
        while retry > 0:
            ret = self.generate('hello')
            if ret is not None and ret != '' and self.fail_msg not in ret:
                ...
                return True
            retry -= 1
        ...
        return False