Heuristic: Arize AI Phoenix AIMD Concurrency Control
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Evaluation |
| Last Updated | 2026-02-14 06:00 GMT |
Overview
AIMD (Additive Increase/Multiplicative Decrease) concurrency controller that dynamically adjusts parallel worker count during batch evaluations based on error feedback.
Description
The `ConcurrencyController` in the executor infrastructure uses the AIMD algorithm (similar to TCP congestion control) to dynamically adjust how many concurrent workers process LLM evaluation tasks. During windows without errors, the concurrency target rises slowly (additive increase of +0.5 per window). When errors or timeouts occur, the target is halved (multiplied by the decrease ratio of 0.5). A collapse mechanism drops concurrency straight to 1 if multiple errors occur within a short window (default: 2 errors in 15 seconds).
Workers above the current target concurrency are gated: they do not dequeue tasks but instead sleep and periodically check if concurrency has increased enough for them to resume.
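The mechanics above can be sketched as a small controller class. This is an illustrative reconstruction, not Phoenix's actual `ConcurrencyController` API: the class name, `on_window_end`, and `worker_may_run` are hypothetical, and errors are aggregated per feedback window for simplicity.

```python
import time


class AIMDController:
    """Illustrative AIMD concurrency controller (not Phoenix's actual class)."""

    def __init__(self, max_concurrency=3, increase_step=0.5, decrease_ratio=0.5,
                 collapse_error_threshold=2, collapse_window_seconds=15.0):
        self.max_concurrency = max_concurrency
        self.increase_step = increase_step        # additive increase (a)
        self.decrease_ratio = decrease_ratio      # multiplicative decrease (beta)
        self.collapse_error_threshold = collapse_error_threshold
        self.collapse_window_seconds = collapse_window_seconds
        self.target = float(max_concurrency)      # start with all workers active
        self._error_times = []

    def on_window_end(self, errors_in_window, now=None):
        """Apply one feedback window's adjustment to the concurrency target."""
        now = time.monotonic() if now is None else now
        if errors_in_window == 0:
            # Additive increase: creep back toward the configured maximum.
            self.target = min(self.max_concurrency, self.target + self.increase_step)
            return
        # Multiplicative decrease: halve the target, never below 1.
        self.target = max(1.0, self.target * self.decrease_ratio)
        # Collapse: enough errors in a short span drops concurrency straight to 1.
        self._error_times.append(now)
        self._error_times = [t for t in self._error_times
                             if now - t <= self.collapse_window_seconds]
        if len(self._error_times) >= self.collapse_error_threshold:
            self.target = 1.0

    def worker_may_run(self, worker_index):
        """Workers above the target are gated: they sleep instead of dequeuing."""
        return worker_index < int(self.target)
```

Gated workers would poll `worker_may_run` periodically and resume dequeuing once additive increase raises the target past their index.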
Usage
This heuristic is relevant when running batch LLM evaluations with the `AsyncExecutor`. Use this knowledge when:
- Deciding on the `concurrency` parameter for `llm_classify` or `run_experiment`
- Debugging why some workers appear idle during evaluation runs
- Understanding throughput fluctuations in long-running evaluation batches
The Insight (Rule of Thumb)
- Action: Set `concurrency` to the maximum number of parallel workers you want. The AIMD controller will use fewer workers when errors indicate the backend cannot handle full concurrency.
- Value: Default concurrency is 3. The controller starts all workers active and adjusts based on a 5-second feedback window.
- Trade-off: Higher concurrency gives faster throughput when the LLM provider can handle it, but the controller may oscillate if the rate limit is near the concurrency boundary.
- Key parameters:
- `increase_step` = 0.5 (add 0.5 worker per successful window)
- `decrease_ratio` = 0.5 (halve on error)
- `window_seconds` = 5.0 (feedback evaluation period)
- `collapse_error_threshold` = 2 (2 errors within 15 seconds triggers collapse to 1 worker)
Reasoning
LLM API backends have varying capacity that changes over time. Static concurrency either underutilizes the backend or triggers cascading failures. The AIMD approach provides:
- Automatic scaling: The system finds the right concurrency level without user tuning.
- Rapid failure recovery: The collapse mechanism (2 errors in 15 seconds = drop to 1 worker) prevents compounding errors when the backend is overwhelmed.
- Smooth recovery: Additive (rather than multiplicative) increase dampens oscillation around the optimal operating point.
From `executors.py:81-85`:
"""
Steady-state guide for choosing feedback constants:
concurrency ~= a * (1 - r_e) / ((1 - β) * r_e)
where r_e is the fraction of windows that observe at least one error.
To tend toward a single active worker when errors are frequent, select (a, β) so that
concurrency <= 1 when r_e >= a / (a + 1 - β).
Example: a=1, β=0.5 ⇒ threshold r_e >= 2/3.
"""