Implementation:BerriAI Litellm Least Busy Strategy
| Attribute | Value |
|---|---|
| Sources | litellm/router_strategy/least_busy.py |
| Domains | Router, Strategy, Load Balancing |
| last_updated | 2026-02-15 16:00 GMT |
Overview
The Least Busy Strategy is a router deployment selection strategy that routes requests to the deployment with the fewest in-flight (active) requests.
Description
This module provides the LeastBusyLoggingHandler class, which extends CustomLogger to track the number of active requests per deployment. Before each API call, the handler increments a counter for the target deployment; when the call succeeds or fails, it decrements that counter. When selecting a deployment, it reads the counts from the router cache and picks the healthy deployment with the lowest one. Traffic is therefore steered toward whichever deployment currently has the least load, which suits scenarios where response latency varies significantly between deployments.
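The bookkeeping described above can be sketched without LiteLLM. This is a minimal illustration under stated assumptions, not the library's code: `InFlightTracker` and its method names are hypothetical, and a plain dictionary stands in for the router cache.

```python
from collections import defaultdict


class InFlightTracker:
    """Hypothetical stand-in for per-deployment in-flight counters."""

    def __init__(self):
        # deployment id -> number of requests currently in flight
        self.counts = defaultdict(int)

    def on_request_start(self, deployment_id):
        # Corresponds to the "increment before the API call" step.
        self.counts[deployment_id] += 1

    def on_request_end(self, deployment_id):
        # Corresponds to the "decrement on success or failure" step.
        # Clamping at zero is a choice made for this sketch only.
        self.counts[deployment_id] = max(0, self.counts[deployment_id] - 1)

    def least_busy(self, deployment_ids):
        # Deployments with no recorded traffic count as zero in flight.
        return min(deployment_ids, key=lambda d: self.counts.get(d, 0))


tracker = InFlightTracker()
tracker.on_request_start("deploy-1")
tracker.on_request_start("deploy-1")
tracker.on_request_start("deploy-2")
tracker.on_request_end("deploy-1")    # one deploy-1 request finishes
tracker.on_request_start("deploy-1")  # another one begins
choice = tracker.least_busy(["deploy-1", "deploy-2"])
```

Because the counter reflects requests that are still running rather than requests sent so far, a slow deployment accumulates in-flight work and naturally receives less new traffic.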
Usage
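Select this strategy by configuring the LiteLLM Router with routing_strategy="least-busy". The router then registers LeastBusyLoggingHandler as a callback handler for tracking in-flight requests. Import the class directly only when you need to work with it outside the router, for example in tests.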
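As a rough sketch of the configuration side, the snippet below builds the arguments for a router with two deployments in one model group. The model names and API keys are placeholders invented for illustration, and the `Router` call is left as a comment so the snippet runs without LiteLLM installed.

```python
# Two deployments that share the model group name "gpt-4".
# Every value here is a placeholder, not a working credential.
model_list = [
    {
        "model_name": "gpt-4",
        "litellm_params": {"model": "gpt-4", "api_key": "<api-key-1>"},
    },
    {
        "model_name": "gpt-4",
        "litellm_params": {"model": "gpt-4", "api_key": "<api-key-2>"},
    },
]

router_kwargs = {
    "model_list": model_list,
    "routing_strategy": "least-busy",
}

# With LiteLLM installed, the router would be created roughly like this:
#   from litellm import Router
#   router = Router(**router_kwargs)
```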
Code Reference
Source Location
litellm/router_strategy/least_busy.py
Class: LeastBusyLoggingHandler
class LeastBusyLoggingHandler(CustomLogger):
    test_flag: bool = False
    logged_success: int = 0
    logged_failure: int = 0

    def __init__(self, router_cache: DualCache):
        ...
Key Methods
| Method | Signature | Description |
|---|---|---|
| log_pre_api_call | def log_pre_api_call(self, model, messages, kwargs) | Increments request count before API call |
| log_success_event | def log_success_event(self, kwargs, response_obj, start_time, end_time) | Decrements request count on success (sync) |
| log_failure_event | def log_failure_event(self, kwargs, response_obj, start_time, end_time) | Decrements request count on failure (sync) |
| async_log_success_event | async def async_log_success_event(self, kwargs, response_obj, start_time, end_time) | Decrements request count on success (async) |
| async_log_failure_event | async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time) | Decrements request count on failure (async) |
| get_available_deployments | def get_available_deployments(self, model_group: str, healthy_deployments: list) | Sync: returns the deployment with the fewest in-flight requests |
| async_get_available_deployments | async def async_get_available_deployments(self, model_group: str, healthy_deployments: list) | Async: returns the deployment with the fewest in-flight requests |
Import
from litellm.router_strategy.least_busy import LeastBusyLoggingHandler
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| router_cache | DualCache | Shared cache instance for storing active request counts |
| model_group | str | The model group name for deployment selection |
| healthy_deployments | list | List of deployment dictionaries considered healthy |
Outputs
| Return Type | Description |
|---|---|
| dict | The selected deployment dictionary with the least in-flight requests. Falls back to random selection if no minimum is found. |
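The selection contract, including the random fallback, can be illustrated with a small standalone function. This is a sketch of the behavior described in the table, not the module's implementation: `pick_least_busy` is a hypothetical helper, and treating deployments without a recorded count as idle is an assumption of this example.

```python
import random


def pick_least_busy(healthy_deployments, request_counts):
    """Hypothetical helper: return the healthy deployment with the fewest in-flight requests."""
    best = None
    best_count = float("inf")
    for deployment in healthy_deployments:
        deployment_id = deployment.get("model_info", {}).get("id")
        if deployment_id is None:
            continue  # no id, so it cannot be matched to a counter
        # Assumption: a deployment with no recorded count is idle.
        count = request_counts.get(deployment_id, 0)
        if count < best_count:
            best, best_count = deployment, count
    if best is None:
        # No minimum could be determined, so fall back to a random choice.
        return random.choice(healthy_deployments)
    return best


deployments = [
    {"model_info": {"id": "deploy-1"}, "litellm_params": {"model": "gpt-4"}},
    {"model_info": {"id": "deploy-2"}, "litellm_params": {"model": "gpt-4"}},
]
selected = pick_least_busy(deployments, {"deploy-1": 3, "deploy-2": 1})
```

The return value is always one of the dictionaries passed in as `healthy_deployments`, which matches the dict output described above.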
Usage Examples
import asyncio

from litellm.caching.caching import DualCache
from litellm.router_strategy.least_busy import LeastBusyLoggingHandler

cache = DualCache()
handler = LeastBusyLoggingHandler(router_cache=cache)

# Deployments shared by both examples below
healthy_deployments = [
    {"model_info": {"id": "deploy-1"}, "litellm_params": {"model": "gpt-4"}},
    {"model_info": {"id": "deploy-2"}, "litellm_params": {"model": "gpt-4"}},
]

# Sync deployment selection
deployment = handler.get_available_deployments(
    model_group="gpt-4",
    healthy_deployments=healthy_deployments,
)

# Async deployment selection (await is only valid inside a coroutine)
async def select_async():
    return await handler.async_get_available_deployments(
        model_group="gpt-4",
        healthy_deployments=healthy_deployments,
    )

deployment = asyncio.run(select_async())
Related Pages
- BerriAI_Litellm_Lowest_TPM_RPM_V2_Strategy - TPM/RPM-based deployment selection strategy
- BerriAI_Litellm_Lowest_Cost_Strategy - Cost-based deployment selection strategy
- BerriAI_Litellm_Simple_Shuffle_Strategy - Random/weighted shuffle deployment selection
- BerriAI_Litellm_Lowest_TPM_RPM_Strategy - Original TPM/RPM routing strategy