Implementation:BerriAI Litellm Least Busy Strategy

Attribute	Value
Sources	litellm/router_strategy/least_busy.py
Domains	Router, Strategy, Load Balancing
last_updated	2026-02-15 16:00 GMT

Overview

The Least Busy Strategy is a router deployment selection strategy that routes requests to the deployment with the fewest in-flight (active) requests.

Description

This module provides the LeastBusyLoggingHandler class, which extends CustomLogger to track the number of active requests per deployment. Before each API call, it increments a counter for the target deployment; when the call succeeds or fails, it decrements that counter. When selecting a deployment, it reads the counters from the cache and picks the deployment with the lowest active request count. Traffic therefore shifts toward deployments that currently carry the least load, which suits scenarios where response latency varies significantly between deployments.
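The counter lifecycle described above can be sketched without LiteLLM at all. The class and method names below (InFlightTracker, on_pre_call, on_finish) are hypothetical and stand in for the handler's callbacks; the plain dictionary stands in for the router cache, and clamping at zero is a choice of this sketch rather than documented behaviour.

```python
from collections import defaultdict


class InFlightTracker:
    """Hypothetical stand-in for the per-deployment counters kept in the cache."""

    def __init__(self) -> None:
        # deployment id -> number of requests currently in flight
        self.active = defaultdict(int)

    def on_pre_call(self, deployment_id: str) -> None:
        # Mirrors the "increment before the API call" step.
        self.active[deployment_id] += 1

    def on_finish(self, deployment_id: str) -> None:
        # Mirrors the "decrement on success or failure" step.
        # Clamping at zero is an assumption of this sketch.
        self.active[deployment_id] = max(0, self.active[deployment_id] - 1)


tracker = InFlightTracker()
tracker.on_pre_call("deploy-1")   # deploy-1: 1
tracker.on_pre_call("deploy-1")   # deploy-1: 2
tracker.on_pre_call("deploy-2")   # deploy-2: 1
tracker.on_finish("deploy-1")     # deploy-1: 1 (one request completed)
tracker.on_pre_call("deploy-1")   # deploy-1: 2

counts = dict(tracker.active)
```

Because the decrement runs on both outcomes, a failed request does not leave a deployment looking permanently busy.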

Usage

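This class is used when the LiteLLM Router is configured with routing_strategy="least-busy". The router registers it as a callback handler for tracking in-flight requests; import it directly only if you need to instantiate the handler yourself, as in the usage examples further down.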
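A minimal sketch of such a router configuration follows. The deployment ids and the build_router helper are hypothetical, and the litellm_params are reduced to the model name; a real setup would also carry credentials and endpoints. The litellm import is kept inside the helper so the configuration itself can be inspected without the package installed.

```python
# Two deployments in the same model group ("gpt-4"); ids are illustrative.
model_list = [
    {
        "model_name": "gpt-4",
        "litellm_params": {"model": "gpt-4"},
        "model_info": {"id": "deploy-1"},
    },
    {
        "model_name": "gpt-4",
        "litellm_params": {"model": "gpt-4"},
        "model_info": {"id": "deploy-2"},
    },
]


def build_router():
    """Hypothetical helper: create a Router that uses the least-busy strategy."""
    from litellm import Router  # imported lazily; requires litellm to be installed

    return Router(model_list=model_list, routing_strategy="least-busy")


# Usage (requires litellm):
# router = build_router()
```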

Code Reference

Source Location

litellm/router_strategy/least_busy.py

Class: LeastBusyLoggingHandler

class LeastBusyLoggingHandler(CustomLogger):
    test_flag: bool = False
    logged_success: int = 0
    logged_failure: int = 0

    def __init__(self, router_cache: DualCache):
        ...  # body omitted in this excerpt

Key Methods

Method	Signature	Description
`log_pre_api_call`	`def log_pre_api_call(self, model, messages, kwargs)`	Increments request count before API call
`log_success_event`	`def log_success_event(self, kwargs, response_obj, start_time, end_time)`	Decrements request count on success (sync)
`log_failure_event`	`def log_failure_event(self, kwargs, response_obj, start_time, end_time)`	Decrements request count on failure (sync)
`async_log_success_event`	`async def async_log_success_event(self, kwargs, response_obj, start_time, end_time)`	Decrements request count on success (async)
`async_log_failure_event`	`async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time)`	Decrements request count on failure (async)
`get_available_deployments`	`def get_available_deployments(self, model_group: str, healthy_deployments: list)`	Sync: returns the deployment with the fewest in-flight requests
`async_get_available_deployments`	`async def async_get_available_deployments(self, model_group: str, healthy_deployments: list)`	Async: returns the deployment with the fewest in-flight requests

Import

from litellm.router_strategy.least_busy import LeastBusyLoggingHandler

I/O Contract

Inputs

Parameter	Type	Description
`router_cache`	`DualCache`	Shared cache instance for storing active request counts
`model_group`	`str`	The model group name for deployment selection
`healthy_deployments`	`list`	List of deployment dictionaries considered healthy

Outputs

Return Type	Description
`dict`	The selected deployment dictionary with the least in-flight requests. Falls back to random selection if no minimum is found.
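The selection contract can be illustrated with a small standalone function. This is a sketch under stated assumptions, not the library's code: pick_least_busy and the in_flight mapping are hypothetical names, deployments with no recorded count are treated as idle, and the random fallback is triggered here when no candidate carries a usable id.

```python
import random


def pick_least_busy(healthy_deployments, in_flight):
    """Return the healthy deployment with the fewest in-flight requests.

    in_flight maps deployment id -> active request count. Ids with no
    recorded count are treated as idle (0), an assumption of this sketch.
    """
    best, best_count = None, float("inf")
    for deployment in healthy_deployments:
        dep_id = deployment.get("model_info", {}).get("id")
        if dep_id is None:
            continue  # cannot look up a count without an id
        count = in_flight.get(dep_id, 0)
        if count < best_count:
            best, best_count = deployment, count
    if best is None and healthy_deployments:
        # No minimum could be determined: fall back to a random choice.
        best = random.choice(healthy_deployments)
    return best


deployments = [
    {"model_info": {"id": "deploy-1"}, "litellm_params": {"model": "gpt-4"}},
    {"model_info": {"id": "deploy-2"}, "litellm_params": {"model": "gpt-4"}},
]
selected = pick_least_busy(deployments, {"deploy-1": 3, "deploy-2": 1})
```

With the counts shown, the function selects the deploy-2 dictionary because it has fewer active requests.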

Usage Examples

import asyncio

from litellm.caching.caching import DualCache
from litellm.router_strategy.least_busy import LeastBusyLoggingHandler

cache = DualCache()
handler = LeastBusyLoggingHandler(router_cache=cache)

# Candidate deployments shared by both calls below
healthy_deployments = [
    {"model_info": {"id": "deploy-1"}, "litellm_params": {"model": "gpt-4"}},
    {"model_info": {"id": "deploy-2"}, "litellm_params": {"model": "gpt-4"}},
]

# Sync deployment selection
deployment = handler.get_available_deployments(
    model_group="gpt-4",
    healthy_deployments=healthy_deployments,
)


# Async deployment selection: `await` is only valid inside a coroutine
async def main() -> dict:
    return await handler.async_get_available_deployments(
        model_group="gpt-4",
        healthy_deployments=healthy_deployments,
    )


deployment = asyncio.run(main())

Related Pages

BerriAI_Litellm_Lowest_TPM_RPM_V2_Strategy - TPM/RPM-based deployment selection strategy
BerriAI_Litellm_Lowest_Cost_Strategy - Cost-based deployment selection strategy
BerriAI_Litellm_Simple_Shuffle_Strategy - Random/weighted shuffle deployment selection
BerriAI_Litellm_Lowest_TPM_RPM_Strategy - Original TPM/RPM routing strategy
