Implementation:BerriAI Litellm Least Busy Strategy

From Leeroopedia
Attribute Value
Sources litellm/router_strategy/least_busy.py
Domains Router, Strategy, Load Balancing
last_updated 2026-02-15 16:00 GMT

Overview

The Least Busy Strategy is a router deployment selection strategy that routes requests to the deployment with the fewest in-flight (active) requests.

Description

This module provides the LeastBusyLoggingHandler class, which extends CustomLogger to track the number of active requests per deployment. Before each API call, it increments a counter for the target deployment; when the call succeeds or fails, it decrements that counter. The counts are stored in the shared router cache. When selecting a deployment, the handler picks the one with the lowest active request count from the cache. Because each new request goes to whichever deployment currently has the fewest calls outstanding, the strategy suits scenarios where response latency varies significantly between deployments.
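The bookkeeping described above can be sketched without LiteLLM at all. The following is a minimal illustration, not the library's implementation: the class InFlightCounter and its method names are hypothetical, and a plain in-memory dictionary stands in for the router cache.

```python
from collections import defaultdict


class InFlightCounter:
    """Hypothetical stand-in for the handler's cache-backed counters."""

    def __init__(self) -> None:
        # deployment id -> requests started but not yet finished
        self.active = defaultdict(int)

    def on_pre_call(self, deployment_id: str) -> None:
        # Mirrors the "increment before the API call" step.
        self.active[deployment_id] += 1

    def on_finish(self, deployment_id: str) -> None:
        # Mirrors the "decrement on success or failure" step.
        self.active[deployment_id] -= 1

    def least_busy(self, deployment_ids: list) -> str:
        # Assumption: a deployment with no recorded traffic counts as zero.
        return min(deployment_ids, key=lambda d: self.active[d])


counter = InFlightCounter()
counter.on_pre_call("deploy-1")
counter.on_pre_call("deploy-1")
counter.on_pre_call("deploy-2")
counter.on_finish("deploy-1")    # one deploy-1 request completes
counter.on_pre_call("deploy-1")  # and another one starts

# deploy-1 has 2 requests in flight, deploy-2 has 1
choice = counter.least_busy(["deploy-1", "deploy-2"])
```

The decrement runs on both success and failure; if it ran only on success, a deployment that keeps failing would look permanently busy and stop receiving traffic.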

Usage

This class is used when the LiteLLM Router is configured with routing_strategy="least-busy": the router registers it as a callback handler for tracking in-flight requests. Import it directly only when you want to instantiate the handler yourself, as in the usage examples.

Code Reference

Source Location

litellm/router_strategy/least_busy.py

Class: LeastBusyLoggingHandler

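from litellm.caching.caching import DualCache
from litellm.integrations.custom_logger import CustomLogger

class LeastBusyLoggingHandler(CustomLogger):
    test_flag: bool = False
    logged_success: int = 0
    logged_failure: int = 0

    def __init__(self, router_cache: DualCache):
        ...  # body omitted in this reference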

Key Methods

| Method | Signature | Description |
| --- | --- | --- |
| log_pre_api_call | def log_pre_api_call(self, model, messages, kwargs) | Increments request count before API call |
| log_success_event | def log_success_event(self, kwargs, response_obj, start_time, end_time) | Decrements request count on success (sync) |
| log_failure_event | def log_failure_event(self, kwargs, response_obj, start_time, end_time) | Decrements request count on failure (sync) |
| async_log_success_event | async def async_log_success_event(self, kwargs, response_obj, start_time, end_time) | Decrements request count on success (async) |
| async_log_failure_event | async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time) | Decrements request count on failure (async) |
| get_available_deployments | def get_available_deployments(self, model_group: str, healthy_deployments: list) | Sync: returns the deployment with the fewest in-flight requests |
| async_get_available_deployments | async def async_get_available_deployments(self, model_group: str, healthy_deployments: list) | Async: returns the deployment with the fewest in-flight requests |

Import

from litellm.router_strategy.least_busy import LeastBusyLoggingHandler

I/O Contract

Inputs

| Parameter | Type | Description |
| --- | --- | --- |
| router_cache | DualCache | Shared cache instance for storing active request counts |
| model_group | str | The model group name for deployment selection |
| healthy_deployments | list | List of deployment dictionaries considered healthy |

Outputs

| Return Type | Description |
| --- | --- |
| dict | The selected deployment dictionary with the least in-flight requests. Falls back to random selection if no minimum is found. |
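The output contract above can be illustrated with a standalone sketch. This is one plausible reading of the selection and fallback behavior, not the library's code: the function select_least_busy is hypothetical, the counts are passed in as a plain dictionary rather than read from a DualCache, and treating unseen deployments as having zero in-flight requests is an assumption.

```python
import random


def select_least_busy(healthy_deployments: list, request_counts: dict) -> dict:
    """Pick the healthy deployment with the fewest in-flight requests."""
    counts = dict(request_counts)  # copy so the caller's dict is not mutated
    # Assumption: a healthy deployment with no recorded count has zero in flight.
    for d in healthy_deployments:
        counts.setdefault(d["model_info"]["id"], 0)

    by_id = {d["model_info"]["id"]: d for d in healthy_deployments}
    least_id = min(counts, key=counts.get) if counts else None

    # The lowest counter may belong to a deployment that is no longer healthy;
    # in that case this sketch falls back to a random healthy deployment.
    if least_id in by_id:
        return by_id[least_id]
    return random.choice(healthy_deployments)


healthy = [
    {"model_info": {"id": "deploy-1"}, "litellm_params": {"model": "gpt-4"}},
    {"model_info": {"id": "deploy-2"}, "litellm_params": {"model": "gpt-4"}},
]

# deploy-2 has fewer requests in flight, so it is selected
picked = select_least_busy(healthy, {"deploy-1": 3, "deploy-2": 1})

# deploy-2 has no recorded count yet, so it is treated as idle
unseen = select_least_busy(healthy, {"deploy-1": 1})

# the lowest count belongs to an unhealthy deployment: random fallback
fallback = select_least_busy(
    healthy, {"deploy-1": 3, "deploy-2": 1, "deploy-gone": 0}
)
```

In every case the return value is one of the dictionaries passed in as healthy_deployments, which matches the dict return type listed in the table.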

Usage Examples

import asyncio

from litellm.caching.caching import DualCache
from litellm.router_strategy.least_busy import LeastBusyLoggingHandler

cache = DualCache()
handler = LeastBusyLoggingHandler(router_cache=cache)

# Deployments the router currently considers healthy
healthy_deployments = [
    {"model_info": {"id": "deploy-1"}, "litellm_params": {"model": "gpt-4"}},
    {"model_info": {"id": "deploy-2"}, "litellm_params": {"model": "gpt-4"}},
]

# Sync deployment selection
deployment = handler.get_available_deployments(
    model_group="gpt-4",
    healthy_deployments=healthy_deployments,
)

# Async deployment selection: `await` is only valid inside a coroutine
async def main():
    return await handler.async_get_available_deployments(
        model_group="gpt-4",
        healthy_deployments=healthy_deployments,
    )

deployment = asyncio.run(main())
