
Implementation:BerriAI Litellm Health Check

From Leeroopedia
Knowledge Sources: BerriAI/litellm repository
Domains: Health Checking, Prometheus Metrics, Observability
Last Updated: 2026-02-15

Overview

A concrete tool for monitoring proxy health and collecting operational metrics, provided by LiteLLM's perform_health_check function and PrometheusLogger class.

Description

The health checking and metrics subsystem consists of two primary components:

1. perform_health_check (in health_check.py): An async function that probes the availability of configured LLM model deployments. It filters the model list by the target model (if specified), deduplicates deployments by ID, and concurrently sends lightweight test requests to each deployment using litellm.ahealth_check. Each probe runs with a configurable timeout (defaulting to HEALTH_CHECK_TIMEOUT_SECONDS). Results are classified as healthy or unhealthy endpoints, with sensitive data (API keys, credentials, messages) stripped from the output.
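
The filter/deduplicate/concurrently-probe flow described above can be sketched as follows. This is a minimal illustration, not LiteLLM's actual implementation: _probe_deployment is a hypothetical stand-in for litellm.ahealth_check, and the timeout constant is illustrative.

```python
import asyncio
from typing import Optional, Tuple

HEALTH_CHECK_TIMEOUT_SECONDS = 60  # illustrative default timeout per probe

async def _probe_deployment(deployment: dict) -> bool:
    # Hypothetical stand-in for litellm.ahealth_check: send a lightweight
    # test request and report whether the deployment responded.
    return deployment.get("healthy", True)

async def check_deployments(
    model_list: list, model: Optional[str] = None
) -> Tuple[list, list]:
    # Filter the model list by the target model, if one was given.
    if model is not None:
        model_list = [
            d for d in model_list
            if d.get("model_name") == model
            or d.get("litellm_params", {}).get("model") == model
        ]
    # Deduplicate deployments by their model_info id.
    seen, unique = set(), []
    for d in model_list:
        dep_id = d.get("model_info", {}).get("id")
        if dep_id not in seen:
            seen.add(dep_id)
            unique.append(d)

    # Probe all remaining deployments concurrently, each with its own timeout.
    async def probe(d: dict) -> bool:
        try:
            return await asyncio.wait_for(
                _probe_deployment(d), timeout=HEALTH_CHECK_TIMEOUT_SECONDS
            )
        except asyncio.TimeoutError:
            return False

    results = await asyncio.gather(*(probe(d) for d in unique))
    healthy = [d for d, ok in zip(unique, results) if ok]
    unhealthy = [d for d, ok in zip(unique, results) if not ok]
    return healthy, unhealthy
```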

2. PrometheusLogger (in prometheus.py): A custom logger integration that exposes operational metrics in Prometheus format. It creates and manages counters, histograms, and gauges for request counts, latency distributions, spend tracking, token usage, and budget/rate limit remaining values. The logger registers as a LiteLLM callback and is invoked on every request completion and failure.

Key metrics exposed by PrometheusLogger:

  • litellm_proxy_total_requests_metric -- Total number of requests to the proxy.
  • litellm_proxy_failed_requests_metric -- Total number of failed requests.
  • litellm_request_total_latency_metric -- End-to-end request latency histogram.
  • litellm_llm_api_latency_metric -- LLM API call latency histogram.
  • litellm_llm_api_time_to_first_token_metric -- Time to first token histogram.
  • litellm_spend_metric -- Cumulative spend counter.
  • litellm_total_tokens_metric -- Total input + output tokens counter.
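
To make the logger's role concrete, here is a dependency-free sketch of the kinds of values PrometheusLogger tracks. This stand-in uses plain dicts in place of prometheus_client Counter/Histogram objects and is not LiteLLM's actual class; the method names are illustrative.

```python
from collections import defaultdict

class MiniMetricsLogger:
    # Illustrative stand-in for PrometheusLogger: tracks request counts,
    # failures, latencies, spend, and token usage per model label.
    def __init__(self):
        self.total_requests = defaultdict(int)    # model -> request count
        self.failed_requests = defaultdict(int)   # model -> failure count
        self.spend = defaultdict(float)           # model -> cumulative spend
        self.total_tokens = defaultdict(int)      # model -> input + output tokens
        self.latencies = defaultdict(list)        # model -> [latency seconds]

    def log_success(self, model: str, latency_s: float, cost: float, tokens: int):
        # Invoked on request completion (mirrors the success callback path).
        self.total_requests[model] += 1
        self.latencies[model].append(latency_s)
        self.spend[model] += cost
        self.total_tokens[model] += tokens

    def log_failure(self, model: str):
        # Invoked on request failure (mirrors the failure callback path).
        self.total_requests[model] += 1
        self.failed_requests[model] += 1
```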

Usage

Use perform_health_check when implementing the /health endpoint or running background health check loops. Use PrometheusLogger by adding "prometheus" to the callbacks list under litellm_settings in the proxy configuration; the metrics are then exposed at the /metrics endpoint for scraping.

Code Reference

Source Location (health check): litellm/proxy/health_check.py, lines 188-223
Source Location (prometheus): litellm/integrations/prometheus.py, line 62
Health Check Signature: async def perform_health_check(model_list: list, model: Optional[str] = None, cli_model: Optional[str] = None, details: Optional[bool] = True) -> Tuple[list, list]
PrometheusLogger Class: class PrometheusLogger(CustomLogger)
Import (health check): from litellm.proxy.health_check import perform_health_check
Import (prometheus): from litellm.integrations.prometheus import PrometheusLogger

I/O Contract

Inputs (perform_health_check)

model_list (list): List of model deployment dictionaries, each containing model_name, litellm_params, and model_info.
model (Optional[str]): If provided, only check deployments matching this model name. Matches against both litellm_params.model and model_name.
cli_model (Optional[str]): If model_list is empty and cli_model is set, creates a single-model list for health checking.
details (Optional[bool]): If True (default), returns full deployment details. If False, returns only minimal display parameters (model and mode_error).

Outputs (perform_health_check)

healthy_endpoints (list): List of deployment data dictionaries for models that responded successfully within the timeout.
unhealthy_endpoints (list): List of deployment data dictionaries for models that failed or timed out, including error details.
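
The sanitization step mentioned in the Description (stripping API keys, credentials, and messages from returned deployment data) can be sketched like this. The function name and the set of sensitive keys are illustrative assumptions, not LiteLLM's exact field set.

```python
def clean_endpoint_data(endpoint: dict, details: bool = True) -> dict:
    # Hypothetical sketch of the output-sanitization step: remove
    # credentials and request payloads before returning deployment data.
    sensitive = {"api_key", "aws_access_key_id", "aws_secret_access_key", "messages"}
    cleaned = {k: v for k, v in endpoint.items() if k not in sensitive}
    if not details:
        # Keep only the minimal display parameters (model and mode_error).
        cleaned = {k: v for k, v in cleaned.items() if k in {"model", "mode_error"}}
    return cleaned
```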

PrometheusLogger Key Metrics

litellm_proxy_total_requests_metric (Counter): Total requests made to the proxy server.
litellm_proxy_failed_requests_metric (Counter): Total failed responses from the proxy.
litellm_request_total_latency_metric (Histogram): Total latency (seconds) for a request to LiteLLM, with configurable bucket boundaries.
litellm_llm_api_latency_metric (Histogram): Total latency (seconds) for the LLM provider API call.
litellm_llm_api_time_to_first_token_metric (Histogram): Time to first token for streaming LLM API calls.
litellm_spend_metric (Counter): Cumulative spend on LLM requests.
litellm_total_tokens_metric (Counter): Total number of input + output tokens.

Usage Examples

Running a health check programmatically:

import asyncio

from litellm.proxy.health_check import perform_health_check

# Assuming model_list is loaded from proxy config
model_list = [
    {
        "model_name": "gpt-4",
        "litellm_params": {"model": "openai/gpt-4", "api_key": "sk-..."},
        "model_info": {"id": "model-1", "mode": "chat"},
    },
    {
        "model_name": "claude-3",
        "litellm_params": {"model": "anthropic/claude-3-opus-20240229", "api_key": "sk-ant-..."},
        "model_info": {"id": "model-2", "mode": "chat"},
    },
]

# perform_health_check is async, so run it inside an event loop
async def main():
    healthy, unhealthy = await perform_health_check(model_list=model_list)
    print(f"Healthy: {len(healthy)}, Unhealthy: {len(unhealthy)}")

asyncio.run(main())

Checking a specific model:

healthy, unhealthy = await perform_health_check(
    model_list=model_list,
    model="gpt-4",
    details=True
)

Enabling Prometheus metrics in proxy configuration:

# config.yaml
litellm_settings:
  callbacks: ["prometheus"]
  success_callback: ["prometheus"]
  failure_callback: ["prometheus"]

Querying the health endpoint via curl:

# Check all models
curl http://localhost:4000/health

# Check a specific model
curl "http://localhost:4000/health?model=gpt-4"

Scraping Prometheus metrics:

# Prometheus metrics endpoint
curl http://localhost:4000/metrics
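
To have a Prometheus server collect these metrics automatically, a scrape job can be added to its configuration. This is a minimal sketch assuming the proxy runs on localhost:4000; the job name and interval are arbitrary choices.

```yaml
# prometheus.yml -- scrape job for the LiteLLM proxy
scrape_configs:
  - job_name: "litellm"
    metrics_path: /metrics
    scrape_interval: 30s
    static_configs:
      - targets: ["localhost:4000"]
```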

Example Prometheus/Grafana query for request latency:

# P99 latency by model over 5 minutes
histogram_quantile(0.99,
  sum by (le, model) (
    rate(litellm_request_total_latency_metric_bucket[5m])
  )
)
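
A related query, sketched under the same metric names, computes the proxy-wide error rate from the two request counters:

```
# Fraction of requests that failed over the last 5 minutes
sum(rate(litellm_proxy_failed_requests_metric[5m]))
  / sum(rate(litellm_proxy_total_requests_metric[5m]))
```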
