Implementation: BerriAI LiteLLM Health Check
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| BerriAI/litellm repository | Health Checking, Prometheus Metrics, Observability | 2026-02-15 |
Overview
A concrete tool for monitoring proxy health and collecting operational metrics, provided by LiteLLM's perform_health_check function and PrometheusLogger class.
Description
The health checking and metrics subsystem consists of two primary components:
1. perform_health_check (in health_check.py): An async function that probes the availability of configured LLM model deployments. It filters the model list by the target model (if specified), deduplicates deployments by ID, and concurrently sends lightweight test requests to each deployment using litellm.ahealth_check. Each probe runs with a configurable timeout (defaulting to HEALTH_CHECK_TIMEOUT_SECONDS). Results are classified as healthy or unhealthy endpoints, with sensitive data (API keys, credentials, messages) stripped from the output.
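The flow described above (filter by target model, deduplicate by deployment ID, probe concurrently with per-probe timeouts) can be sketched in plain asyncio. This is an illustrative sketch, not LiteLLM's implementation: _probe is a hypothetical stand-in for litellm.ahealth_check, and the timeout value is assumed.

```python
import asyncio
from typing import Optional

HEALTH_CHECK_TIMEOUT_SECONDS = 10  # assumed value; the real default lives in LiteLLM's constants


async def _probe(deployment: dict) -> bool:
    """Hypothetical stand-in for litellm.ahealth_check: True means healthy."""
    return deployment["litellm_params"]["model"] != "openai/broken"


async def check_deployments(model_list: list, model: Optional[str] = None):
    """Sketch of the perform_health_check flow: filter, dedupe, probe concurrently."""
    # Filter by the target model name, if one was given.
    if model is not None:
        model_list = [d for d in model_list if d.get("model_name") == model]

    # Deduplicate deployments that share the same model_info.id.
    seen, unique = set(), []
    for d in model_list:
        dep_id = d.get("model_info", {}).get("id")
        if dep_id in seen:
            continue
        seen.add(dep_id)
        unique.append(d)

    # Probe every deployment concurrently, each under its own timeout.
    async def timed(d: dict) -> bool:
        try:
            return await asyncio.wait_for(_probe(d), timeout=HEALTH_CHECK_TIMEOUT_SECONDS)
        except asyncio.TimeoutError:
            return False

    results = await asyncio.gather(*(timed(d) for d in unique))
    healthy = [d for d, ok in zip(unique, results) if ok]
    unhealthy = [d for d, ok in zip(unique, results) if not ok]
    return healthy, unhealthy


model_list = [
    {"model_name": "gpt-4", "litellm_params": {"model": "openai/gpt-4"}, "model_info": {"id": "m1"}},
    {"model_name": "gpt-4", "litellm_params": {"model": "openai/gpt-4"}, "model_info": {"id": "m1"}},  # duplicate ID
    {"model_name": "bad", "litellm_params": {"model": "openai/broken"}, "model_info": {"id": "m2"}},
]
healthy, unhealthy = asyncio.run(check_deployments(model_list))
print(len(healthy), len(unhealthy))  # 1 1
```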
2. PrometheusLogger (in prometheus.py): A custom logger integration that exposes operational metrics in Prometheus format. It creates and manages counters, histograms, and gauges for request counts, latency distributions, spend tracking, token usage, and budget/rate limit remaining values. The logger registers as a LiteLLM callback and is invoked on every request completion and failure.
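The metric shapes PrometheusLogger manages can be sketched with the prometheus_client library. The metric names below mirror the real ones, but the label sets and recorded values are illustrative assumptions, not LiteLLM's actual label schema.

```python
from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# Dedicated registry so the sketch does not collide with the process-global one.
registry = CollectorRegistry()

total_requests = Counter(
    "litellm_proxy_total_requests_metric",
    "Total number of requests to the proxy",
    ["model"],  # assumed label; the real logger uses a richer label set
    registry=registry,
)
request_latency = Histogram(
    "litellm_request_total_latency_metric",
    "End-to-end request latency in seconds",
    ["model"],
    registry=registry,
)

# A callback invoked on request completion would record something like:
total_requests.labels(model="gpt-4").inc()
request_latency.labels(model="gpt-4").observe(0.42)

# The /metrics endpoint serves this text exposition for Prometheus to scrape.
exposition = generate_latest(registry).decode()
print("litellm_proxy_total_requests_metric" in exposition)  # True
```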
Key metrics exposed by PrometheusLogger:
- litellm_proxy_total_requests_metric -- Total number of requests to the proxy.
- litellm_proxy_failed_requests_metric -- Total number of failed requests.
- litellm_request_total_latency_metric -- End-to-end request latency histogram.
- litellm_llm_api_latency_metric -- LLM API call latency histogram.
- litellm_llm_api_time_to_first_token_metric -- Time to first token histogram.
- litellm_spend_metric -- Cumulative spend counter.
- litellm_total_tokens_metric -- Total input + output tokens counter.
Usage
Use perform_health_check when implementing the /health endpoint or running background health check loops. Use PrometheusLogger by adding "prometheus" to the callbacks list under litellm_settings in the proxy configuration. The Prometheus metrics are then available at the /metrics endpoint for scraping.
Code Reference
| Attribute | Value |
|---|---|
| Source Location (health check) | litellm/proxy/health_check.py, lines 188-223 |
| Source Location (prometheus) | litellm/integrations/prometheus.py, line 62 |
| Health Check Signature | async def perform_health_check(model_list: list, model: Optional[str] = None, cli_model: Optional[str] = None, details: Optional[bool] = True) -> Tuple[list, list] |
| PrometheusLogger Class | class PrometheusLogger(CustomLogger) |
| Import (health check) | from litellm.proxy.health_check import perform_health_check |
| Import (prometheus) | from litellm.integrations.prometheus import PrometheusLogger |
I/O Contract
Inputs (perform_health_check)
| Parameter | Type | Description |
|---|---|---|
| model_list | list | List of model deployment dictionaries, each containing model_name, litellm_params, and model_info. |
| model | Optional[str] | If provided, only check deployments matching this model name. Matches against both litellm_params.model and model_name. |
| cli_model | Optional[str] | If model_list is empty and cli_model is set, creates a single-model list for health checking. |
| details | Optional[bool] | If True (default), returns full deployment details. If False, returns only minimal display parameters (model and mode_error). |
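The model parameter's matching rule (a target matches either litellm_params.model or model_name) can be illustrated with a small filter. This is a sketch of the documented behavior, not LiteLLM's actual filtering code.

```python
from typing import Optional


def filter_deployments(model_list: list, model: Optional[str]) -> list:
    """Keep deployments whose model_name OR litellm_params.model equals the target."""
    if model is None:
        return model_list
    return [
        d for d in model_list
        if d.get("model_name") == model
        or d.get("litellm_params", {}).get("model") == model
    ]


deployments = [
    {"model_name": "gpt-4", "litellm_params": {"model": "openai/gpt-4"}},
    {"model_name": "claude-3", "litellm_params": {"model": "anthropic/claude-3-opus-20240229"}},
]
# Matches via litellm_params.model even though model_name is "gpt-4":
print(len(filter_deployments(deployments, "openai/gpt-4")))  # 1
```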
Outputs (perform_health_check)
| Return Element | Type | Description |
|---|---|---|
| healthy_endpoints | list | List of deployment data dictionaries for models that responded successfully within the timeout. |
| unhealthy_endpoints | list | List of deployment data dictionaries for models that failed or timed out, including error details. |
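As noted in the Description, sensitive data is stripped from both output lists before they are returned. A minimal redaction pass might look like the following; the exact field list removed by LiteLLM may differ from this assumed set.

```python
# Assumed set of sensitive fields; the real implementation's list may differ.
SENSITIVE_FIELDS = {"api_key", "messages"}


def redact(deployment: dict) -> dict:
    """Drop sensitive keys from a health-check result before returning it."""
    return {k: v for k, v in deployment.items() if k not in SENSITIVE_FIELDS}


result = {"model": "openai/gpt-4", "api_key": "sk-...", "mode": "chat"}
print(redact(result))  # {'model': 'openai/gpt-4', 'mode': 'chat'}
```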
PrometheusLogger Key Metrics
| Metric Name | Type | Description |
|---|---|---|
| litellm_proxy_total_requests_metric | Counter | Total requests made to the proxy server. |
| litellm_proxy_failed_requests_metric | Counter | Total failed responses from the proxy. |
| litellm_request_total_latency_metric | Histogram | Total latency (seconds) for a request to LiteLLM, with configurable bucket boundaries. |
| litellm_llm_api_latency_metric | Histogram | Total latency (seconds) for the LLM provider API call. |
| litellm_llm_api_time_to_first_token_metric | Histogram | Time to first token for streaming LLM API calls. |
| litellm_spend_metric | Counter | Cumulative spend on LLM requests. |
| litellm_total_tokens_metric | Counter | Total number of input + output tokens. |
Usage Examples
Running a health check programmatically:
```python
import asyncio

from litellm.proxy.health_check import perform_health_check

# Assuming model_list is loaded from proxy config
model_list = [
    {
        "model_name": "gpt-4",
        "litellm_params": {"model": "openai/gpt-4", "api_key": "sk-..."},
        "model_info": {"id": "model-1", "mode": "chat"},
    },
    {
        "model_name": "claude-3",
        "litellm_params": {"model": "anthropic/claude-3-opus-20240229", "api_key": "sk-ant-..."},
        "model_info": {"id": "model-2", "mode": "chat"},
    },
]


async def main():
    healthy, unhealthy = await perform_health_check(model_list=model_list)
    print(f"Healthy: {len(healthy)}, Unhealthy: {len(unhealthy)}")


asyncio.run(main())
```
Checking a specific model:
```python
# Inside an async function (await is not valid at module top level):
healthy, unhealthy = await perform_health_check(
    model_list=model_list,
    model="gpt-4",
    details=True,
)
```
Enabling Prometheus metrics in proxy configuration:
```yaml
# config.yaml
litellm_settings:
  callbacks: ["prometheus"]
  success_callback: ["prometheus"]
  failure_callback: ["prometheus"]
```
Querying the health endpoint via curl:
```shell
# Check all models
curl http://localhost:4000/health

# Check a specific model
curl "http://localhost:4000/health?model=gpt-4"
```
Scraping Prometheus metrics:
```shell
# Prometheus metrics endpoint
curl http://localhost:4000/metrics
```
Example Prometheus/Grafana query for request latency:
```promql
# P99 latency by model over 5 minutes
# (the grouping clause must attach to an aggregation operator such as sum,
#  and must keep the "le" label for histogram_quantile to work)
histogram_quantile(0.99,
  sum by (le, model) (
    rate(litellm_request_total_latency_metric_bucket[5m])
  )
)
```