Implementation: BerriAI LiteLLM Health Check
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| BerriAI/litellm repository | Health Checking, Prometheus Metrics, Observability | 2026-02-15 |
Overview
A concrete tool for monitoring proxy health and collecting operational metrics, provided by LiteLLM's perform_health_check function and PrometheusLogger class.
Description
The health checking and metrics subsystem consists of two primary components:
1. perform_health_check (in health_check.py): An async function that probes the availability of configured LLM model deployments. It filters the model list by the target model (if specified), deduplicates deployments by ID, and concurrently sends lightweight test requests to each deployment using litellm.ahealth_check. Each probe runs with a configurable timeout (defaulting to HEALTH_CHECK_TIMEOUT_SECONDS). Results are classified as healthy or unhealthy endpoints, with sensitive data (API keys, credentials, messages) stripped from the output.
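The flow described above (filter by target model, deduplicate by deployment ID, probe concurrently with per-probe timeouts) can be sketched in plain asyncio. This is an illustrative sketch, not LiteLLM's implementation: _probe is a hypothetical stand-in for litellm.ahealth_check, and the timeout value is assumed.

```python
import asyncio
from typing import Optional

HEALTH_CHECK_TIMEOUT_SECONDS = 10  # assumed value; the real default lives in LiteLLM's constants


async def _probe(deployment: dict) -> bool:
    """Hypothetical stand-in for litellm.ahealth_check: True means healthy."""
    return deployment["litellm_params"]["model"] != "openai/broken"


async def check_deployments(model_list: list, model: Optional[str] = None):
    """Sketch of the perform_health_check flow: filter, dedupe, probe concurrently."""
    # Filter by the target model name, if one was given.
    if model is not None:
        model_list = [d for d in model_list if d.get("model_name") == model]

    # Deduplicate deployments that share the same model_info.id.
    seen, unique = set(), []
    for d in model_list:
        dep_id = d.get("model_info", {}).get("id")
        if dep_id in seen:
            continue
        seen.add(dep_id)
        unique.append(d)

    # Probe every deployment concurrently, each under its own timeout.
    async def timed(d: dict) -> bool:
        try:
            return await asyncio.wait_for(_probe(d), timeout=HEALTH_CHECK_TIMEOUT_SECONDS)
        except asyncio.TimeoutError:
            return False

    results = await asyncio.gather(*(timed(d) for d in unique))
    healthy = [d for d, ok in zip(unique, results) if ok]
    unhealthy = [d for d, ok in zip(unique, results) if not ok]
    return healthy, unhealthy


model_list = [
    {"model_name": "gpt-4", "litellm_params": {"model": "openai/gpt-4"}, "model_info": {"id": "m1"}},
    {"model_name": "gpt-4", "litellm_params": {"model": "openai/gpt-4"}, "model_info": {"id": "m1"}},  # duplicate ID
    {"model_name": "bad", "litellm_params": {"model": "openai/broken"}, "model_info": {"id": "m2"}},
]
healthy, unhealthy = asyncio.run(check_deployments(model_list))
print(len(healthy), len(unhealthy))  # 1 1
```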
2. PrometheusLogger (in prometheus.py): A custom logger integration that exposes operational metrics in Prometheus format. It creates and manages counters, histograms, and gauges for request counts, latency distributions, spend tracking, token usage, and budget/rate limit remaining values. The logger registers as a LiteLLM callback and is invoked on every request completion and failure.
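The metric shapes PrometheusLogger manages can be sketched with the prometheus_client library. The metric names below mirror the real ones, but the label sets and recorded values are illustrative assumptions, not LiteLLM's actual label schema.

```python
from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# Dedicated registry so the sketch does not collide with the process-global one.
registry = CollectorRegistry()

total_requests = Counter(
    "litellm_proxy_total_requests_metric",
    "Total number of requests to the proxy",
    ["model"],  # assumed label; the real logger uses a richer label set
    registry=registry,
)
request_latency = Histogram(
    "litellm_request_total_latency_metric",
    "End-to-end request latency in seconds",
    ["model"],
    registry=registry,
)

# A callback invoked on request completion would record something like:
total_requests.labels(model="gpt-4").inc()
request_latency.labels(model="gpt-4").observe(0.42)

# The /metrics endpoint serves this text exposition for Prometheus to scrape.
exposition = generate_latest(registry).decode()
print("litellm_proxy_total_requests_metric" in exposition)  # True
```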
Key metrics exposed by PrometheusLogger:
- litellm_proxy_total_requests_metric -- Total number of requests to the proxy.
- litellm_proxy_failed_requests_metric -- Total number of failed requests.
- litellm_request_total_latency_metric -- End-to-end request latency histogram.
- litellm_llm_api_latency_metric -- LLM API call latency histogram.
- litellm_llm_api_time_to_first_token_metric -- Time to first token histogram.
- litellm_spend_metric -- Cumulative spend counter.
- litellm_total_tokens_metric -- Total input + output tokens counter.
Usage
Use perform_health_check when implementing the /health endpoint or running background health check loops. Use PrometheusLogger by adding "prometheus" to the callbacks list under litellm_settings in the proxy configuration. The Prometheus metrics are then available at the /metrics endpoint for scraping.
Code Reference
| Attribute | Value |
|---|---|
| Source Location (health check) | litellm/proxy/health_check.py, lines 188-223 |
| Source Location (prometheus) | litellm/integrations/prometheus.py, line 62 |
| Health Check Signature | async def perform_health_check(model_list: list, model: Optional[str] = None, cli_model: Optional[str] = None, details: Optional[bool] = True) -> Tuple[list, list] |
| PrometheusLogger Class | class PrometheusLogger(CustomLogger) |
| Import (health check) | from litellm.proxy.health_check import perform_health_check |
| Import (prometheus) | from litellm.integrations.prometheus import PrometheusLogger |
I/O Contract
Inputs (perform_health_check)
| Parameter | Type | Description |
|---|---|---|
| model_list | list | List of model deployment dictionaries, each containing model_name, litellm_params, and model_info. |
| model | Optional[str] | If provided, only check deployments matching this model name. Matches against both litellm_params.model and model_name. |
| cli_model | Optional[str] | If model_list is empty and cli_model is set, creates a single-model list for health checking. |
| details | Optional[bool] | If True (default), returns full deployment details. If False, returns only minimal display parameters (model and mode_error). |
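The model parameter's matching rule (a target matches either litellm_params.model or model_name) can be illustrated with a small filter. This is a sketch of the documented behavior, not LiteLLM's actual filtering code.

```python
from typing import Optional


def filter_deployments(model_list: list, model: Optional[str]) -> list:
    """Keep deployments whose model_name OR litellm_params.model equals the target."""
    if model is None:
        return model_list
    return [
        d for d in model_list
        if d.get("model_name") == model
        or d.get("litellm_params", {}).get("model") == model
    ]


deployments = [
    {"model_name": "gpt-4", "litellm_params": {"model": "openai/gpt-4"}},
    {"model_name": "claude-3", "litellm_params": {"model": "anthropic/claude-3-opus-20240229"}},
]
# Matches via litellm_params.model even though model_name is "gpt-4":
print(len(filter_deployments(deployments, "openai/gpt-4")))  # 1
```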
Outputs (perform_health_check)
| Return Element | Type | Description |
|---|---|---|
| healthy_endpoints | list | List of deployment data dictionaries for models that responded successfully within the timeout. |
| unhealthy_endpoints | list | List of deployment data dictionaries for models that failed or timed out, including error details. |
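As noted in the Description, sensitive data is stripped from both output lists before they are returned. A minimal redaction pass might look like the following; the exact field list removed by LiteLLM may differ from this assumed set.

```python
# Assumed set of sensitive fields; the real implementation's list may differ.
SENSITIVE_FIELDS = {"api_key", "messages"}


def redact(deployment: dict) -> dict:
    """Drop sensitive keys from a health-check result before returning it."""
    return {k: v for k, v in deployment.items() if k not in SENSITIVE_FIELDS}


result = {"model": "openai/gpt-4", "api_key": "sk-...", "mode": "chat"}
print(redact(result))  # {'model': 'openai/gpt-4', 'mode': 'chat'}
```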
PrometheusLogger Key Metrics
| Metric Name | Type | Description |
|---|---|---|
| litellm_proxy_total_requests_metric | Counter | Total requests made to the proxy server. |
| litellm_proxy_failed_requests_metric | Counter | Total failed responses from the proxy. |
| litellm_request_total_latency_metric | Histogram | Total latency (seconds) for a request to LiteLLM, with configurable bucket boundaries. |
| litellm_llm_api_latency_metric | Histogram | Total latency (seconds) for the LLM provider API call. |
| litellm_llm_api_time_to_first_token_metric | Histogram | Time to first token for streaming LLM API calls. |
| litellm_spend_metric | Counter | Cumulative spend on LLM requests. |
| litellm_total_tokens_metric | Counter | Total number of input + output tokens. |
Usage Examples
Running a health check programmatically:
```python
import asyncio

from litellm.proxy.health_check import perform_health_check

# Assuming model_list is loaded from proxy config
model_list = [
    {
        "model_name": "gpt-4",
        "litellm_params": {"model": "openai/gpt-4", "api_key": "sk-..."},
        "model_info": {"id": "model-1", "mode": "chat"},
    },
    {
        "model_name": "claude-3",
        "litellm_params": {"model": "anthropic/claude-3-opus-20240229", "api_key": "sk-ant-..."},
        "model_info": {"id": "model-2", "mode": "chat"},
    },
]


async def main():
    healthy, unhealthy = await perform_health_check(model_list=model_list)
    print(f"Healthy: {len(healthy)}, Unhealthy: {len(unhealthy)}")


asyncio.run(main())
```
Checking a specific model:
```python
# Inside an async function (await is not valid at module top level):
healthy, unhealthy = await perform_health_check(
    model_list=model_list,
    model="gpt-4",
    details=True,
)
```
Enabling Prometheus metrics in proxy configuration:
```yaml
# config.yaml
litellm_settings:
  callbacks: ["prometheus"]
  success_callback: ["prometheus"]
  failure_callback: ["prometheus"]
```
Querying the health endpoint via curl:
```shell
# Check all models
curl http://localhost:4000/health

# Check a specific model
curl "http://localhost:4000/health?model=gpt-4"
```
Scraping Prometheus metrics:
```shell
# Prometheus metrics endpoint
curl http://localhost:4000/metrics
```
Example Prometheus/Grafana query for request latency:
```promql
# P99 latency by model over 5 minutes
# (the grouping clause must attach to an aggregation operator such as sum,
#  and must keep the "le" label for histogram_quantile to work)
histogram_quantile(0.99,
  sum by (le, model) (
    rate(litellm_request_total_latency_metric_bucket[5m])
  )
)
```