Principle: BerriAI LiteLLM Logging Payload Construction
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| [[1]] | Observability, Data Normalization | 2026-02-15 |
Overview
Logging payload construction is the process of assembling a standardized, provider-agnostic data structure from raw LLM API call metadata so that downstream observability integrations receive a uniform schema regardless of which provider handled the request.
Description
LLM providers return responses in vastly different formats. Some include token counts in the response body; others require separate API calls to obtain usage data. Latency must be measured at multiple points (request start, first token, request end). Costs must be calculated from model-specific pricing tables. Error details vary in structure across providers.
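As a concrete illustration of this normalization problem, the sketch below maps two common provider usage formats onto one schema. The field names follow typical OpenAI-style (prompt_tokens/completion_tokens) and Anthropic-style (input_tokens/output_tokens) response bodies; the function itself is illustrative, not LiteLLM's actual transformation code.

```python
def normalize_usage(provider: str, raw_response: dict) -> dict:
    """Map heterogeneous provider usage fields onto one common schema.

    Illustrative sketch: real providers differ in more ways than
    field names (some require a separate API call for usage data).
    """
    usage = raw_response["usage"]
    if provider == "openai":
        # OpenAI-style bodies already use prompt/completion naming.
        return {
            "prompt_tokens": usage["prompt_tokens"],
            "completion_tokens": usage["completion_tokens"],
        }
    if provider == "anthropic":
        # Anthropic-style bodies name the same quantities differently.
        return {
            "prompt_tokens": usage["input_tokens"],
            "completion_tokens": usage["output_tokens"],
        }
    raise ValueError(f"unknown provider: {provider}")
```

Downstream consumers then read `prompt_tokens` and `completion_tokens` without caring which provider handled the call.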
Logging payload construction solves this by:
- Normalizing heterogeneous provider data into a single typed dictionary (StandardLoggingPayload) that every downstream consumer can rely on.
- Enriching the payload with computed fields such as response cost, cost breakdowns, token counts, latency metrics, and cache hit status.
- Separating lifecycle phases -- the Logging object tracks state across the entire call lifecycle (pre-call, post-call, success, failure) and constructs the final payload only when the outcome is known.
- Supporting streaming aggregation -- for streaming responses, individual chunks are collected and assembled into a complete response before the payload is finalized.
Usage
Apply logging payload construction when:
- An LLM API call completes (successfully or with an error) and telemetry must be dispatched to registered callbacks.
- Streaming responses need to be aggregated into a single logged event.
- Cost and token usage must be calculated and attached to the log entry.
- Multiple observability integrations need to receive the same data in the same format.
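Cost attachment (the third case above) amounts to a lookup in a model-specific pricing table. A minimal sketch, with hypothetical prices; real per-token rates live in LiteLLM's model cost map and change over time:

```python
# Hypothetical USD-per-token pricing table (illustrative values only).
PRICING_USD_PER_TOKEN = {
    "gpt-4o": {"input": 2.5e-6, "output": 10e-6},
}

def calculate_response_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Compute response_cost from per-token input/output prices."""
    prices = PRICING_USD_PER_TOKEN[model]
    return prompt_tokens * prices["input"] + completion_tokens * prices["output"]
```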
Theoretical Basis
Normalized Payload Schema
The core idea is a canonical schema that captures everything an observability backend might need:
StandardLoggingPayload := {
-- Identity
id: unique call identifier
trace_id: groups related calls (retries, fallbacks)
call_type: "completion" | "embedding" | "image_generation" | ...
-- Performance
startTime: epoch float
endTime: epoch float
completionStartTime: epoch float (time to first token)
response_time: endTime - startTime in seconds
-- Cost
response_cost: float in USD
cost_breakdown: optional detailed per-component costs
saved_cache_cost: float (savings from cache hits)
-- Tokens
prompt_tokens: integer
completion_tokens: integer
total_tokens: integer
-- Model Info
model: string (requested model name)
custom_llm_provider: string (provider identifier)
model_id: optional deployment-specific ID
model_group: optional model group name
api_base: endpoint URL
-- Request/Response
messages: original input (may be redacted)
response: output content (may be redacted)
model_parameters: dict of non-default params sent
-- Status
status: "success" | "failure"
error_str: optional error message
error_information: optional structured error details
-- Metadata
metadata: dict of tags, user info, key info
request_tags: list of user-defined tags
cache_hit: optional boolean
hidden_params: internal params not exposed to users
}
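The schema above can be expressed as a typed dictionary. The sketch below covers only a subset of the fields and is not the real class definition; LiteLLM's actual StandardLoggingPayload defines more fields than shown here:

```python
from typing import Optional, TypedDict

class StandardLoggingPayloadSketch(TypedDict, total=False):
    # Identity
    id: str
    trace_id: str
    call_type: str           # "completion" | "embedding" | ...
    # Performance (epoch floats; response_time = endTime - startTime)
    startTime: float
    endTime: float
    completionStartTime: float
    response_time: float
    # Cost (USD)
    response_cost: float
    saved_cache_cost: float
    # Tokens
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    # Model info
    model: str
    custom_llm_provider: str
    api_base: str
    # Status
    status: str              # "success" | "failure"
    error_str: Optional[str]
    # Metadata
    metadata: dict
    cache_hit: Optional[bool]
```

Because every field is typed and named once, downstream integrations can be written against this shape instead of per-provider response formats.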
Lifecycle State Machine
INIT(model, messages, stream, call_type)
|
v
PRE_CALL -- record start_time, store original kwargs
|
v
POST_CALL -- record raw response headers, initial metadata
|
+-- on success --> SUCCESS_HANDLER
| |
| v
| Build StandardLoggingPayload
| Calculate cost, tokens, latency
| Dispatch to all success callbacks
|
+-- on failure --> FAILURE_HANDLER
|
v
Build StandardLoggingPayload (with error fields)
Dispatch to all failure callbacks
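The state machine above can be sketched as a small lifecycle tracker. This is a minimal illustration of the pattern, not LiteLLM's Logging class, which carries far more state and metadata:

```python
import time

class LoggingSketch:
    """Minimal lifecycle tracker mirroring the state machine above."""

    def __init__(self, model: str, messages: list, call_type: str = "completion"):
        # INIT: capture the original request kwargs.
        self.model = model
        self.messages = messages
        self.call_type = call_type
        self.start_time = None
        self.end_time = None
        self.success_callbacks = []
        self.failure_callbacks = []

    def pre_call(self):
        # PRE_CALL: record start_time before the provider request is sent.
        self.start_time = time.time()

    def success_handler(self, response: dict, usage: dict) -> dict:
        # SUCCESS_HANDLER: build the payload, then dispatch to callbacks.
        self.end_time = time.time()
        payload = self._build_payload("success", response=response, usage=usage)
        for cb in self.success_callbacks:
            cb(payload)
        return payload

    def failure_handler(self, error: Exception) -> dict:
        # FAILURE_HANDLER: same payload shape, with error fields populated.
        self.end_time = time.time()
        payload = self._build_payload("failure", error_str=str(error))
        for cb in self.failure_callbacks:
            cb(payload)
        return payload

    def _build_payload(self, status, response=None, usage=None, error_str=None):
        # Payload construction happens only once the outcome is known.
        usage = usage or {}
        return {
            "model": self.model,
            "call_type": self.call_type,
            "status": status,
            "response_time": self.end_time - self.start_time,
            "prompt_tokens": usage.get("prompt_tokens", 0),
            "completion_tokens": usage.get("completion_tokens", 0),
            "response": response,
            "error_str": error_str,
        }
```

Note the key design point: both handlers funnel into the same `_build_payload`, so success and failure events share one schema and every callback sees the same shape.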
Streaming Aggregation
For streaming calls, the payload cannot be constructed from a single response. Instead:
1. Each chunk is appended to streaming_chunks[]
2. On stream completion, chunks are merged into a complete ModelResponse
3. The complete response is stored as complete_streaming_response
4. The StandardLoggingPayload is built from the aggregated response
5. Token counts come from the aggregated usage, not individual chunks
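The aggregation steps above can be sketched as follows. The chunk format here is a simplified stand-in for LiteLLM's streamed ModelResponse chunks, and the sketch assumes the final chunk carries the usage block, which is one common provider convention:

```python
def aggregate_stream(chunks: list) -> dict:
    """Merge streamed delta chunks into one complete response (steps 2-5)."""
    # Step 2: concatenate per-chunk content deltas into the full text.
    content = "".join(c.get("delta", "") for c in chunks)
    # Step 5: token counts come from the aggregated usage block, not
    # from counting individual chunks; assume the last chunk carries it.
    usage = next((c["usage"] for c in reversed(chunks) if "usage" in c), {})
    return {"content": content, "usage": usage}
```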