Principle: BerriAI LiteLLM Logging Payload Construction
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| [[1]] | Observability, Data Normalization | 2026-02-15 |
Overview
Logging payload construction is the process of assembling a standardized, provider-agnostic data structure from raw LLM API call metadata so that downstream observability integrations receive a uniform schema regardless of which provider handled the request.
Description
LLM providers return responses in vastly different formats. Some include token counts in the response body; others require separate API calls to obtain usage data. Latency must be measured at multiple points (request start, first token, request end). Costs must be calculated from model-specific pricing tables. Error details vary in structure across providers.
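As a concrete illustration of this normalization problem, the sketch below maps two common provider usage formats onto one schema. The field names follow typical OpenAI-style (prompt_tokens/completion_tokens) and Anthropic-style (input_tokens/output_tokens) response bodies; the function itself is illustrative, not LiteLLM's actual transformation code.

```python
def normalize_usage(provider: str, raw_response: dict) -> dict:
    """Map heterogeneous provider usage fields onto one common schema.

    Illustrative sketch: real providers differ in more ways than
    field names (some require a separate API call for usage data).
    """
    usage = raw_response["usage"]
    if provider == "openai":
        # OpenAI-style bodies already use prompt/completion naming.
        return {
            "prompt_tokens": usage["prompt_tokens"],
            "completion_tokens": usage["completion_tokens"],
        }
    if provider == "anthropic":
        # Anthropic-style bodies name the same quantities differently.
        return {
            "prompt_tokens": usage["input_tokens"],
            "completion_tokens": usage["output_tokens"],
        }
    raise ValueError(f"unknown provider: {provider}")
```

Downstream consumers then read `prompt_tokens` and `completion_tokens` without caring which provider handled the call.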
Logging payload construction solves this by:
- Normalizing heterogeneous provider data into a single typed dictionary (StandardLoggingPayload) that every downstream consumer can rely on.
- Enriching the payload with computed fields such as response cost, cost breakdowns, token counts, latency metrics, and cache hit status.
- Separating lifecycle phases -- the Logging object tracks state across the entire call lifecycle (pre-call, post-call, success, failure) and constructs the final payload only when the outcome is known.
- Supporting streaming aggregation -- for streaming responses, individual chunks are collected and assembled into a complete response before the payload is finalized.
Usage
Apply logging payload construction when:
- An LLM API call completes (successfully or with an error) and telemetry must be dispatched to registered callbacks.
- Streaming responses need to be aggregated into a single logged event.
- Cost and token usage must be calculated and attached to the log entry.
- Multiple observability integrations need to receive the same data in the same format.
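Cost attachment (the third case above) amounts to a lookup in a model-specific pricing table. A minimal sketch, with hypothetical prices; real per-token rates live in LiteLLM's model cost map and change over time:

```python
# Hypothetical USD-per-token pricing table (illustrative values only).
PRICING_USD_PER_TOKEN = {
    "gpt-4o": {"input": 2.5e-6, "output": 10e-6},
}

def calculate_response_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Compute response_cost from per-token input/output prices."""
    prices = PRICING_USD_PER_TOKEN[model]
    return prompt_tokens * prices["input"] + completion_tokens * prices["output"]
```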
Theoretical Basis
Normalized Payload Schema
The core idea is a canonical schema that captures everything an observability backend might need:
StandardLoggingPayload := {
-- Identity
id: unique call identifier
trace_id: groups related calls (retries, fallbacks)
call_type: "completion" | "embedding" | "image_generation" | ...
-- Performance
startTime: epoch float
endTime: epoch float
completionStartTime: epoch float (time to first token)
response_time: endTime - startTime in seconds
-- Cost
response_cost: float in USD
cost_breakdown: optional detailed per-component costs
saved_cache_cost: float (savings from cache hits)
-- Tokens
prompt_tokens: integer
completion_tokens: integer
total_tokens: integer
-- Model Info
model: string (requested model name)
custom_llm_provider: string (provider identifier)
model_id: optional deployment-specific ID
model_group: optional model group name
api_base: endpoint URL
-- Request/Response
messages: original input (may be redacted)
response: output content (may be redacted)
model_parameters: dict of non-default params sent
-- Status
status: "success" | "failure"
error_str: optional error message
error_information: optional structured error details
-- Metadata
metadata: dict of tags, user info, key info
request_tags: list of user-defined tags
cache_hit: optional boolean
hidden_params: internal params not exposed to users
}
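The schema above can be expressed as a typed dictionary. The sketch below covers only a subset of the fields and is not the real class definition; LiteLLM's actual StandardLoggingPayload defines more fields than shown here:

```python
from typing import Optional, TypedDict

class StandardLoggingPayloadSketch(TypedDict, total=False):
    # Identity
    id: str
    trace_id: str
    call_type: str           # "completion" | "embedding" | ...
    # Performance (epoch floats; response_time = endTime - startTime)
    startTime: float
    endTime: float
    completionStartTime: float
    response_time: float
    # Cost (USD)
    response_cost: float
    saved_cache_cost: float
    # Tokens
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    # Model info
    model: str
    custom_llm_provider: str
    api_base: str
    # Status
    status: str              # "success" | "failure"
    error_str: Optional[str]
    # Metadata
    metadata: dict
    cache_hit: Optional[bool]
```

Because every field is typed and named once, downstream integrations can be written against this shape instead of per-provider response formats.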
Lifecycle State Machine
INIT(model, messages, stream, call_type)
|
v
PRE_CALL -- record start_time, store original kwargs
|
v
POST_CALL -- record raw response headers, initial metadata
|
+-- on success --> SUCCESS_HANDLER
| |
| v
| Build StandardLoggingPayload
| Calculate cost, tokens, latency
| Dispatch to all success callbacks
|
+-- on failure --> FAILURE_HANDLER
|
v
Build StandardLoggingPayload (with error fields)
Dispatch to all failure callbacks
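The state machine above can be sketched as a small lifecycle tracker. This is a minimal illustration of the pattern, not LiteLLM's Logging class, which carries far more state and metadata:

```python
import time

class LoggingSketch:
    """Minimal lifecycle tracker mirroring the state machine above."""

    def __init__(self, model: str, messages: list, call_type: str = "completion"):
        # INIT: capture the original request kwargs.
        self.model = model
        self.messages = messages
        self.call_type = call_type
        self.start_time = None
        self.end_time = None
        self.success_callbacks = []
        self.failure_callbacks = []

    def pre_call(self):
        # PRE_CALL: record start_time before the provider request is sent.
        self.start_time = time.time()

    def success_handler(self, response: dict, usage: dict) -> dict:
        # SUCCESS_HANDLER: build the payload, then dispatch to callbacks.
        self.end_time = time.time()
        payload = self._build_payload("success", response=response, usage=usage)
        for cb in self.success_callbacks:
            cb(payload)
        return payload

    def failure_handler(self, error: Exception) -> dict:
        # FAILURE_HANDLER: same payload shape, with error fields populated.
        self.end_time = time.time()
        payload = self._build_payload("failure", error_str=str(error))
        for cb in self.failure_callbacks:
            cb(payload)
        return payload

    def _build_payload(self, status, response=None, usage=None, error_str=None):
        # Payload construction happens only once the outcome is known.
        usage = usage or {}
        return {
            "model": self.model,
            "call_type": self.call_type,
            "status": status,
            "response_time": self.end_time - self.start_time,
            "prompt_tokens": usage.get("prompt_tokens", 0),
            "completion_tokens": usage.get("completion_tokens", 0),
            "response": response,
            "error_str": error_str,
        }
```

Note the key design point: both handlers funnel into the same `_build_payload`, so success and failure events share one schema and every callback sees the same shape.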
Streaming Aggregation
For streaming calls, the payload cannot be constructed from a single response. Instead:
1. Each chunk is appended to streaming_chunks[]
2. On stream completion, chunks are merged into a complete ModelResponse
3. The complete response is stored as complete_streaming_response
4. The StandardLoggingPayload is built from the aggregated response
5. Token counts come from the aggregated usage, not individual chunks
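The aggregation steps above can be sketched as follows. The chunk format here is a simplified stand-in for LiteLLM's streamed ModelResponse chunks, and the sketch assumes the final chunk carries the usage block, which is one common provider convention:

```python
def aggregate_stream(chunks: list) -> dict:
    """Merge streamed delta chunks into one complete response (steps 2-5)."""
    # Step 2: concatenate per-chunk content deltas into the full text.
    content = "".join(c.get("delta", "") for c in chunks)
    # Step 5: token counts come from the aggregated usage block, not
    # from counting individual chunks; assume the last chunk carries it.
    usage = next((c["usage"] for c in reversed(chunks) if "usage" in c), {})
    return {"content": content, "usage": usage}
```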