Principle:BerriAI Litellm Logging Payload Construction

From Leeroopedia

Knowledge Sources: [[1]]
Domains: Observability, Data Normalization
Last Updated: 2026-02-15

Overview

Logging payload construction is the process of assembling a standardized, provider-agnostic data structure from raw LLM API call metadata so that downstream observability integrations receive a uniform schema regardless of which provider handled the request.

Description

LLM providers return responses in vastly different formats. Some include token counts in the response body; others require separate API calls to obtain usage data. Latency must be measured at multiple points (request start, first token, request end). Costs must be calculated from model-specific pricing tables. Error details vary in structure across providers.

Logging payload construction solves this by:

  • Normalizing heterogeneous provider data into a single typed dictionary (StandardLoggingPayload) that every downstream consumer can rely on.
  • Enriching the payload with computed fields such as response cost, cost breakdowns, token counts, latency metrics, and cache hit status.
  • Separating lifecycle phases -- the Logging object tracks state across the entire call lifecycle (pre-call, post-call, success, failure) and constructs the final payload only when the outcome is known.
  • Supporting streaming aggregation -- for streaming responses, individual chunks are collected and assembled into a complete response before the payload is finalized.
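The normalization step above can be sketched as a small adapter function. The two provider response shapes ("provider_a" and "provider_b") and their field names are hypothetical stand-ins for heterogeneous upstream formats, not LiteLLM's actual provider adapters:

```python
def normalize_usage(provider: str, raw: dict) -> dict:
    """Map provider-specific usage fields onto one canonical shape.

    "provider_a" embeds usage in the response body; "provider_b" uses
    different nesting and field names. Both are illustrative formats.
    """
    if provider == "provider_a":
        usage = raw["usage"]
        prompt, completion = usage["prompt_tokens"], usage["completion_tokens"]
    elif provider == "provider_b":
        billed = raw["meta"]["billed_units"]
        prompt, completion = billed["input_tokens"], billed["output_tokens"]
    else:
        raise ValueError(f"unknown provider: {provider}")
    # Downstream consumers always see the same three keys.
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }

print(normalize_usage("provider_a",
                      {"usage": {"prompt_tokens": 10, "completion_tokens": 5}}))
# -> {'prompt_tokens': 10, 'completion_tokens': 5, 'total_tokens': 15}
```

Every additional provider only requires one new branch in the adapter; the downstream schema never changes.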

Usage

Apply logging payload construction when:

  • An LLM API call completes (successfully or with an error) and telemetry must be dispatched to registered callbacks.
  • Streaming responses need to be aggregated into a single logged event.
  • Cost and token usage must be calculated and attached to the log entry.
  • Multiple observability integrations need to receive the same data in the same format.

Theoretical Basis

Normalized Payload Schema

The core idea is a canonical schema that captures everything an observability backend might need:

StandardLoggingPayload := {
    -- Identity
    id:          unique call identifier
    trace_id:    groups related calls (retries, fallbacks)
    call_type:   "completion" | "embedding" | "image_generation" | ...

    -- Performance
    startTime:           epoch float
    endTime:             epoch float
    completionStartTime: epoch float (time to first token)
    response_time:       endTime - startTime in seconds

    -- Cost
    response_cost:       float in USD
    cost_breakdown:      optional detailed per-component costs
    saved_cache_cost:    float (savings from cache hits)

    -- Tokens
    prompt_tokens:       integer
    completion_tokens:   integer
    total_tokens:        integer

    -- Model Info
    model:               string (requested model name)
    custom_llm_provider: string (provider identifier)
    model_id:            optional deployment-specific ID
    model_group:         optional model group name
    api_base:            endpoint URL

    -- Request/Response
    messages:            original input (may be redacted)
    response:            output content (may be redacted)
    model_parameters:    dict of non-default params sent

    -- Status
    status:              "success" | "failure"
    error_str:           optional error message
    error_information:   optional structured error details

    -- Metadata
    metadata:            dict of tags, user info, key info
    request_tags:        list of user-defined tags
    cache_hit:           optional boolean
    hidden_params:       internal params not exposed to users
}
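As a sketch, the schema above maps naturally onto a Python TypedDict. The field selection follows the listing; the exact types and optionality in LiteLLM's own StandardLoggingPayload may differ:

```python
from typing import Optional, TypedDict

class StandardLoggingPayload(TypedDict, total=False):
    # Identity
    id: str
    trace_id: str
    call_type: str                 # "completion" | "embedding" | ...
    # Performance (epoch floats; response_time = endTime - startTime)
    startTime: float
    endTime: float
    completionStartTime: float     # time of first token
    response_time: float
    # Cost
    response_cost: float           # USD
    saved_cache_cost: float
    # Tokens
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    # Model info
    model: str
    custom_llm_provider: str
    api_base: str
    # Status
    status: str                    # "success" | "failure"
    error_str: Optional[str]
    # Metadata
    metadata: dict
    cache_hit: Optional[bool]
```

Using `total=False` reflects that many fields (error details, cache status, cost breakdowns) are only populated for some outcomes.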

Lifecycle State Machine

INIT(model, messages, stream, call_type)
  |
  v
PRE_CALL -- record start_time, store original kwargs
  |
  v
POST_CALL -- record raw response headers, initial metadata
  |
  +-- on success --> SUCCESS_HANDLER
  |                     |
  |                     v
  |                  Build StandardLoggingPayload
  |                  Calculate cost, tokens, latency
  |                  Dispatch to all success callbacks
  |
  +-- on failure --> FAILURE_HANDLER
                        |
                        v
                     Build StandardLoggingPayload (with error fields)
                     Dispatch to all failure callbacks
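A minimal sketch of this lifecycle in Python follows. The class and method names are illustrative of the pattern, not LiteLLM's actual Logging API:

```python
import time

class Logging:
    """Tracks one call's state and builds the payload once the outcome is known."""

    def __init__(self, model: str, messages: list, call_type: str):
        self.model, self.messages, self.call_type = model, messages, call_type
        self.success_callbacks: list = []
        self.failure_callbacks: list = []
        self.start_time: float | None = None

    def pre_call(self) -> None:
        # PRE_CALL: record start time, keep original inputs for the payload
        self.start_time = time.time()

    def _build_payload(self, status: str, response=None, error=None) -> dict:
        end = time.time()
        return {
            "model": self.model,
            "call_type": self.call_type,
            "status": status,
            "startTime": self.start_time,
            "endTime": end,
            "response_time": end - self.start_time,
            "response": response,
            "error_str": str(error) if error else None,
        }

    def success_handler(self, response) -> dict:
        payload = self._build_payload("success", response=response)
        for cb in self.success_callbacks:
            cb(payload)  # every callback receives the same normalized payload
        return payload

    def failure_handler(self, error: Exception) -> dict:
        payload = self._build_payload("failure", error=error)
        for cb in self.failure_callbacks:
            cb(payload)
        return payload
```

The key design point is that the payload is constructed exactly once, in the terminal handler, so all registered callbacks observe identical data.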

Streaming Aggregation

For streaming calls, the payload cannot be constructed from a single response. Instead:

1. Each chunk is appended to streaming_chunks[]
2. On stream completion, chunks are merged into a complete ModelResponse
3. The complete response is stored as complete_streaming_response
4. The StandardLoggingPayload is built from the aggregated response
5. Token counts come from the aggregated usage, not individual chunks
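The aggregation steps above can be sketched as follows. The chunk shape (a `delta` text field plus an optional `usage` on the final chunk) is a simplified stand-in for real provider stream events:

```python
def aggregate_stream(chunks: list[dict]) -> dict:
    """Merge streamed chunks into one complete response (simplified sketch)."""
    # Step 2: merge the text deltas in arrival order
    text = "".join(c.get("delta", "") for c in chunks)
    # Step 5: token counts come from the aggregated usage; many provider
    # APIs send usage only on the final chunk, so search from the end
    usage = next((c["usage"] for c in reversed(chunks) if "usage" in c),
                 {"prompt_tokens": 0, "completion_tokens": 0})
    return {
        "content": text,
        "usage": {
            **usage,
            "total_tokens": usage["prompt_tokens"] + usage["completion_tokens"],
        },
    }

chunks = [
    {"delta": "Hel"},
    {"delta": "lo"},
    {"delta": "", "usage": {"prompt_tokens": 4, "completion_tokens": 2}},
]
print(aggregate_stream(chunks)["content"])  # -> Hello
```

The payload builder then treats this aggregated response exactly like a non-streaming one.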
