Heuristic:Groq Groq python Streaming Usage Stats

From Leeroopedia
Knowledge Sources
Domains: Streaming, API_Client
Last Updated: 2026-02-15 17:00 GMT

Overview

Token usage statistics are only available on the final streaming chunk via chunk.x_groq.usage, not on intermediate chunks.

Description

When consuming a streaming chat completion response, each ChatCompletionChunk contains a choices list with delta content. The token usage statistics (prompt_tokens, completion_tokens, total_tokens), however, are populated only on the final chunk, where they are exposed through the Groq-specific x_groq.usage field. The final chunk is identifiable by its non-null finish_reason field. After it, the stream terminates when a Server-Sent Event with data [DONE] is received.

Usage

Apply this heuristic when building applications that need to track token consumption with streaming enabled. You must wait for the final chunk (where choices[0].finish_reason is not None) and then access chunk.x_groq.usage for the usage data. Do not attempt to read usage from intermediate chunks.
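
The pattern described above can be sketched as a small helper. This is a minimal illustration, not the SDK's own code: consume_stream and fake_chunk are hypothetical names, and the chunks are stand-in objects that mimic only the fields named on this page (choices[0].delta.content, choices[0].finish_reason, x_groq.usage).

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional, Tuple


def consume_stream(chunks: Iterable[Any]) -> Tuple[str, Optional[Any]]:
    """Concatenate delta content and return it with the final chunk's usage."""
    parts = []
    usage = None
    for chunk in chunks:
        if not chunk.choices:
            continue  # defensive: skip chunks without choices (assumption)
        choice = chunk.choices[0]
        if choice.delta.content:
            parts.append(choice.delta.content)
        if choice.finish_reason is not None:
            # Final chunk: usage lives under the Groq-specific x_groq field.
            x_groq = getattr(chunk, "x_groq", None)
            usage = getattr(x_groq, "usage", None)
    return "".join(parts), usage


def fake_chunk(content, finish_reason=None, usage=None):
    """Hypothetical stand-in for ChatCompletionChunk, for illustration only."""
    choice = SimpleNamespace(
        delta=SimpleNamespace(content=content),
        finish_reason=finish_reason,
    )
    x_groq = SimpleNamespace(usage=usage) if usage is not None else None
    return SimpleNamespace(choices=[choice], x_groq=x_groq)


demo_usage = SimpleNamespace(prompt_tokens=5, completion_tokens=2, total_tokens=7)
demo = [
    fake_chunk("Hel"),
    fake_chunk("lo"),
    fake_chunk(None, finish_reason="stop", usage=demo_usage),
]
text, usage = consume_stream(demo)
print(text, usage.total_tokens)
```

With the real SDK, the same helper would presumably be fed the iterator returned by client.chat.completions.create(..., stream=True). Returning None for usage rather than raising keeps the helper usable when a stream ends without a final chunk.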

The Insight (Rule of Thumb)

  • Action: Check chunk.choices[0].finish_reason to detect the final chunk, then read chunk.x_groq.usage for token counts.
  • Value: Usage stats contain prompt_tokens, completion_tokens, total_tokens, and queue_time/prompt_time/completion_time.
  • Trade-off: Usage tracking requires consuming the entire stream. If you abort early, usage data is unavailable.
  • Stream termination: The underlying SSE stream ends with a [DONE] message, but the SDK handles this transparently; the iterator simply stops.
  • Resource cleanup: Use the stream as a context manager (with ... as stream:) to ensure the HTTP connection is properly closed.
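
The trade-off and cleanup points in the list above can be illustrated together. The following is a sketch under stated assumptions: FakeStream is a hypothetical stand-in for the SDK's stream object (it only records that it was closed), and read_at_most is an illustrative helper, not part of the library.

```python
from types import SimpleNamespace


class FakeStream:
    """Hypothetical stand-in for the SDK stream object; illustration only."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self.closed = False

    def __iter__(self):
        return self._chunks

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # A real stream would release its HTTP connection here.
        self.closed = True
        return False


def make_chunk(content, finish_reason=None, usage=None):
    """Build a stand-in chunk with only the fields this page mentions."""
    choice = SimpleNamespace(
        delta=SimpleNamespace(content=content),
        finish_reason=finish_reason,
    )
    x_groq = SimpleNamespace(usage=usage) if usage is not None else None
    return SimpleNamespace(choices=[choice], x_groq=x_groq)


def read_at_most(stream, max_chunks):
    """Read up to max_chunks chunks; return usage only if the final chunk was seen."""
    usage = None
    seen = 0
    with stream:  # guarantees cleanup even when we break out early
        for chunk in stream:
            seen += 1
            if chunk.choices[0].finish_reason is not None and chunk.x_groq is not None:
                usage = chunk.x_groq.usage
            if seen >= max_chunks:
                break
    return usage


chunks = [
    make_chunk("a"),
    make_chunk("b"),
    make_chunk(None, finish_reason="stop", usage={"total_tokens": 3}),
]

aborted = FakeStream(chunks)
early_usage = read_at_most(aborted, 1)   # aborted before the final chunk

complete = FakeStream(chunks)
full_usage = read_at_most(complete, 10)  # consumed the whole stream
```

In this sketch the aborted read returns None for usage while still closing the stream, which is the behaviour the Trade-off bullet warns about.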

Reasoning

Token usage cannot be known until generation completes because the number of completion tokens depends on the full output. Groq extends the standard OpenAI streaming format with the x_groq field, which carries usage stats and timing metrics on the final chunk. This is important for cost monitoring, quota management, and performance analysis. The [DONE] sentinel is a convention of OpenAI-style streaming APIs layered on top of Server-Sent Events rather than part of the SSE specification itself; it signals the end of the event stream.

Code Evidence

Stream termination and [DONE] handling from src/groq/_streaming.py:58-60:

for sse in iterator:
    if sse.data.startswith("[DONE]"):
        break
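
To make the role of the [DONE] check concrete, here is a simplified, self-contained sketch of the same idea applied to raw SSE lines. It is not the SDK's decoder: iter_sse_payloads is a hypothetical helper, it assumes each event carries a single data: line, and the sample payloads are invented for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from SSE data lines until [DONE] appears."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data.startswith("[DONE]"):
            break  # sentinel: nothing after this is part of the response
        yield json.loads(data)


# Invented sample lines; the last one must never be decoded.
sample_lines = [
    'data: {"choices": [{"delta": {"content": "Hi"}, "finish_reason": null}]}',
    "",
    'data: {"choices": [{"delta": {}, "finish_reason": "stop"}], '
    '"x_groq": {"usage": {"total_tokens": 7}}}',
    "",
    "data: [DONE]",
    "",
    'data: {"ignored": true}',
]

payloads = list(iter_sse_payloads(sample_lines))
final_usage = payloads[-1]["x_groq"]["usage"]
```

Because the sentinel arrives after the final chunk, the usage payload has already been yielded by the time the loop breaks.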

Usage extraction pattern from examples/chat_completion_streaming.py:48-52:

if chunk.choices[0].finish_reason:
    # Usage information is available on the final chunk
    assert chunk.x_groq is not None
    assert chunk.x_groq.usage is not None
    print(f"\nUsage: {chunk.x_groq.usage}")
