Heuristic:Groq Groq python Streaming Usage Stats
| Knowledge Sources | |
|---|---|
| Domains | Streaming, API_Client |
| Last Updated | 2026-02-15 17:00 GMT |
Overview
Token usage statistics are only available on the final streaming chunk via chunk.x_groq.usage, not on intermediate chunks.
Description
When consuming a streaming chat completion response, each ChatCompletionChunk contains a choices list with delta content. However, the token usage statistics (prompt_tokens, completion_tokens, total_tokens) are only populated on the final chunk, accessed through the Groq-specific x_groq.usage field. The final chunk is identifiable by a non-null choices[0].finish_reason. After it, the stream terminates when a Server-Sent Event with data [DONE] is received.
Usage
Apply this heuristic when building applications that need to track token consumption with streaming enabled. You must wait for the final chunk (where choices[0].finish_reason is not None) and then access chunk.x_groq.usage for the usage data. Do not attempt to read usage from intermediate chunks.
The Insight (Rule of Thumb)
- Action: Check chunk.choices[0].finish_reason to detect the final chunk, then read chunk.x_groq.usage for token counts.
- Value: Usage stats contain prompt_tokens, completion_tokens, total_tokens, and queue_time/prompt_time/completion_time.
- Trade-off: Usage tracking requires consuming the entire stream. If you abort early, usage data is unavailable.
- Stream termination: The underlying SSE stream ends with a [DONE] message, but the SDK handles this transparently; the iterator simply stops.
- Resource cleanup: Use the stream as a context manager (with ... as stream:) to ensure the HTTP connection is properly closed.
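The rule of thumb above can be sketched as a small helper. This is a minimal sketch, not code from the SDK: collect_stream and fake_chunk are hypothetical names, and the helper only assumes that each chunk exposes choices[0].delta.content, choices[0].finish_reason, and an optional x_groq.usage, as described on this page. It is duck-typed so that it can be exercised without a network call.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional, Tuple


def collect_stream(chunks: Iterable[Any]) -> Tuple[str, Optional[Any]]:
    """Concatenate delta content and return (text, usage).

    usage stays None if iteration ends before a chunk with a finish_reason,
    which is what happens when the stream is abandoned early.
    """
    parts = []
    usage = None
    for chunk in chunks:
        if not chunk.choices:
            continue  # defensive: nothing to read on this chunk
        choice = chunk.choices[0]
        if choice.delta.content:
            parts.append(choice.delta.content)
        if choice.finish_reason is not None:
            # Final chunk: this is the only place usage is expected.
            x_groq = getattr(chunk, "x_groq", None)
            usage = getattr(x_groq, "usage", None)
    return "".join(parts), usage


def fake_chunk(content, finish_reason=None, usage=None):
    """Hypothetical stand-in for a ChatCompletionChunk, for local testing only."""
    delta = SimpleNamespace(content=content)
    choice = SimpleNamespace(delta=delta, finish_reason=finish_reason)
    x_groq = SimpleNamespace(usage=usage) if usage is not None else None
    return SimpleNamespace(choices=[choice], x_groq=x_groq)
```

With the real client, the stream object returned by a streaming chat completion call would be passed as chunks, ideally inside the with block so the connection is closed even if iteration raises.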
Reasoning
Token usage cannot be known until generation completes because the number of completion tokens depends on the full output. Groq extends the standard OpenAI streaming format with the x_groq field, which carries usage stats and timing metrics on the final chunk. This is important for cost monitoring, quota management, and performance analysis. The [DONE] sentinel is a convention of the OpenAI-style streaming format carried over Server-Sent Events, not part of the Server-Sent Events specification itself; it signals the end of the event stream.
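As one illustration of the performance-analysis use, the timing metrics can be combined with the token counts to estimate generation throughput. The sketch below is hypothetical: completion_tokens_per_second is not an SDK function, and it assumes that completion_time is reported in seconds on the same usage object as completion_tokens.

```python
from types import SimpleNamespace
from typing import Any, Optional


def completion_tokens_per_second(usage: Any) -> Optional[float]:
    """Return completion throughput, or None if the fields are missing or zero.

    Assumes usage.completion_time is expressed in seconds.
    """
    tokens = getattr(usage, "completion_tokens", None)
    seconds = getattr(usage, "completion_time", None)
    if not tokens or not seconds:
        return None
    return tokens / seconds


# Stand-in usage object with made-up numbers, for illustration only.
example = SimpleNamespace(completion_tokens=120, completion_time=0.5)
print(completion_tokens_per_second(example))  # 240.0
```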
Code Evidence
Stream termination and [DONE] handling from src/groq/_streaming.py:58-60:
for sse in iterator:
    if sse.data.startswith("[DONE]"):
        break
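To make the sentinel handling concrete outside the SDK, here is a simplified sketch of the same idea applied to raw event lines. It is not the SDK's decoder: iter_sse_json is a hypothetical name, and the parser assumes one JSON payload per data: line, which ignores multi-line data fields and other parts of the Server-Sent Events format.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from data: lines until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        data = line[len("data:"):].strip()
        if data.startswith("[DONE]"):
            break  # same check as the SDK excerpt above: stop, do not decode
        yield json.loads(data)
```

Anything that arrives after [DONE] is never decoded, which is why a consumer of the SDK iterator simply sees the loop end.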
Usage extraction pattern from examples/chat_completion_streaming.py:48-52:
if chunk.choices[0].finish_reason:
    # Usage information is available on the final chunk
    assert chunk.x_groq is not None
    assert chunk.x_groq.usage is not None
    print(f"\nUsage: {chunk.x_groq.usage}")