# Heuristic: OpenAI Node Stream Usage Interruption
| Knowledge Sources | |
|---|---|
| Domains | Streaming, Observability |
| Last Updated | 2026-02-15 12:00 GMT |
## Overview
When a streaming chat completion is interrupted or cancelled, the final usage chunk containing total token counts is lost, making cost tracking unreliable for interrupted streams.
## Description
The OpenAI streaming API sends token usage statistics (prompt tokens, completion tokens, total tokens) in the final chunk of the stream when `stream_options.include_usage` is enabled. If the stream is interrupted (network error, client cancellation, timeout), this final chunk is never received. This means applications that rely on streaming usage data for cost tracking or rate limiting will have incomplete data for any interrupted request.
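To make the failure mode concrete, here is a minimal sketch of the chunk sequence involved. The shapes are simplified, illustrative stand-ins for the SDK's own types (the real definitions live in `src/resources/chat/completions`), with field names mirroring the Chat Completions streaming schema:

```typescript
// Illustrative, simplified chunk shape (not the SDK's actual type).
interface Chunk {
  choices: { delta: { content?: string }; finish_reason: string | null }[];
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
}

// With stream_options.include_usage enabled, a complete stream ends with a
// final chunk carrying empty choices and the usage totals for the request:
const completeStream: Chunk[] = [
  { choices: [{ delta: { content: "Hel" }, finish_reason: null }] },
  { choices: [{ delta: { content: "lo" }, finish_reason: "stop" }] },
  { choices: [], usage: { prompt_tokens: 9, completion_tokens: 2, total_tokens: 11 } },
];

// If the connection drops after the finish_reason chunk, the totals are gone:
const interruptedStream: Chunk[] = completeStream.slice(0, 2);

const lastUsage = (chunks: Chunk[]) => chunks[chunks.length - 1]?.usage;
console.log(lastUsage(completeStream)?.total_tokens); // 11
console.log(lastUsage(interruptedStream));            // undefined
```

Note that the interrupted stream still contains a `finish_reason` of `"stop"`: receiving a finish reason does not guarantee the usage chunk was delivered.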
## Usage
This heuristic applies to all streaming chat completions and streaming responses where usage tracking is important. If accurate token counting is critical (billing, quota enforcement), consider implementing fallback strategies: use the tokenizer to estimate prompt tokens, count accumulated output tokens from deltas, or fall back to a non-streaming request for critical paths.
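One such fallback can be sketched as follows. This is an assumption-laden illustration, not SDK code: the `StreamResult` shape and the chars-per-token divisor of 4 are rough stand-ins (a real implementation would use a proper tokenizer such as tiktoken for accurate counts):

```typescript
// Rough estimate from accumulated delta text; 4 chars/token is a crude
// heuristic, not a tokenizer. Replace with a real tokenizer for billing.
function estimateCompletionTokens(accumulated: string): number {
  return Math.ceil(accumulated.length / 4);
}

// Hypothetical accumulated result of a stream: the concatenated delta
// content, plus the usage totals if the final chunk arrived.
interface StreamResult {
  text: string;
  usage?: { completion_tokens: number };
}

// Prefer exact usage from the final chunk; fall back to an estimate and
// flag it so downstream consumers know the count is approximate.
function completionTokens(result: StreamResult): { tokens: number; exact: boolean } {
  if (result.usage) {
    return { tokens: result.usage.completion_tokens, exact: true };
  }
  return { tokens: estimateCompletionTokens(result.text), exact: false };
}
```

The `exact` flag matters: an estimated count may be acceptable for dashboards but not for hard quota enforcement, where a non-streaming retry or tokenizer-based count is safer.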
## The Insight (Rule of Thumb)
- **Action:** Do not rely solely on stream usage data for critical cost tracking. Implement a fallback token counting strategy.
- **Value:** The usage chunk is always the last event in the stream. If `finish_reason` is received without a usage chunk, the stream was interrupted.
- **Trade-off:** Non-streaming requests always include usage data but have higher time-to-first-token latency.
- **Detection:** Check if the accumulated stream has a `usage` field on the final chunk. If missing, the stream was likely interrupted.
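The detection rule above reduces to a single check on the accumulated chunks. A sketch, again using an illustrative chunk shape rather than the SDK's types, and assuming `stream_options.include_usage` was set on the request (otherwise no usage chunk is ever sent and the check would always report an interruption):

```typescript
// Illustrative chunk shape; assumes stream_options.include_usage was enabled.
interface StreamChunk {
  choices: { finish_reason: string | null }[];
  usage?: { total_tokens: number };
}

// With include_usage enabled, a healthy stream always terminates with a
// usage-bearing chunk, so a missing usage field on the last chunk (or an
// empty stream) indicates the stream was cut short.
function wasInterrupted(chunks: StreamChunk[]): boolean {
  const last = chunks[chunks.length - 1];
  return last === undefined || last.usage === undefined;
}
```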
## Reasoning
Streaming APIs deliver data incrementally for lower perceived latency. The server computes total usage only after the full response is generated, so it can only be sent at the end. This is an inherent limitation of the streaming protocol — there is no way to send usage data before the response is complete. This behavior is documented in the SDK type definitions and affects all streaming endpoints.
## Code Evidence
Documentation note in `src/resources/chat/completions/completions.ts:574`:

> **NOTE:** If the stream is interrupted or cancelled, you may not receive the final usage chunk which contains the total token usage for the request.
Same note at `src/resources/chat/completions/completions.ts:1308`:

> **NOTE:** If the stream is interrupted, you may not receive the final usage chunk which contains the total token usage for the request.