Principle: OpenAI Python Chat Response Processing
| Knowledge Sources | |
|---|---|
| Domains | NLP, Text_Generation |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
A data extraction pattern for consuming language model outputs from both complete responses and incremental streaming chunks.
Description
Response processing handles the extraction of generated content from the API's response objects. For non-streaming requests, the ChatCompletion object contains complete choices with message content, tool calls, and usage statistics. For streaming requests, a Stream iterator yields ChatCompletionChunk objects with incremental deltas that must be accumulated.
Key extraction points include: text content from choices[0].message.content, tool calls from choices[0].message.tool_calls, finish reason indicating why generation stopped, and token usage for cost tracking.
Usage
Use this principle after every Chat Completion request to extract the generated content. Choose the appropriate access pattern based on whether streaming was enabled. Always check finish_reason to determine if the response was complete or truncated.
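The finish_reason check above can be sketched as a small dispatch helper. This is a minimal illustration, not SDK code: `classify_response` is a hypothetical name, and the mock object merely mimics the shape of a ChatCompletion (`choices[0].finish_reason`).

```python
# Sketch: dispatch on finish_reason after a completed request.
# `classify_response` is a hypothetical helper; the SimpleNamespace mock
# stands in for a real ChatCompletion returned by the SDK.
from types import SimpleNamespace

def classify_response(response):
    """Return a short status string based on the first choice's finish_reason."""
    reason = response.choices[0].finish_reason
    if reason == "stop":
        return "complete"
    if reason == "length":
        return "truncated"   # hit the token limit; the answer may be cut off
    if reason == "tool_calls":
        return "tool_calls"  # model wants a tool invoked before answering
    return f"other:{reason}"

# Stand-in response object for illustration
mock = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
print(classify_response(mock))  # → truncated
```

Branching on finish_reason up front keeps truncation handling (for example, retrying with a larger token limit) separate from the normal content-extraction path.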
Theoretical Basis
Response processing follows two patterns:
Non-streaming (complete response):
```python
# Direct attribute access on the typed ChatCompletion object
content = response.choices[0].message.content
tool_calls = response.choices[0].message.tool_calls
finish_reason = response.choices[0].finish_reason  # "stop", "length", "tool_calls"
tokens_used = response.usage.total_tokens
```
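One detail worth spelling out: each tool call's arguments arrive as a JSON string that must be decoded before use. The following sketch shows that decoding step; `extract_tool_calls` is a hypothetical helper, and the mock object only imitates the message shape (`tool_calls[i].function.name` / `.arguments`).

```python
# Sketch: decode tool-call arguments from a message-like object.
# `extract_tool_calls` is a hypothetical helper name.
import json
from types import SimpleNamespace

def extract_tool_calls(message):
    """Parse tool calls into (name, args) pairs.

    function.arguments is a JSON-encoded string; malformed JSON is
    skipped here for simplicity.
    """
    calls = []
    for tc in message.tool_calls or []:
        try:
            args = json.loads(tc.function.arguments)
        except json.JSONDecodeError:
            continue
        calls.append((tc.function.name, args))
    return calls

# Stand-in for response.choices[0].message
mock_msg = SimpleNamespace(tool_calls=[
    SimpleNamespace(function=SimpleNamespace(
        name="get_weather", arguments='{"city": "Paris"}'))
])
print(extract_tool_calls(mock_msg))  # → [('get_weather', {'city': 'Paris'})]
```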
Streaming (incremental chunks):
```python
# Accumulate deltas from the chunk iterator
full_content = ""
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        full_content += delta.content
```
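A more defensive version of the loop above also captures the finish_reason and skips chunks whose choices list is empty (a trailing usage-only chunk can look like this). The function name and the mock chunk objects are illustrative stand-ins for the SDK's ChatCompletionChunk stream, not part of the library.

```python
# Sketch: fold streaming chunks into (full_text, finish_reason).
# `accumulate_stream` is a hypothetical helper; mocks imitate chunk shape.
from types import SimpleNamespace

def accumulate_stream(chunks):
    """Accumulate content deltas, guarding against empty choices lists
    and None content (e.g. the role-only first delta)."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        if not chunk.choices:
            continue  # e.g. a trailing usage-only chunk
        choice = chunk.choices[0]
        if choice.delta.content:
            parts.append(choice.delta.content)
        if choice.finish_reason:
            finish_reason = choice.finish_reason
    return "".join(parts), finish_reason

# Stand-in chunks for illustration; real ones come from the Stream iterator
mock_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(
        delta=SimpleNamespace(content="Hel"), finish_reason=None)]),
    SimpleNamespace(choices=[SimpleNamespace(
        delta=SimpleNamespace(content="lo"), finish_reason="stop")]),
    SimpleNamespace(choices=[]),
]
print(accumulate_stream(mock_chunks))  # → ('Hello', 'stop')
```

Joining a list of parts at the end avoids repeated string concatenation and keeps the loop body focused on the two guards.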