Workflow: OpenAI Python Chat Completions
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Text_Generation, API_Integration |
| Last Updated | 2026-02-15 10:00 GMT |
Overview
End-to-end process for generating text responses from OpenAI language models using the Chat Completions API, supporting standard, streaming, and structured output modes.
Description
This workflow covers the standard procedure for interacting with OpenAI's Chat Completions API through the Python SDK. It supports multiple interaction patterns: single-turn completions, multi-turn conversations, streaming token delivery, structured output parsing into Pydantic models, and function/tool calling with automatic argument deserialization. The Chat Completions API is the established standard for text generation, supported indefinitely alongside the newer Responses API.
Usage
Execute this workflow when you need to generate text from an OpenAI language model using the Chat Completions API. This applies when you have a prompt or conversation history (system/user/assistant messages) and need a model response, whether as plain text, a streamed token sequence, a structured Pydantic object, or a set of function tool calls. This is the right workflow for applications that require the well-established chat completions interface with full control over message roles and conversation history.
Execution Steps
Step 1: Client Initialization
Instantiate the OpenAI client by creating an OpenAI (sync) or AsyncOpenAI (async) object. The client reads the API key from the OPENAI_API_KEY environment variable by default, or accepts it as an explicit parameter. Optional configuration includes base URL, timeout, max retries, and custom HTTP client settings.
Key considerations:
- Store API keys in environment variables, not source code
- Use AsyncOpenAI for async/await patterns
- Configure timeouts and retries based on your application needs
Step 2: Message Construction
Build the conversation message list as an ordered sequence of role-tagged dictionaries. Each message has a role (system, developer, user, or assistant) and content (text string or multi-modal content array). System/developer messages set the model's behavior, user messages provide input, and assistant messages represent prior model responses for multi-turn conversations.
Key considerations:
- Use the developer role (on newer models) or the system role to set instructions
- For multi-modal input (vision), use content arrays with text and image_url types
- Maintain conversation history for multi-turn interactions
Step 3: Completion Request
Call client.chat.completions.create() with the model name and message list. For streaming, set stream=True to receive incremental token chunks. For structured outputs, use client.chat.completions.parse() with a response_format set to a Pydantic model class. For tool calling, pass a tools list built with openai.pydantic_function_tool() from Pydantic model definitions.
Key considerations:
- Choose the appropriate model (e.g., gpt-4o, gpt-5.2) based on capability needs
- Structured output parsing automatically validates responses against the Pydantic schema
- Tool calls return function names and arguments that your code must execute
Step 4: Response Processing
Extract the generated content from the response object. For standard completions, access completion.choices[0].message.content. For streaming, iterate over the stream and accumulate chunk.choices[0].delta.content tokens. For parsed structured outputs, access message.parsed to get the deserialized Pydantic object. For tool calls, access message.tool_calls and their parsed_arguments.
Key considerations:
- Check for message.refusal when using structured outputs (model may refuse)
- Handle streaming chunks that may have empty choices
- For tool calls, execute the function and optionally send results back for another completion round
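Streaming accumulation can be sketched without a live connection; the mock chunks below stand in for the objects the SDK yields and mirror the `chunk.choices[0].delta.content` shape described above:

```python
from types import SimpleNamespace

def accumulate_stream(chunks) -> str:
    """Concatenate delta.content tokens from a chat-completions stream,
    skipping chunks with no choices and deltas whose content is None."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:      # e.g. a trailing usage-only chunk
            continue
        token = chunk.choices[0].delta.content
        if token:                  # delta.content may be None (role/finish chunks)
            parts.append(token)
    return "".join(parts)

def _chunk(text):
    # Illustrative stand-in for a ChatCompletionChunk.
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

demo = [_chunk("Hel"), _chunk("lo"), SimpleNamespace(choices=[]), _chunk(None)]
assembled = accumulate_stream(demo)
```

The same loop works on a real stream returned by `create(..., stream=True)`, since it only touches the attributes named in this step.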
Step 5: Error Handling and Retries
Handle API errors using the SDK's typed exception hierarchy. APIConnectionError indicates network issues, RateLimitError (429) signals rate limiting, and other APIStatusError subclasses cover authentication, permission, and server errors. The SDK automatically retries connection errors and rate limits (2 retries by default with exponential backoff).
Key considerations:
- Access error.status_code and error.response for debugging
- Use response._request_id to report issues to OpenAI
- Configure max_retries on the client or per-request with client.with_options()