Workflow: OpenAI Python Chat Completions
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Text_Generation, API_Integration |
| Last Updated | 2026-02-15 10:00 GMT |
Overview
End-to-end process for generating text responses from OpenAI language models using the Chat Completions API, supporting standard, streaming, and structured output modes.
Description
This workflow covers the standard procedure for interacting with OpenAI's Chat Completions API through the Python SDK. It supports multiple interaction patterns: single-turn completions, multi-turn conversations, streaming token delivery, structured output parsing into Pydantic models, and function/tool calling with automatic argument deserialization. The Chat Completions API is the established standard for text generation, supported indefinitely alongside the newer Responses API.
Usage
Execute this workflow when you need to generate text from an OpenAI language model using the Chat Completions API. This applies when you have a prompt or conversation history (system/user/assistant messages) and need a model response, whether as plain text, a streamed token sequence, a structured Pydantic object, or a set of function tool calls. This is the right workflow for applications that require the well-established chat completions interface with full control over message roles and conversation history.
Execution Steps
Step 1: Client Initialization
Instantiate the OpenAI client by creating an OpenAI (sync) or AsyncOpenAI (async) object. The client reads the API key from the OPENAI_API_KEY environment variable by default, or accepts it as an explicit parameter. Optional configuration includes base URL, timeout, max retries, and custom HTTP client settings.
Key considerations:
- Store API keys in environment variables, not source code
- Use AsyncOpenAI for async/await patterns
- Configure timeouts and retries based on your application needs
Step 2: Message Construction
Build the conversation message list as an ordered sequence of role-tagged dictionaries. Each message has a role (system, developer, user, or assistant) and content (text string or multi-modal content array). System/developer messages set the model's behavior, user messages provide input, and assistant messages represent prior model responses for multi-turn conversations.
Key considerations:
- Use the developer role (on newer models) or the system role to set instructions
- For multi-modal input (vision), use content arrays with text and image_url types
- Maintain conversation history for multi-turn interactions
Step 3: Completion Request
Call client.chat.completions.create() with the model name and message list. For streaming, set stream=True to receive incremental token chunks. For structured outputs, use client.chat.completions.parse() with a response_format set to a Pydantic model class. For tool calling, pass a tools list built with openai.pydantic_function_tool() from Pydantic model definitions.
Key considerations:
- Choose the appropriate model (e.g., gpt-4o, gpt-5.2) based on capability needs
- Structured output parsing automatically validates responses against the Pydantic schema
- Tool calls return function names and arguments that your code must execute
Step 4: Response Processing
Extract the generated content from the response object. For standard completions, access completion.choices[0].message.content. For streaming, iterate over the stream and accumulate chunk.choices[0].delta.content tokens. For parsed structured outputs, access message.parsed to get the deserialized Pydantic object. For tool calls, access message.tool_calls and their parsed_arguments.
Key considerations:
- Check for message.refusal when using structured outputs (model may refuse)
- Handle streaming chunks that may have empty choices
- For tool calls, execute the function and optionally send results back for another completion round
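Streaming accumulation can be sketched without a live connection; the mock chunks below stand in for the objects the SDK yields and mirror the `chunk.choices[0].delta.content` shape described above:

```python
from types import SimpleNamespace

def accumulate_stream(chunks) -> str:
    """Concatenate delta.content tokens from a chat-completions stream,
    skipping chunks with no choices and deltas whose content is None."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:      # e.g. a trailing usage-only chunk
            continue
        token = chunk.choices[0].delta.content
        if token:                  # delta.content may be None (role/finish chunks)
            parts.append(token)
    return "".join(parts)

def _chunk(text):
    # Illustrative stand-in for a ChatCompletionChunk.
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

demo = [_chunk("Hel"), _chunk("lo"), SimpleNamespace(choices=[]), _chunk(None)]
assembled = accumulate_stream(demo)
```

The same loop works on a real stream returned by `create(..., stream=True)`, since it only touches the attributes named in this step.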
Step 5: Error Handling and Retries
Handle API errors using the SDK's typed exception hierarchy. APIConnectionError indicates network issues, RateLimitError (429) signals rate limiting, and other APIStatusError subclasses cover authentication, permission, and server errors. The SDK automatically retries connection errors and rate limits (2 retries by default with exponential backoff).
Key considerations:
- Access error.status_code and error.response for debugging
- Use response._request_id to report issues to OpenAI
- Configure max_retries on the client or per-request with client.with_options()