Workflow:Openai Openai python Realtime Conversation
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Realtime, Audio, WebSocket |
| Last Updated | 2026-02-15 10:00 GMT |
Overview
End-to-end process for building low-latency, multi-modal conversational experiences using the OpenAI Realtime API over WebSocket connections, supporting text and audio input/output with function calling.
Description
This workflow covers the Realtime API, which enables bidirectional communication with OpenAI models over persistent WebSocket connections. Unlike the request-response pattern of the Chat Completions and Responses APIs, the Realtime API operates through client-sent and server-sent events, enabling real-time text and audio interaction. The SDK uses the websockets library for connection management and provides typed event models, session configuration, conversation item creation, and response triggering. This workflow supports text-only, audio-only, and multi-modal conversation patterns.
Usage
Execute this workflow when building applications that require real-time, low-latency interaction with OpenAI models, such as voice assistants, interactive chatbots with audio capabilities, or any application where streaming bidirectional communication is needed. This is particularly suited for voice-based applications using push-to-talk or continuous audio input patterns.
Execution Steps
Step 1: Client Initialization
Create an AsyncOpenAI client (the Realtime API requires async). The client provides access to the realtime resource which manages WebSocket connections. Authentication uses the standard OPENAI_API_KEY environment variable. For Azure OpenAI, use AsyncAzureOpenAI with appropriate Azure-specific configuration.
Key considerations:
- The Realtime API is async-only; use AsyncOpenAI
- Install the openai[realtime] extra for WebSocket dependencies
- For audio, install system dependencies (e.g., portaudio on macOS)
Step 2: Establish WebSocket Connection
Open a persistent WebSocket connection using async with client.realtime.connect(model="gpt-realtime") as connection. This returns an AsyncRealtimeConnection context manager that handles connection lifecycle. The connection remains open for bidirectional event exchange until explicitly closed or the context manager exits.
Key considerations:
- The connection is a context manager that auto-closes on exit
- Specify the realtime model (e.g., gpt-realtime)
- The connection object is used for all subsequent event sending and receiving
Step 3: Configure Session
Update the session configuration by calling await connection.session.update() with desired settings. Configure output modalities (text, audio, or both), model parameters, and any session-level settings. This must be done after the connection is established but before sending conversation items.
Key considerations:
- Set output_modalities to ["text"] for text-only or ["audio"] for voice
- Session configuration can be updated at any point during the connection
- The session type should be set to "realtime"
Step 4: Send Conversation Items
Create conversation items by calling await connection.conversation.item.create() with message content. Each item has a type (message), role (user), and content array with typed entries (input_text for text, audio data for voice). After adding items, trigger model generation with await connection.response.create().
Key considerations:
- Text input uses {"type": "input_text", "text": "..."} content
- Audio input requires base64-encoded audio data in the appropriate format
- Call response.create() explicitly to trigger model response generation
Step 5: Process Server Events
Iterate over incoming server events using async for event in connection. Events are strongly typed and include text deltas (response.output_text.delta), text completion (response.output_text.done), audio data, response completion (response.done), and error events. Handle each event type according to your application's needs.
Key considerations:
- Error events do NOT raise exceptions; handle event.type == "error" explicitly
- Break from the event loop on response.done to send the next user message
- For audio, decode and play audio chunks as they arrive
- The connection stays open between response cycles for multi-turn conversation
Step 6: Error Handling
Handle errors at two levels: connection-level errors (WebSocket failures) and API-level error events. The Realtime API sends error events as regular server events with event.type == "error", containing error type, code, event ID, and message. The connection remains open after error events, allowing recovery without reconnection.
Key considerations:
- Error events contain error.type, error.code, error.event_id, and error.message
- The connection remains usable after error events
- Network-level disconnections require re-establishing the WebSocket connection