Workflow:Openai Openai python Realtime Conversation

Knowledge Sources	OpenAI Python SDK OpenAI Realtime API Guide OpenAI Realtime API Reference
Domains	LLMs, Realtime, Audio, WebSocket
Last Updated	2026-02-15 10:00 GMT

Overview

End-to-end process for building low-latency, multi-modal conversational experiences using the OpenAI Realtime API over WebSocket connections, supporting text and audio input/output with function calling.

Description

This workflow covers the Realtime API, which enables bidirectional communication with OpenAI models over persistent WebSocket connections. Unlike the request-response pattern of the Chat Completions and Responses APIs, the Realtime API operates through client-sent and server-sent events, enabling real-time text and audio interaction. The SDK uses the websockets library for connection management and provides typed event models, session configuration, conversation item creation, and response triggering. This workflow supports text-only, audio-only, and multi-modal conversation patterns.

Usage

Execute this workflow when building applications that require real-time, low-latency interaction with OpenAI models, such as voice assistants, interactive chatbots with audio capabilities, or any application where streaming bidirectional communication is needed. This is particularly suited for voice-based applications using push-to-talk or continuous audio input patterns.

Execution Steps

Step 1: Client Initialization

Create an AsyncOpenAI client (the Realtime API requires async). The client provides access to the realtime resource which manages WebSocket connections. Authentication uses the standard OPENAI_API_KEY environment variable. For Azure OpenAI, use AsyncAzureOpenAI with appropriate Azure-specific configuration.

Key considerations:

The Realtime API is async-only; use AsyncOpenAI
Install the openai[realtime] extra for WebSocket dependencies
For audio, install system dependencies (e.g., portaudio on macOS)

Step 2: Establish WebSocket Connection

Open a persistent WebSocket connection using async with client.realtime.connect(model="gpt-realtime") as connection. This returns an AsyncRealtimeConnection context manager that handles connection lifecycle. The connection remains open for bidirectional event exchange until explicitly closed or the context manager exits.

Key considerations:

The connection is a context manager that auto-closes on exit
Specify the realtime model (e.g., gpt-realtime)
The connection object is used for all subsequent event sending and receiving

Step 3: Configure Session

Update the session configuration by calling await connection.session.update() with desired settings. Configure output modalities (text, audio, or both), model parameters, and any session-level settings. This must be done after the connection is established but before sending conversation items.

Key considerations:

Set output_modalities to ["text"] for text-only or ["audio"] for voice
Session configuration can be updated at any point during the connection
The session type should be set to "realtime"

Step 4: Send Conversation Items

Create conversation items by calling await connection.conversation.item.create() with message content. Each item has a type (message), role (user), and content array with typed entries (input_text for text, audio data for voice). After adding items, trigger model generation with await connection.response.create().

Key considerations:

Text input uses {"type": "input_text", "text": "..."} content
Audio input requires base64-encoded audio data in the appropriate format
Call response.create() explicitly to trigger model response generation

Step 5: Process Server Events

Iterate over incoming server events using async for event in connection. Events are strongly typed and include text deltas (response.output_text.delta), text completion (response.output_text.done), audio data, response completion (response.done), and error events. Handle each event type according to your application's needs.

Key considerations:

Error events do NOT raise exceptions; handle event.type == "error" explicitly
Break from the event loop on response.done to send the next user message
For audio, decode and play audio chunks as they arrive
The connection stays open between response cycles for multi-turn conversation

Step 6: Error Handling

Handle errors at two levels: connection-level errors (WebSocket failures) and API-level error events. The Realtime API sends error events as regular server events with event.type == "error", containing error type, code, event ID, and message. The connection remains open after error events, allowing recovery without reconnection.

Key considerations:

Error events contain error.type, error.code, error.event_id, and error.message
The connection remains usable after error events
Network-level disconnections require re-establishing the WebSocket connection

Execution Diagram

GitHub URL

Workflow Repository