Workflow:Openai Openai node Realtime Conversation
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Realtime, WebSocket, Audio |
| Last Updated | 2026-02-15 12:00 GMT |
Overview
End-to-end process for building low-latency, multi-modal conversational experiences using the OpenAI Realtime API over WebSocket connections.
Description
This workflow enables real-time, bidirectional communication with OpenAI models through WebSocket connections. Unlike the HTTP-based Chat Completions API, the Realtime API supports streaming text and audio input/output with minimal latency, making it suitable for voice assistants and interactive conversations. The SDK provides two transport implementations: OpenAIRealtimeWebSocket for browser environments (using native WebSocket) and OpenAIRealtimeWS for Node.js (using the ws library). Both implementations share the same event-driven API with typed events for session management, conversation items, and response streaming.
Usage
Execute this workflow when you need low-latency, real-time conversational AI with text or audio modalities. This is ideal for voice assistants, interactive tutoring, customer service bots with voice, and any application where HTTP request-response latency is unacceptable. The Realtime API also supports function calling within real-time sessions.
Execution Steps
Step 1: Connection Establishment
Create a Realtime API client instance specifying the model. The SDK authenticates via API key and establishes a WebSocket connection to the Realtime API endpoint. For browser environments, use OpenAIRealtimeWebSocket; for Node.js, use OpenAIRealtimeWS.
Key considerations:
- Import from openai/realtime/websocket (browser) or openai/realtime/ws (Node.js)
- The model parameter selects the realtime-capable model
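- Authentication is handled automatically via the API key, which the SDK reads from the OPENAI_API_KEY environment variable by default
- For client-side use, create ephemeral tokens on your server via client.realtime.clientSecrets.create() so the API key itself never reaches the browser
- Connection events (open, close) are emitted on the underlying socket object, available as rt.socket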
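The connection step can be sketched as follows. This is a minimal illustration, not the SDK's own code: `RealtimeLike`, `whenOpen`, and `FakeRealtime` are hypothetical names introduced here, the model name in the comment is an assumption, and the `on('open', ...)` style follows the Node.js ws socket (a native browser WebSocket would use addEventListener instead). The in-memory fake lets the sketch run without the SDK or a network connection.

```typescript
// Structural stand-ins so the sketch runs without the SDK or a network.
// With the SDK, a Node.js client would be created roughly like this:
//   import { OpenAIRealtimeWS } from 'openai/realtime/ws';
//   const rt = new OpenAIRealtimeWS({ model: 'gpt-realtime' }); // model name is an assumption
type Listener = () => void;
type SocketEvent = 'open' | 'close';
interface SocketLike {
  on(event: SocketEvent, listener: Listener): void;
}
interface RealtimeLike {
  socket: SocketLike;
  send(event: { type: string }): void;
}

// Hypothetical helper: run `ready` only after the socket reports 'open'.
function whenOpen(rt: RealtimeLike, ready: (rt: RealtimeLike) => void): void {
  rt.socket.on('open', () => ready(rt));
}

// In-memory fake used only to exercise the helper.
class FakeRealtime implements RealtimeLike {
  sent: Array<{ type: string }> = [];
  private listeners = new Map<SocketEvent, Listener[]>();
  socket: SocketLike = {
    on: (event, listener) => {
      const list = this.listeners.get(event) ?? [];
      list.push(listener);
      this.listeners.set(event, list);
    },
  };
  send(event: { type: string }): void {
    this.sent.push(event);
  }
  emit(event: SocketEvent): void {
    (this.listeners.get(event) ?? []).forEach((listener) => listener());
  }
}

const fakeRt = new FakeRealtime();
whenOpen(fakeRt, (rt) => rt.send({ type: 'response.create' }));
const sentBeforeOpen = fakeRt.sent.length; // nothing has been sent yet
fakeRt.emit('open');
const sentAfterOpen = fakeRt.sent.length; // the deferred send has now run
```

Deferring every send until the open event avoids writing to a socket that is still connecting.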
Step 2: Session Configuration
Once connected, configure the session parameters by sending a session.update event. This sets the modalities (text, audio), voice, instructions, and other session-level settings that control the model's behavior throughout the conversation.
Key considerations:
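- Set output_modalities to control response format (text or audio; in the current API an audio response also carries a transcript, and text and audio cannot be requested together)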
- Configure voice for audio output (e.g., alloy, echo, shimmer)
- Provide instructions to set the model's behavior
- Session updates take effect immediately for subsequent responses
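A session.update payload for the settings above might be assembled as in the sketch below. `buildSessionUpdate` is a hypothetical helper, not part of the SDK. The `type: 'realtime'` field and the nesting of the voice under `audio.output` reflect recent API versions and should be treated as assumptions to check against the API reference for the version in use.

```typescript
type Modality = 'text' | 'audio';

interface SessionUpdateEvent {
  type: 'session.update';
  session: {
    type: 'realtime';
    output_modalities: Modality[];
    instructions?: string;
    // Assumption: recent API versions nest the voice under audio.output.
    audio?: { output: { voice: string } };
  };
}

// Hypothetical helper that assembles a session.update payload.
function buildSessionUpdate(opts: {
  modality: Modality;
  instructions?: string;
  voice?: string;
}): SessionUpdateEvent {
  const session: SessionUpdateEvent['session'] = {
    type: 'realtime',
    output_modalities: [opts.modality],
  };
  if (opts.instructions !== undefined) session.instructions = opts.instructions;
  // A voice only matters when audio output is requested.
  if (opts.modality === 'audio' && opts.voice !== undefined) {
    session.audio = { output: { voice: opts.voice } };
  }
  return { type: 'session.update', session };
}

const sessionUpdate = buildSessionUpdate({
  modality: 'audio',
  voice: 'alloy',
  instructions: 'You are a concise voice assistant.',
});
// With a connected client this would be sent as: rt.send(sessionUpdate)
```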
Step 3: Conversation Interaction
Send conversation items (messages) and trigger responses. Create message items with the appropriate type and content (text or audio), then send a response.create event to prompt the model to respond. The model streams its response as a series of delta events.
Key considerations:
- Create items with conversation.item.create events
- Items can be text messages, audio, or function call results
- Trigger model responses with response.create
- For audio, send audio buffers via input_audio_buffer.append
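The client events listed above can be sketched as plain typed objects. The helper names (`userText`, `functionResult`, `audioAppend`, `textTurn`) are hypothetical; the field names follow the Realtime API event shapes as best understood here, and the audio payload is assumed to be already base64-encoded.

```typescript
interface UserTextItemEvent {
  type: 'conversation.item.create';
  item: {
    type: 'message';
    role: 'user';
    content: Array<{ type: 'input_text'; text: string }>;
  };
}
interface FunctionResultItemEvent {
  type: 'conversation.item.create';
  item: { type: 'function_call_output'; call_id: string; output: string };
}
interface AudioAppendEvent {
  type: 'input_audio_buffer.append';
  audio: string; // base64-encoded audio bytes
}
interface ResponseCreateEvent {
  type: 'response.create';
}
type ClientEvent =
  | UserTextItemEvent
  | FunctionResultItemEvent
  | AudioAppendEvent
  | ResponseCreateEvent;

// A user text message wrapped in a conversation.item.create event.
function userText(text: string): UserTextItemEvent {
  return {
    type: 'conversation.item.create',
    item: { type: 'message', role: 'user', content: [{ type: 'input_text', text }] },
  };
}

// The result of a function call, serialized to a JSON string.
function functionResult(callId: string, result: unknown): FunctionResultItemEvent {
  return {
    type: 'conversation.item.create',
    item: { type: 'function_call_output', call_id: callId, output: JSON.stringify(result) },
  };
}

// One chunk of input audio; the caller supplies base64 text.
function audioAppend(base64Audio: string): AudioAppendEvent {
  return { type: 'input_audio_buffer.append', audio: base64Audio };
}

// A full text turn: add the message, then ask the model to respond.
function textTurn(text: string): ClientEvent[] {
  return [userText(text), { type: 'response.create' }];
}

const turn = textTurn('What is the capital of France?');
const turnTypes = turn.map((event) => event.type);
// With a connected client: turn.forEach((event) => rt.send(event));
```

Creating an item does not by itself produce a reply, which is why `textTurn` ends with response.create.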
Step 4: Response Streaming
Listen for response events to process the model's output in real time. Text arrives as response.output_text.delta events (response.text.delta in the earlier beta API), and audio arrives as response.output_audio.delta events (response.audio.delta in the beta API). Accumulate deltas to build the complete response.
Key considerations:
- Register event handlers with rt.on('event_type', callback)
- Text deltas contain incremental text content
- Audio deltas contain base64-encoded audio chunks
- response.done signals the end of a response
- Handle error events, which report API errors as well as connection issues
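Accumulating text deltas can be sketched with a small collector. `TextAccumulator` is a hypothetical class, not an SDK type; it accepts both text delta event names mentioned above and treats response.done as the end of the response.

```typescript
interface TextDeltaEvent {
  type: 'response.output_text.delta' | 'response.text.delta';
  delta: string; // incremental text content
}
interface ResponseDoneEvent {
  type: 'response.done';
}
type StreamEvent = TextDeltaEvent | ResponseDoneEvent;

// Collects text deltas in arrival order until response.done is seen.
class TextAccumulator {
  private parts: string[] = [];
  private finished = false;

  handle(event: StreamEvent): void {
    if (event.type === 'response.done') {
      this.finished = true;
      return;
    }
    this.parts.push(event.delta);
  }

  text(): string {
    return this.parts.join('');
  }

  isDone(): boolean {
    return this.finished;
  }
}

const acc = new TextAccumulator();
// With a connected client the handlers would be registered roughly as:
//   rt.on('response.output_text.delta', (event) => acc.handle(event));
//   rt.on('response.done', () => acc.handle({ type: 'response.done' }));
acc.handle({ type: 'response.output_text.delta', delta: 'Hel' });
acc.handle({ type: 'response.output_text.delta', delta: 'lo' });
acc.handle({ type: 'response.done' });
const finalText = acc.text();
```

Audio deltas could be collected the same way, with each base64 chunk decoded before playback.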
Step 5: Connection Cleanup
Close the WebSocket connection when the conversation is complete. Call rt.close() to gracefully terminate the session. Listen for the socket close event to confirm disconnection.
Key considerations:
- Always close the connection when done to free resources
- The close event confirms the connection has been terminated
- Handle unexpected disconnections in the error event handler
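The cleanup step can be sketched as a helper that closes the connection once and reports when the socket has actually closed. `closeWhenDone`, `ClosableRealtime`, and `FakeClient` are hypothetical names; closing on every error is a simplifying assumption of this sketch, and a long-running application may prefer to log errors and keep the session open.

```typescript
// Structural stand-in for the parts of the client this sketch needs.
interface ClosableRealtime {
  on(event: 'response.done' | 'error', listener: () => void): void;
  socket: { on(event: 'close', listener: () => void): void };
  close(): void;
}

// Hypothetical helper: close exactly once, then report the socket close.
function closeWhenDone(rt: ClosableRealtime, onClosed: () => void): void {
  let closing = false;
  const closeOnce = (): void => {
    if (closing) return;
    closing = true;
    rt.close();
  };
  rt.on('response.done', closeOnce);
  rt.on('error', closeOnce); // assumption: any error ends the session
  rt.socket.on('close', onClosed);
}

// In-memory fake used only to exercise the helper.
class FakeClient implements ClosableRealtime {
  closeCalls = 0;
  private handlers = new Map<string, Array<() => void>>();
  private add(event: string, listener: () => void): void {
    const list = this.handlers.get(event) ?? [];
    list.push(listener);
    this.handlers.set(event, list);
  }
  on(event: 'response.done' | 'error', listener: () => void): void {
    this.add(event, listener);
  }
  socket = {
    on: (event: 'close', listener: () => void): void => this.add(event, listener),
  };
  close(): void {
    this.closeCalls += 1;
    this.emit('close'); // a real socket would emit this asynchronously
  }
  emit(event: string): void {
    (this.handlers.get(event) ?? []).forEach((handler) => handler());
  }
}

const client = new FakeClient();
let closedCount = 0;
closeWhenDone(client, () => { closedCount += 1; });
client.emit('response.done'); // triggers close()
client.emit('error'); // ignored: the connection is already closing
```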