Workflow:Openai Openai node Realtime Conversation

Knowledge Sources	OpenAI Node SDK, Realtime API Guide, OpenAI API Reference
Domains	LLMs, Realtime, WebSocket, Audio
Last Updated	2026-02-15 12:00 GMT

Overview

End-to-end process for building low-latency, multi-modal conversational experiences using the OpenAI Realtime API over WebSocket connections.

Description

This workflow enables real-time, bidirectional communication with OpenAI models through WebSocket connections. Unlike the HTTP-based Chat Completions API, the Realtime API supports streaming text and audio input/output with minimal latency, making it suitable for voice assistants and interactive conversations. The SDK provides two transport implementations: OpenAIRealtimeWebSocket for browser environments (using native WebSocket) and OpenAIRealtimeWS for Node.js (using the ws library). Both implementations share the same event-driven API with typed events for session management, conversation items, and response streaming.

Usage

Execute this workflow when you need low-latency, real-time conversational AI with text or audio modalities. This is ideal for voice assistants, interactive tutoring, customer service bots with voice, and any application where HTTP request-response latency is unacceptable. The Realtime API also supports function calling within real-time sessions.

Execution Steps

Step 1: Connection Establishment

Create a Realtime API client instance specifying the model. The SDK authenticates via API key and establishes a WebSocket connection to the Realtime API endpoint. For browser environments, use OpenAIRealtimeWebSocket; for Node.js, use OpenAIRealtimeWS.

Key considerations:

Import from openai/realtime/websocket (browser) or openai/realtime/ws (Node.js)
The model parameter selects the realtime-capable model
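Authentication uses the API key of the underlying OpenAI client, read from the OPENAI_API_KEY environment variable by default
For client-side use, mint ephemeral tokens on a server via client.realtime.clientSecrets.create() so the long-lived API key never reaches the browser
Connection events (open, close) are emitted on the underlying socket object, exposed as the client's socket property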
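
The ordering constraint in this step (nothing can be sent until the socket is open) can be sketched as follows. This is a minimal illustration, not the SDK's implementation: RealtimeLike is a hypothetical structural stand-in for the client so the snippet runs without a network connection, and sendWhenOpen is an illustrative helper name.

```typescript
// Hypothetical stand-in for the SDK client; only the members this
// sketch touches are declared.
interface ClientEvent {
  type: string;
  [key: string]: unknown;
}

interface RealtimeLike {
  socket: { on(event: string, listener: () => void): unknown };
  send(event: ClientEvent): void;
}

// Hold the initial events and flush them only after the socket reports
// 'open'; sending while the socket is still connecting would typically fail.
function sendWhenOpen(rt: RealtimeLike, events: ClientEvent[]): void {
  rt.socket.on('open', () => {
    for (const event of events) {
      rt.send(event);
    }
  });
}
```

In Node.js, rt would be an OpenAIRealtimeWS instance constructed with the chosen model; in the browser, OpenAIRealtimeWebSocket exposes a native WebSocket, which registers listeners with addEventListener rather than on.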

Step 2: Session Configuration

Once connected, configure the session parameters by sending a session.update event. This sets the modalities (text, audio), voice, instructions, and other session-level settings that control the model's behavior throughout the conversation.

Key considerations:

Set output_modalities to control the response format: text, or audio (audio responses are accompanied by a transcript)
Configure voice for audio output (e.g., alloy, echo, shimmer)
Provide instructions to set the model's behavior
Session updates take effect immediately for subsequent responses
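
A session.update payload covering the settings above might be assembled like this. It is a sketch under assumptions: the helper name and option type are illustrative, it handles one output modality per session, and the nesting of voice under audio.output reflects one reading of the current API that should be checked against the API reference.

```typescript
type OutputModality = 'text' | 'audio';

interface SessionUpdateEvent {
  type: 'session.update';
  session: {
    type: 'realtime';
    output_modalities: OutputModality[];
    instructions?: string;
    // Assumption: voice lives under audio.output in the current API.
    audio?: { output: { voice: string } };
  };
}

// Hypothetical options bag for this sketch.
interface SessionOptions {
  modality: OutputModality;
  instructions?: string;
  voice?: string; // e.g. 'alloy'; ignored unless modality is 'audio'
}

// Build the event object; the caller passes it to the client's send().
function buildSessionUpdate(opts: SessionOptions): SessionUpdateEvent {
  const event: SessionUpdateEvent = {
    type: 'session.update',
    session: { type: 'realtime', output_modalities: [opts.modality] },
  };
  if (opts.instructions !== undefined) {
    event.session.instructions = opts.instructions;
  }
  if (opts.modality === 'audio' && opts.voice !== undefined) {
    event.session.audio = { output: { voice: opts.voice } };
  }
  return event;
}
```

Keeping payload construction separate from sending makes the configuration easy to log or unit test before it reaches the socket.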

Step 3: Conversation Interaction

Send conversation items (messages) and trigger responses. Create message items with the appropriate type and content (text or audio), then send a response.create event to prompt the model to respond. The model streams its response as a series of delta events.

Key considerations:

Create items with conversation.item.create events
Items can be text messages, audio, or function call results
Trigger model responses with response.create
For audio input, send base64-encoded audio chunks via input_audio_buffer.append
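
The event sequence for one user turn can be sketched as below. Sender is a hypothetical stand-in for the client's send method, and the helper names are illustrative; the payload shapes follow the event types named in this step.

```typescript
interface ClientEvent {
  type: string;
  [key: string]: unknown;
}

// Hypothetical stand-in for the client: anything with a send() method.
interface Sender {
  send(event: ClientEvent): void;
}

// Add a user text message to the conversation, then ask for a response.
function sendUserText(rt: Sender, text: string): void {
  rt.send({
    type: 'conversation.item.create',
    item: {
      type: 'message',
      role: 'user',
      content: [{ type: 'input_text', text }],
    },
  });
  rt.send({ type: 'response.create' });
}

// Stream audio that has already been base64-encoded, one chunk per event.
function appendAudio(rt: Sender, base64Chunks: string[]): void {
  for (const audio of base64Chunks) {
    rt.send({ type: 'input_audio_buffer.append', audio });
  }
}
```

Creating the item and requesting the response are separate events, so several items can be queued before a single response.create.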

Step 4: Response Streaming

Listen for response events to process the model's output in real time. Text arrives as response.output_text.delta events (response.text.delta in the earlier beta event naming), and audio arrives as response.output_audio.delta events (response.audio.delta in the beta naming). Accumulate deltas to build the complete response.

Key considerations:

Register event handlers with rt.on('event_type', callback)
Text deltas contain incremental text content
Audio deltas contain base64-encoded audio chunks
response.done signals the end of a response
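Register an error handler: it receives client-side errors as well as error events from the server, and the SDK raises an unhandled promise rejection if none is registered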
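
Delta accumulation can be sketched as follows. RealtimeEvents is a hypothetical stand-in for the client's on() method and collectText is an illustrative helper; the delta event name is a parameter because this step lists more than one text delta event.

```typescript
interface DeltaEvent {
  delta?: string;
}

// Hypothetical stand-in for the client's typed event emitter.
interface RealtimeEvents {
  on(type: string, handler: (event: DeltaEvent) => void): unknown;
}

// Concatenate text deltas and hand over the full text on response.done.
function collectText(
  rt: RealtimeEvents,
  onComplete: (text: string) => void,
  deltaEvent: string = 'response.output_text.delta',
): void {
  let buffer = '';
  rt.on(deltaEvent, (event) => {
    buffer += event.delta ?? '';
  });
  rt.on('response.done', () => {
    onComplete(buffer);
    buffer = ''; // reset so the next response starts from empty
  });
}
```

Audio deltas could be gathered the same way by pushing each base64 chunk onto an array and decoding once the response is done.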

Step 5: Connection Cleanup

Close the WebSocket connection when the conversation is complete. Call rt.close() to gracefully terminate the session. Listen for the socket close event to confirm disconnection.

Key considerations:

Always close the connection when done to free resources
The close event confirms the connection has been terminated
Handle unexpected disconnections in the error event handler
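
The shutdown sequence (close after the response completes, then confirm via the socket's close event) might be wired up like this. ClosableRealtime is a hypothetical stand-in for the client and closeAfterResponse is an illustrative helper, so this is a sketch rather than the SDK's own cleanup logic.

```typescript
// Hypothetical stand-in for the client; only the members used are declared.
interface ClosableRealtime {
  on(type: string, handler: (payload?: unknown) => void): unknown;
  close(): void;
  socket: { on(event: string, listener: () => void): unknown };
}

// Close once the current response finishes, report errors, and notify
// the caller when the socket confirms the connection has terminated.
function closeAfterResponse(
  rt: ClosableRealtime,
  onClosed: () => void,
  onError: (err: unknown) => void,
): void {
  rt.on('response.done', () => rt.close());
  rt.on('error', (err) => onError(err));
  rt.socket.on('close', onClosed);
}
```

Closing on the first response.done suits single-turn scripts; a multi-turn application would call close() from its own end-of-conversation logic instead.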
