Workflow:Openai Openai node Realtime Conversation

From Leeroopedia
Knowledge Sources
Domains: LLMs, Realtime, WebSocket, Audio
Last Updated: 2026-02-15 12:00 GMT

Overview

End-to-end process for building low-latency, multi-modal conversational experiences using the OpenAI Realtime API over WebSocket connections.

Description

This workflow enables real-time, bidirectional communication with OpenAI models through WebSocket connections. Unlike the HTTP-based Chat Completions API, the Realtime API supports streaming text and audio input/output with minimal latency, making it suitable for voice assistants and interactive conversations. The SDK provides two transport implementations: OpenAIRealtimeWebSocket for browser environments (using native WebSocket) and OpenAIRealtimeWS for Node.js (using the ws library). Both implementations share the same event-driven API with typed events for session management, conversation items, and response streaming.

Usage

Use this workflow when you need low-latency conversational AI over text or audio. It suits voice assistants, interactive tutoring, voice-enabled customer service bots, and other applications where the round-trip latency of HTTP request-response is too high. The Realtime API also supports function calling within a session.

Execution Steps

Step 1: Connection Establishment

Create a Realtime API client instance specifying the model. The SDK authenticates via API key and establishes a WebSocket connection to the Realtime API endpoint. For browser environments, use OpenAIRealtimeWebSocket; for Node.js, use OpenAIRealtimeWS.

Key considerations:

  • Import from openai/realtime/websocket (browser) or openai/realtime/ws (Node.js)
  • The model parameter selects the realtime-capable model
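  • By default the client authenticates with the API key configured for the SDK (typically the OPENAI_API_KEY environment variable)
  • For client-side use, do not ship the API key to the browser; have your server create short-lived ephemeral tokens via client.realtime.clientSecrets.create() and connect with those instead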
  • Connection events are emitted on the underlying socket object
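
The connection flow above can be sketched against a small structural interface so that it runs without network access. SocketLike, RealtimeLike, onConnected, and FakeSocket are hypothetical names introduced for this illustration only; with the SDK, the client object would instead come from OpenAIRealtimeWS, as the comment shows. The model name in that comment is an assumption.

```typescript
// Hypothetical structural stand-ins for the parts of the SDK client used here.
interface SocketLike {
  on(event: 'open' | 'close', listener: () => void): void;
}
interface RealtimeLike {
  socket: SocketLike;
}

// With the SDK in Node.js this object would come from (not executed here):
//   import { OpenAIRealtimeWS } from 'openai/realtime/ws';
//   const rt = new OpenAIRealtimeWS({ model: 'gpt-realtime' }); // model name assumed

// Connection lifecycle events are registered on the underlying socket.
function onConnected(rt: RealtimeLike, ready: () => void): void {
  rt.socket.on('open', ready);
}

// Offline fake that records listeners and lets the caller fire them manually.
class FakeSocket implements SocketLike {
  private listeners: { [event: string]: Array<() => void> } = {};
  on(event: 'open' | 'close', listener: () => void): void {
    const list = this.listeners[event] || [];
    list.push(listener);
    this.listeners[event] = list;
  }
  emit(event: 'open' | 'close'): void {
    (this.listeners[event] || []).forEach((listener) => listener());
  }
}

const fakeSocket = new FakeSocket();
const connectionState = { opened: 0 };
onConnected({ socket: fakeSocket }, () => { connectionState.opened += 1; });
fakeSocket.emit('open'); // simulates the WebSocket handshake completing
```

Waiting for the open event before sending anything keeps session configuration from racing the handshake. In a browser the native WebSocket exposes addEventListener rather than on, so the registration call would differ there.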

Step 2: Session Configuration

Once connected, configure the session parameters by sending a session.update event. This sets the modalities (text, audio), voice, instructions, and other session-level settings that control the model's behavior throughout the conversation.

Key considerations:

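  • Set output_modalities to control response format: ["text"] for text-only output or ["audio"] for spoken output with a transcript (the current API reference describes these as mutually exclusive, so verify before requesting both)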
  • Configure voice for audio output (e.g., alloy, echo, shimmer)
  • Provide instructions to set the model's behavior
  • Session updates take effect immediately for subsequent responses
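
As a rough illustration of the session.update payload, the sketch below builds the event as a plain object that could then be passed to rt.send(...). The field layout (a session.type of "realtime" and the voice nested under audio.output) is an assumption based on the current event shape and may differ between API versions; buildSessionUpdate and its option names are hypothetical, and the sketch uses one modality per session.

```typescript
// Assumed session fields; only the settings named in this step are modeled.
type OutputModality = 'text' | 'audio';

interface SessionUpdateEvent {
  type: 'session.update';
  session: {
    type: 'realtime';
    output_modalities: OutputModality[];
    instructions?: string;
    audio?: { output: { voice: string } };
  };
}

// Hypothetical helper: builds the event that would be handed to rt.send(...).
function buildSessionUpdate(opts: {
  modality: OutputModality;
  instructions?: string;
  voice?: string;
}): SessionUpdateEvent {
  const update: SessionUpdateEvent = {
    type: 'session.update',
    session: { type: 'realtime', output_modalities: [opts.modality] },
  };
  if (opts.instructions !== undefined) {
    update.session.instructions = opts.instructions;
  }
  // A voice is only attached when the session produces audio.
  if (opts.modality === 'audio' && opts.voice !== undefined) {
    update.session.audio = { output: { voice: opts.voice } };
  }
  return update;
}

const textSession = buildSessionUpdate({ modality: 'text', instructions: 'Answer briefly.' });
const voiceSession = buildSessionUpdate({ modality: 'audio', voice: 'alloy' });
// rt.send(textSession); // with a connected SDK client
```

Building the payload in one place makes it easy to log or validate the configuration before it goes over the socket.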

Step 3: Conversation Interaction

Send conversation items (messages) and trigger responses. Create message items with the appropriate type and content (text or audio), then send a response.create event to prompt the model to respond. The model streams its response as a series of delta events.

Key considerations:

  • Create items with conversation.item.create events
  • Items can be text messages, audio, or function call results
  • Trigger model responses with response.create
  • For audio, send audio buffers via input_audio_buffer.append
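
The send sequence in this step can be sketched as plain event objects. The interfaces below are simplified assumptions covering only the fields used here, and userText, audioChunk, and ask are hypothetical helpers; the send callback stands in for rt.send on a connected client.

```typescript
// Simplified client event shapes (assumed; real events accept more fields).
interface UserTextItemEvent {
  type: 'conversation.item.create';
  item: {
    type: 'message';
    role: 'user';
    content: Array<{ type: 'input_text'; text: string }>;
  };
}
interface ResponseCreateEvent {
  type: 'response.create';
}
interface AudioAppendEvent {
  type: 'input_audio_buffer.append';
  audio: string; // base64-encoded audio bytes
}
type ClientEvent = UserTextItemEvent | ResponseCreateEvent | AudioAppendEvent;

// Wraps user text in a conversation item.
function userText(text: string): UserTextItemEvent {
  return {
    type: 'conversation.item.create',
    item: { type: 'message', role: 'user', content: [{ type: 'input_text', text }] },
  };
}

// Wraps an already base64-encoded audio chunk for the input buffer.
function audioChunk(base64Audio: string): AudioAppendEvent {
  return { type: 'input_audio_buffer.append', audio: base64Audio };
}

// Adds the user message first, then asks the model to respond to it.
function ask(send: (clientEvent: ClientEvent) => void, text: string): void {
  send(userText(text));
  send({ type: 'response.create' });
}

// Offline demo: collect the events instead of sending them over a socket.
const outbox: ClientEvent[] = [];
ask((clientEvent) => { outbox.push(clientEvent); }, 'Say a couple of paragraphs!');
// With the SDK: ask((clientEvent) => rt.send(clientEvent), '...');
```

The ordering matters: creating an item does not by itself produce a reply, so response.create follows the item it should answer.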

Step 4: Response Streaming

Listen for response events to process the model's output in real time. Text arrives as response.output_text.delta events (response.text.delta in the earlier beta interface), and audio arrives as response.output_audio.delta events (response.audio.delta in the beta interface). Accumulate the deltas in arrival order to build the complete response.

Key considerations:

  • Register event handlers with rt.on('event_type', callback)
  • Text deltas contain incremental text content
  • Audio deltas contain base64-encoded audio chunks
  • response.done signals the end of a response
  • Handle error events for connection issues
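
Delta accumulation can be sketched with a small collector that is fed events in arrival order. The ServerEvent union below is a simplified assumption that models only three event types, and TextCollector is a hypothetical helper rather than part of the SDK; the text event name follows the newer naming and would need to change for the beta interface.

```typescript
// Simplified server event shapes (assumed); text deltas carry a `delta` string.
type ServerEvent =
  | { type: 'response.output_text.delta'; delta: string }
  | { type: 'response.done' }
  | { type: 'error'; error: { message: string } };

// Hypothetical collector: joins text deltas and tracks completion and errors.
class TextCollector {
  private parts: string[] = [];
  finished = false;
  lastError: string | null = null;

  handle(serverEvent: ServerEvent): void {
    switch (serverEvent.type) {
      case 'response.output_text.delta':
        this.parts.push(serverEvent.delta); // keep chunks in arrival order
        break;
      case 'response.done':
        this.finished = true; // the response is complete
        break;
      case 'error':
        this.lastError = serverEvent.error.message;
        break;
    }
  }

  // Returns everything received so far, joined into one string.
  text(): string {
    return this.parts.join('');
  }
}

// Offline demo with simulated events instead of a live socket.
const collector = new TextCollector();
const simulatedEvents: ServerEvent[] = [
  { type: 'response.output_text.delta', delta: 'Hel' },
  { type: 'response.output_text.delta', delta: 'lo' },
  { type: 'response.done' },
];
simulatedEvents.forEach((serverEvent) => collector.handle(serverEvent));
```

With the SDK, each event type would be registered separately through rt.on, with the handler forwarding the delta string to the collector. Audio deltas could be gathered the same way, decoding each base64 chunk before playback.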

Step 5: Connection Cleanup

Close the WebSocket connection when the conversation is complete. Call rt.close() to gracefully terminate the session. Listen for the socket close event to confirm disconnection.

Key considerations:

  • Always close the connection when done to free resources
  • The close event confirms the connection has been terminated
  • Handle unexpected disconnections in the error event handler
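
One possible cleanup pattern, sketched under assumptions: close the connection after the first completed response or the first error, and treat the socket close event as confirmation. SocketLike, RealtimeLike, and closeWhenDone are hypothetical names, and closing on the first error is a design choice of this sketch rather than something the SDK requires.

```typescript
// Hypothetical structural stand-ins for the parts of the SDK client used here.
interface SocketLike {
  on(event: 'close', listener: () => void): void;
}
interface RealtimeLike {
  socket: SocketLike;
  on(event: 'response.done' | 'error', listener: (payload: unknown) => void): void;
  close(): void;
}

// Closes the connection once, then reports when the socket confirms it.
function closeWhenDone(rt: RealtimeLike, onClosed: () => void): void {
  let closing = false;
  const shutdown = (): void => {
    if (closing) return; // guard against calling close() twice
    closing = true;
    rt.close();
  };
  rt.on('response.done', shutdown);
  rt.on('error', shutdown);
  rt.socket.on('close', onClosed);
}

// Offline fake: stores handlers and fires the socket close event on close().
const handlers: { [key: string]: Array<(payload: unknown) => void> } = {};
const cleanupLog: string[] = [];
const fakeRt: RealtimeLike = {
  socket: {
    on: (_event, listener) => { handlers['socket.close'] = [listener]; },
  },
  on: (eventName, listener) => {
    const list = handlers[eventName] || [];
    list.push(listener);
    handlers[eventName] = list;
  },
  close: () => {
    cleanupLog.push('close() called');
    (handlers['socket.close'] || []).forEach((listener) => listener(undefined));
  },
};

closeWhenDone(fakeRt, () => { cleanupLog.push('socket closed'); });
(handlers['response.done'] || []).forEach((listener) => listener({ type: 'response.done' }));
(handlers['error'] || []).forEach((listener) => listener({ type: 'error' })); // ignored: already closing
```

The guard flag means a late error arriving after response.done does not trigger a second close() call.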
