Implementation:Openai Openai node Beta Realtime TranscriptionSessions

From Leeroopedia
Knowledge Sources
Domains SDK, Realtime, Transcription, Beta
Last Updated 2026-02-15 12:00 GMT

Overview

The Beta Realtime TranscriptionSessions resource class provides a method to create ephemeral API tokens specifically for Realtime transcription sessions, enabling client-side audio transcription via the Realtime API.

Description

The TranscriptionSessions class extends APIResource and exposes a single create method that posts to the /realtime/transcription_sessions endpoint. This endpoint is designed specifically for realtime transcriptions rather than full conversational sessions. The method accepts TranscriptionSessionCreateParams and returns a TranscriptionSession object containing session configuration and an ephemeral client secret. The OpenAI-Beta: assistants=v2 header is automatically injected.
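To make the request shape concrete, the sketch below assembles the HTTP request that the description implies, without sending anything. The helper name buildCreateRequest, the default base URL, and the bearer-token Authorization header are assumptions added for illustration; only the path and the OpenAI-Beta header come from the description above.

```typescript
// Hypothetical helper, not part of the SDK: describes the HTTP request that
// create() is documented to send, without performing any network I/O.
interface RequestDescriptor {
  method: 'POST';
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildCreateRequest(
  apiKey: string,
  params: Record<string, unknown>,
  baseURL = 'https://api.openai.com/v1', // assumed default base URL
): RequestDescriptor {
  return {
    method: 'POST',
    url: `${baseURL}/realtime/transcription_sessions`,
    headers: {
      Authorization: `Bearer ${apiKey}`, // assumed bearer-token auth
      'Content-Type': 'application/json',
      'OpenAI-Beta': 'assistants=v2', // header named in the description
    },
    body: JSON.stringify(params),
  };
}

const exampleRequest = buildCreateRequest('sk-placeholder', {
  input_audio_format: 'pcm16',
});
console.log(exampleRequest.method, exampleRequest.url);
```

In practice the SDK builds this request for you; the sketch is only meant to show what travels over the wire.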

The TranscriptionSession response interface includes a client_secret object (with value and expires_at for the ephemeral key), input_audio_format, input_audio_transcription configuration (supporting models like gpt-4o-transcribe, gpt-4o-mini-transcribe, and whisper-1), modalities, and turn_detection settings.

The TranscriptionSessionCreateParams interface allows configuring the session with client_secret (for token expiration customization), include (to request additional fields like logprobs), input_audio_format, input_audio_noise_reduction (near_field or far_field), input_audio_transcription (model, language, prompt), modalities, and turn_detection supporting both Server VAD and Semantic VAD modes.
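As an illustration of how these options combine, the following sketch builds a parameter object that pairs noise reduction with Semantic VAD. The local types are a simplified mirror of the fields listed above, not the SDK's own definitions. The helper paramsForMicrophone is hypothetical, and the eagerness values and the include string are assumptions that should be checked against the SDK types.

```typescript
// Simplified local mirror of the fields described above. These types are
// illustrative; the SDK's TranscriptionSessionCreateParams is authoritative.
type TurnDetectionSketch =
  | { type: 'server_vad'; threshold?: number; silence_duration_ms?: number }
  | { type: 'semantic_vad'; eagerness?: 'low' | 'medium' | 'high' | 'auto' };

interface CreateParamsSketch {
  include?: string[];
  input_audio_format?: 'pcm16' | 'g711_ulaw' | 'g711_alaw';
  input_audio_noise_reduction?: { type: 'near_field' | 'far_field' };
  input_audio_transcription?: { model?: string; language?: string; prompt?: string };
  turn_detection?: TurnDetectionSketch;
}

// Hypothetical helper: picks a noise reduction profile for a capture setup.
function paramsForMicrophone(mic: 'headset' | 'laptop'): CreateParamsSketch {
  return {
    input_audio_format: 'pcm16',
    // near_field suits close-talking mics, far_field suits laptop mics.
    input_audio_noise_reduction: {
      type: mic === 'headset' ? 'near_field' : 'far_field',
    },
    input_audio_transcription: { model: 'gpt-4o-mini-transcribe', language: 'en' },
    turn_detection: { type: 'semantic_vad', eagerness: 'low' },
    // Assumed include value for requesting transcription log probabilities.
    include: ['item.input_audio_transcription.logprobs'],
  };
}

console.log(JSON.stringify(paramsForMicrophone('laptop'), null, 2));
```

The resulting object can be passed as the body argument of create once it matches the SDK's parameter type.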

Usage

Use this resource to create ephemeral tokens for client-side Realtime transcription connections. Access it via client.beta.realtime.transcriptionSessions. This is useful when you need audio-to-text transcription without a full conversational Realtime session.

Code Reference

Source Location

Signature

export class TranscriptionSessions extends APIResource {
  create(
    body: TranscriptionSessionCreateParams,
    options?: RequestOptions,
  ): APIPromise<TranscriptionSession>;
}

export interface TranscriptionSession {
  client_secret: TranscriptionSession.ClientSecret;
  input_audio_format?: string;
  input_audio_transcription?: TranscriptionSession.InputAudioTranscription;
  modalities?: Array<'text' | 'audio'>;
  turn_detection?: TranscriptionSession.TurnDetection;
}

export interface TranscriptionSessionCreateParams {
  client_secret?: TranscriptionSessionCreateParams.ClientSecret;
  include?: Array<string>;
  input_audio_format?: 'pcm16' | 'g711_ulaw' | 'g711_alaw';
  input_audio_noise_reduction?: TranscriptionSessionCreateParams.InputAudioNoiseReduction;
  input_audio_transcription?: TranscriptionSessionCreateParams.InputAudioTranscription;
  modalities?: Array<'text' | 'audio'>;
  turn_detection?: TranscriptionSessionCreateParams.TurnDetection;
}

Import

import OpenAI from 'openai';
// Access via client.beta.realtime.transcriptionSessions

I/O Contract

Inputs

Name Type Required Description
input_audio_format 'pcm16' | 'g711_ulaw' | 'g711_alaw' No Format of input audio (pcm16 requires 16-bit, 24 kHz, mono, little-endian)
input_audio_transcription InputAudioTranscription No Transcription model, language (ISO-639-1), and prompt
input_audio_noise_reduction InputAudioNoiseReduction No Noise reduction type: near_field (headphones) or far_field (laptop mic)
modalities Array<'text' | 'audio'> No Modalities the model can respond with
turn_detection TurnDetection No VAD configuration (server_vad or semantic_vad) with eagerness, threshold, padding
include Array<string> No Additional items to include (e.g., logprobs)
client_secret ClientSecret No Token expiration config (anchor, plus seconds between 10 and 7200)
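The seconds range for client_secret can be checked before the request is sent. The sketch below is a minimal example; the helper clientSecretWithTtl is hypothetical, the nested expires_at shape with a 'created_at' anchor is an assumption about the parameter layout, and requiring an integer is an extra precaution rather than a documented rule.

```typescript
// Assumed shape: client_secret.expires_at carries an anchor and a TTL in seconds.
interface ClientSecretConfig {
  expires_at: { anchor: 'created_at'; seconds: number };
}

// Range taken from the Inputs table above.
const MIN_TTL_SECONDS = 10;
const MAX_TTL_SECONDS = 7200;

// Hypothetical helper: rejects TTLs outside the documented range before the
// request is ever sent. The integer check is an added assumption.
function clientSecretWithTtl(seconds: number): ClientSecretConfig {
  if (
    !Number.isInteger(seconds) ||
    seconds < MIN_TTL_SECONDS ||
    seconds > MAX_TTL_SECONDS
  ) {
    throw new RangeError(
      `seconds must be an integer between ${MIN_TTL_SECONDS} and ${MAX_TTL_SECONDS}`,
    );
  }
  return { expires_at: { anchor: 'created_at', seconds } };
}

console.log(JSON.stringify(clientSecretWithTtl(300)));
```

Failing early on the caller's side gives a clearer error than waiting for the API to reject the value.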

Outputs

Name Type Description
TranscriptionSession TranscriptionSession Session configuration plus ephemeral client secret
client_secret.value string Ephemeral API key for client-side WebSocket authentication
client_secret.expires_at number Unix timestamp when the ephemeral key expires (default TTL is 10 minutes)
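Because the client secret is short-lived, callers often want to know how long it remains valid. The sketch below assumes expires_at is a Unix timestamp in seconds, as the table states; the helper names and the five-second safety margin are hypothetical choices for illustration.

```typescript
// Hypothetical helper: seconds of validity left on an ephemeral key.
// Assumes expiresAt is a Unix timestamp in seconds; nowMs is in milliseconds.
function secondsUntilExpiry(expiresAt: number, nowMs: number = Date.now()): number {
  return Math.max(0, expiresAt - Math.floor(nowMs / 1000));
}

// Treat a key as due for refresh slightly before it expires, to leave room
// for clock skew and connection setup. The margin is an arbitrary example.
function shouldRefresh(
  expiresAt: number,
  nowMs: number = Date.now(),
  marginSeconds = 5,
): boolean {
  return secondsUntilExpiry(expiresAt, nowMs) <= marginSeconds;
}

const sampleExpiresAt = 1_700_000_060; // seconds
const sampleNowMs = 1_700_000_000_000; // milliseconds
console.log(secondsUntilExpiry(sampleExpiresAt, sampleNowMs)); // 60
console.log(shouldRefresh(sampleExpiresAt, sampleNowMs)); // false
```

Passing the current time as a parameter keeps both helpers deterministic and easy to test.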

Usage Examples

Basic Usage

import OpenAI from 'openai';

// With no arguments, the client reads the API key from the OPENAI_API_KEY environment variable
const client = new OpenAI();

// Create a transcription session
const transcriptionSession =
  await client.beta.realtime.transcriptionSessions.create({
    input_audio_format: 'pcm16',
    input_audio_transcription: {
      model: 'gpt-4o-transcribe',
      language: 'en',
    },
    turn_detection: {
      type: 'server_vad',
      threshold: 0.5,
      silence_duration_ms: 500,
    },
  });

// Use the ephemeral key for client-side WebSocket connection
console.log(transcriptionSession.client_secret.value);
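The last step, handing the ephemeral key to a client-side WebSocket, might look like the sketch below. Nothing in it comes from this page: the endpoint URL, the intent=transcription query parameter, and the subprotocol strings are assumptions based on OpenAI's published Realtime WebSocket guidance and should be verified against current documentation. The helper transcriptionSocketTarget is hypothetical.

```typescript
// Assumptions (not from this page): the URL, the intent=transcription query
// parameter, and the subprotocol strings may differ from the current API.
interface RealtimeSocketTarget {
  url: string;
  protocols: string[];
}

// Hypothetical helper: derives WebSocket connection arguments from the
// ephemeral key returned in client_secret.value.
function transcriptionSocketTarget(ephemeralKey: string): RealtimeSocketTarget {
  if (ephemeralKey.length === 0) {
    throw new Error('ephemeral key must not be empty');
  }
  return {
    url: 'wss://api.openai.com/v1/realtime?intent=transcription',
    protocols: [
      'realtime',
      `openai-insecure-api-key.${ephemeralKey}`, // carries the ephemeral key
      'openai-beta.realtime-v1',
    ],
  };
}

const target = transcriptionSocketTarget('ek_placeholder');
// In a browser this would be used as: new WebSocket(target.url, target.protocols)
console.log(target.url);
```

Only the ephemeral key should ever reach the browser; the standard API key stays on the server that called create.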
