Implementation:Openai Openai node Beta Realtime TranscriptionSessions
| Knowledge Sources | |
|---|---|
| Domains | SDK, Realtime, Transcription, Beta |
| Last Updated | 2026-02-15 12:00 GMT |
Overview
The Beta Realtime TranscriptionSessions resource class provides a method to create ephemeral API tokens specifically for Realtime transcription sessions, enabling client-side audio transcription via the Realtime API.
Description
The TranscriptionSessions class extends APIResource and exposes a single create method that posts to the /realtime/transcription_sessions endpoint. This endpoint is designed specifically for realtime transcriptions rather than full conversational sessions. The method accepts TranscriptionSessionCreateParams and returns a TranscriptionSession object containing session configuration and an ephemeral client secret. The OpenAI-Beta: assistants=v2 header is automatically injected.
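The sketch below makes the request shape concrete. It assembles, but does not send, roughly the HTTP call that create issues. It assumes the default base URL https://api.openai.com/v1 and bearer-token authentication. buildTranscriptionSessionRequest is a hypothetical helper, not part of the SDK, and the real client adds further headers of its own.

```typescript
// Hypothetical helper: assembles roughly the request that
// client.beta.realtime.transcriptionSessions.create() sends. It does not send it.
interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildTranscriptionSessionRequest(
  apiKey: string,
  params: Record<string, unknown>,
  baseURL: string = "https://api.openai.com/v1", // assumed default base URL
): PreparedRequest {
  return {
    // Strip trailing slashes so the path is joined cleanly.
    url: `${baseURL.replace(/\/+$/, "")}/realtime/transcription_sessions`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
      // The beta header the SDK injects automatically for this resource.
      "OpenAI-Beta": "assistants=v2",
    },
    body: JSON.stringify(params),
  };
}

const req = buildTranscriptionSessionRequest("sk-placeholder", {
  input_audio_transcription: { model: "whisper-1" },
});
console.log(req.method, req.url);
// prints: POST https://api.openai.com/v1/realtime/transcription_sessions
```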
The TranscriptionSession response interface includes a client_secret object (with value and expires_at for the ephemeral key), input_audio_format, input_audio_transcription configuration (supporting models like gpt-4o-transcribe, gpt-4o-mini-transcribe, and whisper-1), modalities, and turn_detection settings.
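If you call the endpoint without the SDK, for example from a minimal server, the JSON response is untyped. The hypothetical runtime guard below checks the one field a client cannot do without: client_secret. The interface names are local, simplified stand-ins for the SDK's TranscriptionSession type, not the SDK's own declarations.

```typescript
// Local stand-ins for the SDK's TranscriptionSession types (assumption: simplified subset).
interface ClientSecret {
  value: string;
  expires_at: number; // Unix timestamp in seconds
}

interface TranscriptionSessionLike {
  client_secret: ClientSecret;
  input_audio_format?: string;
  modalities?: Array<"text" | "audio">;
}

// Hypothetical type guard: verifies that an untyped JSON payload carries a usable client_secret.
function isTranscriptionSession(data: unknown): data is TranscriptionSessionLike {
  if (typeof data !== "object" || data === null) return false;
  const secret = (data as { client_secret?: unknown }).client_secret;
  if (typeof secret !== "object" || secret === null) return false;
  const { value, expires_at } = secret as { value?: unknown; expires_at?: unknown };
  return typeof value === "string" && typeof expires_at === "number";
}

const payload: unknown = JSON.parse(
  '{"client_secret":{"value":"placeholder-secret","expires_at":1700000600},"input_audio_format":"pcm16"}',
);
if (isTranscriptionSession(payload)) {
  console.log(payload.client_secret.value); // prints: placeholder-secret
}
```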
The TranscriptionSessionCreateParams interface allows configuring the session with client_secret (for token expiration customization), include (to request additional fields like logprobs), input_audio_format, input_audio_noise_reduction (near_field or far_field), input_audio_transcription (model, language, prompt), modalities, and turn_detection supporting both Server VAD and Semantic VAD modes.
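A params object exercising most of these options might look like the sketch below. The types are a local, simplified mirror of TranscriptionSessionCreateParams. Modelling turn_detection as a discriminated union is a choice made here; the SDK declares it as one interface with optional fields. The following are assumptions:

- the include value item.input_audio_transcription.logprobs is the API's identifier for transcription logprobs
- the eagerness values are low, medium, high and auto

buildParams is a hypothetical helper that rejects token lifetimes outside the documented 10 to 7200 seconds before any request is made.

```typescript
// Local, simplified mirror of TranscriptionSessionCreateParams (assumption:
// in real code, import the SDK's own type instead).
type TurnDetection =
  | {
      type: "server_vad";
      threshold?: number;
      prefix_padding_ms?: number;
      silence_duration_ms?: number;
    }
  | { type: "semantic_vad"; eagerness?: "low" | "medium" | "high" | "auto" };

interface CreateParamsSketch {
  client_secret?: { expires_at?: { anchor?: "created_at"; seconds?: number } };
  include?: string[];
  input_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
  input_audio_noise_reduction?: { type?: "near_field" | "far_field" };
  input_audio_transcription?: {
    model?: "gpt-4o-transcribe" | "gpt-4o-mini-transcribe" | "whisper-1";
    language?: string; // ISO-639-1, e.g. "en"
    prompt?: string;
  };
  turn_detection?: TurnDetection;
}

// Hypothetical helper: rejects token lifetimes outside 10..7200 seconds locally.
function buildParams(ttlSeconds: number): CreateParamsSketch {
  // Written as a negated range check so NaN is rejected too.
  if (!(ttlSeconds >= 10 && ttlSeconds <= 7200)) {
    throw new RangeError(`ttlSeconds must be between 10 and 7200, got ${ttlSeconds}`);
  }
  return {
    client_secret: { expires_at: { anchor: "created_at", seconds: ttlSeconds } },
    include: ["item.input_audio_transcription.logprobs"], // assumed include identifier
    input_audio_format: "pcm16",
    input_audio_noise_reduction: { type: "far_field" }, // e.g. a laptop microphone
    input_audio_transcription: { model: "gpt-4o-mini-transcribe", language: "en" },
    turn_detection: { type: "semantic_vad", eagerness: "low" },
  };
}

const params = buildParams(300);
console.log(JSON.stringify(params.turn_detection)); // prints: {"type":"semantic_vad","eagerness":"low"}
```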
Usage
Use this resource to create ephemeral tokens for client-side Realtime transcription connections. Access it via client.beta.realtime.transcriptionSessions. This is useful when you need audio-to-text transcription without a full conversational Realtime session.
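A common pattern is to mint the token on a trusted server and hand only the short-lived secret to the browser. The sketch below assumes that shape. mintClientToken and the injected createSession function are hypothetical. The fake creator lets the example run without network access.

```typescript
// Local stand-ins (assumption): only the fields this sketch touches.
interface EphemeralToken {
  value: string;
  expires_at: number; // Unix seconds
}

interface SessionWithSecret {
  client_secret: EphemeralToken;
}

// Stand-in for a closure around client.beta.realtime.transcriptionSessions.create.
type CreateSession = () => Promise<SessionWithSecret>;

// Hypothetical server-side handler: creates a session with the standard API key
// and returns only the short-lived secret for the browser to use.
async function mintClientToken(createSession: CreateSession): Promise<EphemeralToken> {
  const session = await createSession();
  const { value, expires_at } = session.client_secret;
  return { value, expires_at };
}

// Usage with a fake creator, so the sketch runs offline.
const fakeCreate: CreateSession = async () => ({
  client_secret: { value: "placeholder-secret", expires_at: 1_700_000_600 },
});

mintClientToken(fakeCreate).then((token) => console.log(token));
```

With the SDK, createSession could be () => client.beta.realtime.transcriptionSessions.create({ ... }). Its resolved TranscriptionSession should be structurally compatible with SessionWithSecret. Returning a fresh object, rather than the whole session, keeps any other session fields from leaking to the client.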
Code Reference
Source Location
- Repository: openai-node
- File: src/resources/beta/realtime/transcription-sessions.ts
Signature
```typescript
export class TranscriptionSessions extends APIResource {
  create(
    body: TranscriptionSessionCreateParams,
    options?: RequestOptions,
  ): APIPromise<TranscriptionSession>;
}

export interface TranscriptionSession {
  client_secret: TranscriptionSession.ClientSecret;
  input_audio_format?: string;
  input_audio_transcription?: TranscriptionSession.InputAudioTranscription;
  modalities?: Array<'text' | 'audio'>;
  turn_detection?: TranscriptionSession.TurnDetection;
}

export interface TranscriptionSessionCreateParams {
  client_secret?: TranscriptionSessionCreateParams.ClientSecret;
  include?: Array<string>;
  input_audio_format?: 'pcm16' | 'g711_ulaw' | 'g711_alaw';
  input_audio_noise_reduction?: TranscriptionSessionCreateParams.InputAudioNoiseReduction;
  input_audio_transcription?: TranscriptionSessionCreateParams.InputAudioTranscription;
  modalities?: Array<'text' | 'audio'>;
  turn_detection?: TranscriptionSessionCreateParams.TurnDetection;
}
```
Import
```typescript
import OpenAI from 'openai';

// Access via client.beta.realtime.transcriptionSessions
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input_audio_format | 'pcm16' \| 'g711_ulaw' \| 'g711_alaw' | No | Format of input audio; pcm16 must be 16-bit PCM at 24 kHz, mono, little-endian |
| input_audio_transcription | InputAudioTranscription | No | Transcription model, language (ISO-639-1), and prompt |
| input_audio_noise_reduction | InputAudioNoiseReduction | No | Noise reduction type: near_field (close-talking mics such as headphones) or far_field (far-field mics such as a laptop microphone) |
| modalities | Array<'text' \| 'audio'> | No | Modalities the model can respond with |
| turn_detection | TurnDetection | No | VAD configuration: server_vad (threshold, prefix_padding_ms, silence_duration_ms) or semantic_vad (eagerness) |
| include | Array<string> | No | Additional items to include in the transcription (e.g., logprobs) |
| client_secret | ClientSecret | No | Ephemeral token expiration: anchor, plus seconds between 10 and 7200 |
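The pcm16 requirement (16-bit, 24 kHz, mono, little-endian) usually means converting the Float32 samples that audio APIs such as Web Audio produce. A minimal conversion sketch follows. It assumes the input is already mono and resampled to 24 kHz. floatTo16BitPCM is a hypothetical helper name.

```typescript
// Hypothetical helper: converts mono Float32 samples in [-1, 1], already at
// 24 kHz, into 16-bit little-endian PCM (the pcm16 layout). No resampling.
function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per sample
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp out-of-range input to avoid integer wrap-around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Scale asymmetrically so -1 maps to -32768 and +1 maps to 32767.
    const int16 = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
    view.setInt16(i * 2, int16, true); // true = little-endian
  }
  return buffer;
}

const pcm = floatTo16BitPCM(new Float32Array([0, 1, -1, 0.5, 2]));
const check = new DataView(pcm);
console.log(pcm.byteLength, check.getInt16(2, true), check.getInt16(4, true));
// prints: 10 32767 -32768
```

At 24 kHz mono, one second of pcm16 audio is 24,000 samples × 2 bytes = 48,000 bytes. This is a handy sanity check on buffer sizes.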
Outputs
| Name | Type | Description |
|---|---|---|
| TranscriptionSession | TranscriptionSession | Session configuration plus ephemeral client secret |
| client_secret.value | string | Ephemeral API key for client-side WebSocket authentication |
| client_secret.expires_at | number | Unix timestamp when the ephemeral key expires (default TTL is 10 minutes) |
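expires_at is a Unix timestamp in seconds, while Date.now() returns milliseconds, so expiry checks need a unit conversion. The sketch below also adds a small safety margin so a client refreshes the token before the server rejects it. The following are assumptions, not SDK behavior:

- the helper names secondsRemaining and isExpired
- the 5-second default margin

```typescript
// Local stand-in for the client_secret object (assumption: simplified).
interface ClientSecret {
  value: string;
  expires_at: number; // Unix timestamp in seconds
}

// Whole seconds left before expiry; never negative. Date.now() is in milliseconds.
function secondsRemaining(secret: ClientSecret, nowMs: number = Date.now()): number {
  return Math.max(0, Math.floor(secret.expires_at - nowMs / 1000));
}

// Treat the key as expired `marginSeconds` early so a client does not start
// a connection with a token that is about to lapse.
function isExpired(
  secret: ClientSecret,
  nowMs: number = Date.now(),
  marginSeconds: number = 5,
): boolean {
  return nowMs / 1000 >= secret.expires_at - marginSeconds;
}

const secret: ClientSecret = { value: "placeholder-secret", expires_at: 1_000 };
console.log(secondsRemaining(secret, 990_000), isExpired(secret, 990_000)); // prints: 10 false
```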
Usage Examples
Basic Usage
```typescript
import OpenAI from 'openai';

// Reads the API key from the OPENAI_API_KEY environment variable by default.
const client = new OpenAI();

// Create a transcription session (top-level await requires an ES module context)
const transcriptionSession =
  await client.beta.realtime.transcriptionSessions.create({
    input_audio_format: 'pcm16',
    input_audio_transcription: {
      model: 'gpt-4o-transcribe',
      language: 'en',
    },
    turn_detection: {
      type: 'server_vad',
      threshold: 0.5,
      silence_duration_ms: 500,
    },
  });

// Use the ephemeral key for the client-side WebSocket connection
console.log(transcriptionSession.client_secret.value);
```
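After connecting with the ephemeral key, the client streams audio to the session. The sketch below shows one way to package pcm16 bytes for that. It assumes the Realtime client event input_audio_buffer.append, which carries base64-encoded audio in its audio field, and an arbitrary 100 ms chunk size. The WebSocket connection is omitted. toBase64 and toAppendEvents are hypothetical helpers; toBase64 is hand-rolled only to keep the sketch free of Node- or browser-specific APIs.

```typescript
const SAMPLE_RATE_HZ = 24_000; // pcm16 input: 24 kHz mono
const BYTES_PER_SAMPLE = 2; // 16-bit samples
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Minimal standard base64 encoder: packs 3 bytes into 4 six-bit symbols, padding with "=".
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    out += B64[(triple >> 18) & 63] + B64[(triple >> 12) & 63];
    out += i + 1 < bytes.length ? B64[(triple >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? B64[triple & 63] : "=";
  }
  return out;
}

interface AppendEvent {
  type: "input_audio_buffer.append";
  audio: string; // base64-encoded pcm16 bytes
}

// Hypothetical helper: splits pcm16 bytes into sample-aligned chunks of roughly
// `chunkMs` milliseconds and wraps each in an append event.
function toAppendEvents(pcm: Uint8Array, chunkMs: number = 100): AppendEvent[] {
  const samplesPerChunk = Math.max(1, Math.floor((SAMPLE_RATE_HZ * chunkMs) / 1000));
  const chunkBytes = samplesPerChunk * BYTES_PER_SAMPLE;
  const events: AppendEvent[] = [];
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    events.push({
      type: "input_audio_buffer.append",
      audio: toBase64(pcm.subarray(offset, offset + chunkBytes)),
    });
  }
  return events;
}

// 10,000 bytes of silence -> chunks of 4,800 + 4,800 + 400 bytes at 100 ms each.
const events = toAppendEvents(new Uint8Array(10_000));
console.log(events.length); // prints: 3
// Each event would be sent as JSON.stringify(event) over the Realtime WebSocket.
```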