Implementation:Openai Openai node Transcriptions Resource

From Leeroopedia
Knowledge Sources
Domains SDK, Audio, Transcription
Last Updated 2026-02-15 12:00 GMT

Overview

The Transcriptions resource class provides the create method for transcribing audio files into text, supporting multiple output formats, streaming, and diarization.

Description

Transcriptions extends APIResource and exposes a single method, create, with multiple overload signatures that control the return type based on the response_format and stream parameters. When response_format is 'json' or omitted, it returns a Transcription object; for 'verbose_json', a TranscriptionVerbose with segments and word timestamps; for 'srt', 'vtt', or 'text', a plain string. When stream: true is set, it returns a Stream<TranscriptionStreamEvent> that yields real-time delta and done events.

The module defines a comprehensive type system for transcription results. The Transcription interface includes the transcribed text along with optional log probabilities and usage statistics (which can be token-based or duration-based). The TranscriptionVerbose interface adds duration, language, segments, and optional word-level timestamps. For diarized transcription (using gpt-4o-transcribe-diarize), TranscriptionDiarized provides speaker-annotated segments.
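Because usage can take either of two shapes, code that reports cost has to branch on it. The sketch below assumes the two variants are distinguished by a `type` field (`'tokens'` or `'duration'`) and uses the field names shown. These are simplified local stand-ins modeled on the SDK's types, not imports from it, so verify them against your installed `openai` version. `describeUsage` is a hypothetical helper.

```typescript
// Simplified local stand-ins for the two usage shapes described above.
// Field names are assumed to mirror the SDK's Transcription usage types;
// check them against the version of `openai` you have installed.
interface TokenUsage {
  type: "tokens";
  input_tokens: number;
  output_tokens: number;
  total_tokens: number;
}

interface DurationUsage {
  type: "duration";
  seconds: number;
}

type TranscriptionUsage = TokenUsage | DurationUsage;

// Hypothetical helper: renders either usage variant as a short label.
// The `type` discriminant lets TypeScript narrow the union in each branch.
function describeUsage(usage: TranscriptionUsage | undefined): string {
  if (usage === undefined) {
    return "no usage reported";
  }
  if (usage.type === "tokens") {
    return `${usage.total_tokens} tokens (${usage.input_tokens} in, ${usage.output_tokens} out)`;
  }
  return `${usage.seconds}s of audio`;
}
```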

The streaming event types include TranscriptionTextDeltaEvent (incremental text deltas with optional log probabilities), TranscriptionTextDoneEvent (the final complete text with usage), and TranscriptionTextSegmentEvent (diarized segments with speaker labels). The create method sends a multipart/form-data POST request to /audio/transcriptions.
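Consuming the stream amounts to folding events into a result: append each delta, then let the done event supply the complete final text. The following is a minimal sketch that assumes simplified event shapes carrying only the fields it reads (`delta`, `text`, `speaker`). `summarizeStream` is a hypothetical name. It takes an array so it can be exercised offline. With the SDK's `Stream`, the same `switch` would sit inside a `for await` loop, as in the Streaming Transcription example.

```typescript
// Simplified local stand-ins for the three streaming event types; only the
// fields this sketch reads are modeled, and their names are assumptions
// based on the event descriptions above.
type StreamEvent =
  | { type: "transcript.text.delta"; delta: string }
  | { type: "transcript.text.done"; text: string }
  | { type: "transcript.text.segment"; speaker: string; text: string };

interface StreamSummary {
  text: string;           // final text (from the done event when present)
  done: boolean;          // whether a done event was seen
  speakerTurns: string[]; // "speaker: text" lines from segment events
}

// Hypothetical reducer: folds events (in arrival order) into a summary.
// Deltas are appended as they arrive; a done event replaces the partial
// text with the complete transcript it carries.
function summarizeStream(events: readonly StreamEvent[]): StreamSummary {
  let partial = "";
  let finalText: string | undefined;
  const speakerTurns: string[] = [];

  for (const event of events) {
    switch (event.type) {
      case "transcript.text.delta":
        partial += event.delta;
        break;
      case "transcript.text.done":
        finalText = event.text;
        break;
      case "transcript.text.segment":
        speakerTurns.push(`${event.speaker}: ${event.text}`);
        break;
    }
  }

  return { text: finalText ?? partial, done: finalText !== undefined, speakerTurns };
}
```

If the stream ends early (for example, on a network error), `done` stays false and `text` holds whatever deltas arrived. This lets callers distinguish a partial transcript from a complete one.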

Usage

This resource is accessed as client.audio.transcriptions.create(). It accepts an audio file (as a ReadStream, File, or other Uploadable) and a model identifier. Supported models include gpt-4o-transcribe, gpt-4o-mini-transcribe, whisper-1, and gpt-4o-transcribe-diarize.
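Because `file` accepts any Uploadable, an unsupported container is only rejected once the request fails server-side. A hypothetical pre-flight check against the extensions listed for `file` in the I/O contract might look like the following. It inspects only the file name, so the API remains the final judge of whether the audio can be decoded.

```typescript
// Audio container formats listed in the I/O contract for `file`.
const SUPPORTED_EXTENSIONS = [
  "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm",
] as const;

type SupportedExtension = (typeof SUPPORTED_EXTENSIONS)[number];

// Hypothetical pre-upload check: returns the lowercase extension if it is
// one of the supported formats, or null otherwise. It only looks at the
// file name; it does not inspect the actual audio content.
function supportedExtension(fileName: string): SupportedExtension | null {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0 || dot === fileName.length - 1) {
    return null; // no extension, or a trailing dot
  }
  const ext = fileName.slice(dot + 1).toLowerCase();
  return (SUPPORTED_EXTENSIONS as readonly string[]).indexOf(ext) >= 0
    ? (ext as SupportedExtension)
    : null;
}
```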

Code Reference

Source Location

Signature

export class Transcriptions extends APIResource {
  // Non-streaming, JSON format (default)
  create(
    body: TranscriptionCreateParamsNonStreaming<'json' | undefined>,
    options?: RequestOptions,
  ): APIPromise<Transcription>;

  // Non-streaming, verbose JSON format
  create(
    body: TranscriptionCreateParamsNonStreaming<'verbose_json'>,
    options?: RequestOptions,
  ): APIPromise<TranscriptionVerbose>;

  // Non-streaming, plain text formats
  create(
    body: TranscriptionCreateParamsNonStreaming<'srt' | 'vtt' | 'text'>,
    options?: RequestOptions,
  ): APIPromise<string>;

  // Streaming
  create(
    body: TranscriptionCreateParamsStreaming,
    options?: RequestOptions,
  ): APIPromise<Stream<TranscriptionStreamEvent>>;
}
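The overloads work because TypeScript selects a signature from the literal type of `response_format`. The following self-contained mock reproduces that pattern with simplified stand-in types and canned responses in place of an HTTP call. `mockCreate` and its types are illustrative only and are not part of the SDK.

```typescript
// Simplified stand-ins for the SDK result types (illustrative only).
interface Transcription {
  text: string;
}
interface TranscriptionVerbose extends Transcription {
  duration: number;
  language: string;
}

type TextFormat = "srt" | "vtt" | "text";

interface Params<F> {
  model: string;
  response_format?: F;
}

// Overload signatures: the literal type of response_format picks the return type.
function mockCreate(body: Params<"json" | undefined>): Transcription;
function mockCreate(body: Params<"verbose_json">): TranscriptionVerbose;
function mockCreate(body: Params<TextFormat>): string;
function mockCreate(
  body: Params<"json" | "verbose_json" | TextFormat | undefined>,
): Transcription | TranscriptionVerbose | string {
  // Canned responses stand in for the network request.
  switch (body.response_format) {
    case "verbose_json":
      return { text: "hello", duration: 1.5, language: "english" };
    case "srt":
    case "vtt":
    case "text":
      return "hello";
    default:
      return { text: "hello" };
  }
}

// The compiler infers a different type at each call site.
const asJson = mockCreate({ model: "whisper-1" }); // Transcription
const asVerbose = mockCreate({ model: "whisper-1", response_format: "verbose_json" }); // TranscriptionVerbose
const asText = mockCreate({ model: "whisper-1", response_format: "text" }); // string
```

Accessing `asVerbose.duration` type-checks, while `asJson.duration` would be a compile error. This is the same guarantee the real overloads give callers.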

Import

import OpenAI from 'openai';
// Accessed via: client.audio.transcriptions.create(...)

I/O Contract

Inputs (TranscriptionCreateParamsBase)

Name Type Required Description
file Uploadable Yes Audio file to transcribe (flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm)
model AudioModel Yes Model ID: 'gpt-4o-transcribe', 'gpt-4o-mini-transcribe', 'whisper-1', or 'gpt-4o-transcribe-diarize'
language string No Input audio language in ISO-639-1 format (e.g., 'en')
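prompt string No Optional text to guide the model's style or to continue a previous audio segment; it should match the audio language
response_format 'json' | 'text' | 'srt' | 'vtt' | 'verbose_json' | 'diarized_json' No Output format (defaults to 'json')
stream boolean No Enable SSE streaming of transcription events (not supported for whisper-1, where it is ignored)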
temperature number No Sampling temperature between 0 and 1
timestamp_granularities Array<'word' | 'segment'> No Granularity of timestamps (requires response_format: 'verbose_json')
include Array<'logprobs'> No Additional data to include in the response
chunking_strategy 'auto' | VadConfig | null No Audio chunking strategy: 'auto', or a VadConfig to tune voice activity detection manually
known_speaker_names Array<string> No Speaker name labels for diarization (up to 4)
known_speaker_references Array<string> No Audio sample data URLs matching speaker names
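Several rows in the table above state constraints that can be checked before any upload. The following hypothetical client-side validator covers a few of them: the temperature range, `timestamp_granularities` requiring `verbose_json`, and the known-speaker limits. Pairing each name with exactly one reference is an assumption drawn from the description "matching speaker names". The API enforces its own rules and may reject requests that this check accepts.

```typescript
// Simplified subset of the create() parameters; names follow the table above.
interface TranscriptionParamsSubset {
  response_format?: "json" | "text" | "srt" | "vtt" | "verbose_json" | "diarized_json";
  temperature?: number;
  timestamp_granularities?: Array<"word" | "segment">;
  known_speaker_names?: string[];
  known_speaker_references?: string[];
}

// Hypothetical client-side check of constraints stated in the table.
// Returns a list of problems (empty when none are found).
function checkParams(p: TranscriptionParamsSubset): string[] {
  const problems: string[] = [];

  if (p.temperature !== undefined && (p.temperature < 0 || p.temperature > 1)) {
    problems.push("temperature must be between 0 and 1");
  }
  if (p.timestamp_granularities?.length && p.response_format !== "verbose_json") {
    problems.push("timestamp_granularities requires response_format 'verbose_json'");
  }

  const names = p.known_speaker_names ?? [];
  const refs = p.known_speaker_references ?? [];
  if (names.length > 4) {
    problems.push("at most 4 known speakers are supported");
  }
  // Assumption: one reference sample per speaker name.
  if (names.length !== refs.length) {
    problems.push("known_speaker_names and known_speaker_references must pair up");
  }
  return problems;
}
```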

Outputs

Condition Return Type Description
Default / response_format: 'json' Transcription Object with text, optional logprobs and usage
response_format: 'verbose_json' TranscriptionVerbose Includes duration, language, segments, words
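response_format: 'text' | 'srt' | 'vtt' string Raw response body: plain text, or SRT/VTT subtitle text
stream: true Stream<TranscriptionStreamEvent> Stream of delta and done events (plus segment events for diarized transcription)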
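The 'srt' and 'vtt' formats return subtitle text with timing lines, not bare prose. If only the words are needed from an SRT result, a small post-processing step is enough. The sketch below assumes well-formed SRT, with blank-line-separated cues that each start with an index line and a `start --> end` timing line. `srtToPlainText` is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical helper: reduces an SRT string (as returned for
// response_format: 'srt') to plain text by dropping cue numbers and
// timing lines. Assumes well-formed, blank-line-separated cues.
function srtToPlainText(srt: string): string {
  return srt
    .replace(/\r\n/g, "\n")
    .split(/\n{2,}/) // one chunk per cue
    .map((cue) => {
      const lines = cue.split("\n").filter((line) => line.trim() !== "");
      // Find the timing line; everything after it is caption text.
      // If no timing line exists, timingIndex stays -1 and all lines are kept.
      let timingIndex = -1;
      for (let i = 0; i < lines.length; i++) {
        if (lines[i].indexOf("-->") >= 0) {
          timingIndex = i;
          break;
        }
      }
      return lines.slice(timingIndex + 1).join(" ");
    })
    .filter((text) => text.length > 0)
    .join(" ");
}
```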

Usage Examples

Basic Usage

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI();
const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('speech.mp3'),
  model: 'gpt-4o-transcribe',
});

console.log(transcription.text);

Streaming Transcription

const stream = await client.audio.transcriptions.create({
  file: fs.createReadStream('interview.mp3'),
  model: 'gpt-4o-transcribe',
  stream: true,
});

for await (const event of stream) {
  if (event.type === 'transcript.text.delta') {
    process.stdout.write(event.delta);
  }
}

Verbose JSON with Timestamps

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('lecture.mp3'),
  model: 'whisper-1',
  response_format: 'verbose_json',
  timestamp_granularities: ['word', 'segment'],
});

for (const word of transcription.words ?? []) {
  console.log(`${word.start}s - ${word.end}s: ${word.word}`);
}
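Word timestamps can also be turned into captions on the client, for example to control how many words appear per cue. The following is a minimal sketch that assumes the `word`, `start`, and `end` fields used in the example above, with times in seconds. `srtTimestamp` and `wordsToSrt` are hypothetical helpers. For standard subtitles, requesting `response_format: 'srt'` directly is simpler. This approach is only worth it when you need custom grouping.

```typescript
// Minimal word shape read by this sketch (times in seconds).
interface TimedWord {
  word: string;
  start: number;
  end: number;
}

// Left-pads a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Formats seconds as an SRT timestamp "HH:MM:SS,mmm" (rounded to the millisecond).
function srtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Hypothetical helper: groups consecutive words into numbered SRT cues of
// at most `wordsPerCue` words, each spanning first.start to last.end.
function wordsToSrt(words: TimedWord[], wordsPerCue = 7): string {
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += wordsPerCue) {
    const chunk = words.slice(i, i + wordsPerCue);
    const first = chunk[0];
    const last = chunk[chunk.length - 1];
    const text = chunk.map((w) => w.word.trim()).join(" "); // trim stray whitespace
    cues.push(`${cues.length + 1}\n${srtTimestamp(first.start)} --> ${srtTimestamp(last.end)}\n${text}`);
  }
  return cues.length > 0 ? cues.join("\n\n") + "\n" : "";
}
```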
