Implementation:Openai Openai node Transcriptions Resource

Knowledge Sources	Openai_Openai_node
Domains	SDK, Audio, Transcription
Last Updated	2026-02-15 12:00 GMT

Overview

The Transcriptions resource class provides the create method for transcribing audio files into text, supporting multiple output formats, streaming, and diarization.

Description

Transcriptions extends APIResource and exposes a single method, create, whose overload signatures determine the return type from the response_format and stream parameters. When response_format is 'json' or omitted, it returns a Transcription object; for 'verbose_json', a TranscriptionVerbose with segments and, when requested through timestamp_granularities, word timestamps; for 'srt', 'vtt', or 'text', a plain string. When stream: true is set, it returns a Stream<TranscriptionStreamEvent> that yields real-time delta and done events.

The module defines a comprehensive type system for transcription results. The Transcription interface includes the transcribed text along with optional log probabilities and usage statistics (which can be token-based or duration-based). The TranscriptionVerbose interface adds duration, language, segments, and optional word-level timestamps. For diarized transcription (using gpt-4o-transcribe-diarize), TranscriptionDiarized provides speaker-annotated segments.
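The token-based and duration-based usage variants form a discriminated union, so callers can narrow on a `type` field. The sketch below shows that pattern. The interfaces are simplified local mirrors, and their field names (`type`, `input_tokens`, `output_tokens`, `total_tokens`, `seconds`) are assumptions to verify against src/resources/audio/transcriptions.ts. `describeUsage` is a hypothetical helper, not an SDK export.

```typescript
// Simplified local mirrors of the two usage shapes (assumed field names).
interface UsageTokens {
  type: 'tokens';
  input_tokens: number;
  output_tokens: number;
  total_tokens: number;
}

interface UsageDuration {
  type: 'duration';
  seconds: number;
}

type TranscriptionUsage = UsageTokens | UsageDuration;

// Only the fields this sketch needs from a Transcription-like result.
interface TranscriptionLike {
  text: string;
  usage?: TranscriptionUsage;
}

// Hypothetical helper: renders whichever usage variant was returned.
// Switching on the `type` discriminant narrows the union in each branch.
function describeUsage(result: TranscriptionLike): string {
  const usage = result.usage;
  if (usage === undefined) {
    return 'no usage reported';
  }
  switch (usage.type) {
    case 'tokens':
      return `${usage.input_tokens} in / ${usage.output_tokens} out (${usage.total_tokens} total tokens)`;
    case 'duration':
      return `${usage.seconds}s of audio`;
  }
}
```

Narrowing on `type` lets the compiler reject access to `seconds` on a token-based result, and vice versa.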

The streaming event types include TranscriptionTextDeltaEvent (incremental text deltas with optional log probabilities), TranscriptionTextDoneEvent (the final complete text with usage), and TranscriptionTextSegmentEvent (diarized segments with speaker labels). The create method sends a multipart form request to /audio/transcriptions.
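The following sketch folds a sequence of streaming events into a final result. It concatenates deltas, prefers the complete text from the done event, and sums segment durations per speaker label. The event interfaces model only the fields used here and are simplified assumptions about the SDK's event types. `collectTranscript` is a hypothetical helper. It accepts any synchronous iterable, so it can be exercised without a network call. With the SDK, the same `switch` would sit inside `for await (const event of stream)`.

```typescript
// Simplified local mirrors of the three event shapes (only the fields used here).
interface TextDeltaEvent {
  type: 'transcript.text.delta';
  delta: string;
}

interface TextDoneEvent {
  type: 'transcript.text.done';
  text: string;
}

interface TextSegmentEvent {
  type: 'transcript.text.segment';
  speaker: string;
  text: string;
  start: number; // seconds
  end: number; // seconds
}

type StreamEvent = TextDeltaEvent | TextDoneEvent | TextSegmentEvent;

interface CollectedTranscript {
  text: string; // text from the done event if one arrived, else the joined deltas
  speakingSeconds: Map<string, number>; // total segment duration per speaker label
}

// Hypothetical helper: accumulates events into a single transcript.
function collectTranscript(events: Iterable<StreamEvent>): CollectedTranscript {
  let deltas = '';
  let doneText: string | undefined;
  const speakingSeconds = new Map<string, number>();

  for (const event of events) {
    switch (event.type) {
      case 'transcript.text.delta':
        deltas += event.delta;
        break;
      case 'transcript.text.done':
        doneText = event.text;
        break;
      case 'transcript.text.segment': {
        const previous = speakingSeconds.get(event.speaker) ?? 0;
        speakingSeconds.set(event.speaker, previous + (event.end - event.start));
        break;
      }
    }
  }

  return { text: doneText ?? deltas, speakingSeconds };
}
```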

Usage

This resource is accessed as client.audio.transcriptions.create(). It accepts an audio file (as a ReadStream, File, or other Uploadable) and a model identifier. Supported models include gpt-4o-transcribe, gpt-4o-mini-transcribe, whisper-1, and gpt-4o-transcribe-diarize.

Code Reference

Source Location

Repository: openai-node
File: src/resources/audio/transcriptions.ts
Lines: 1-823

Signature

export class Transcriptions extends APIResource {
  // Non-streaming, JSON format (default)
  create(
    body: TranscriptionCreateParamsNonStreaming<'json' | undefined>,
    options?: RequestOptions,
  ): APIPromise<Transcription>;

  // Non-streaming, verbose JSON format
  create(
    body: TranscriptionCreateParamsNonStreaming<'verbose_json'>,
    options?: RequestOptions,
  ): APIPromise<TranscriptionVerbose>;

  // Non-streaming, plain text formats
  create(
    body: TranscriptionCreateParamsNonStreaming<'srt' | 'vtt' | 'text'>,
    options?: RequestOptions,
  ): APIPromise<string>;

  // Streaming
  create(
    body: TranscriptionCreateParamsStreaming,
    options?: RequestOptions,
  ): APIPromise<Stream<TranscriptionStreamEvent>>;
}
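The overload pattern above can be reproduced in isolation to show how the literal value of `response_format` selects the static return type. This sketch uses hypothetical local stand-ins (`mockCreate`, `JsonParams`, `VerboseParams`, `TextParams`, and simplified result interfaces) instead of the SDK's types. It omits the streaming and `'diarized_json'` cases for brevity and returns canned values instead of calling the API.

```typescript
// Local stand-ins for the SDK result types (simplified for illustration).
interface Transcription {
  text: string;
}
interface TranscriptionVerbose {
  text: string;
  duration: number;
  language: string;
}

type TextFormat = 'srt' | 'vtt' | 'text';

interface BaseParams {
  model: string;
}
interface JsonParams extends BaseParams {
  response_format?: 'json';
  stream?: false;
}
interface VerboseParams extends BaseParams {
  response_format: 'verbose_json';
  stream?: false;
}
interface TextParams extends BaseParams {
  response_format: TextFormat;
  stream?: false;
}

// Overload declarations: TypeScript picks the first signature whose parameter
// type matches, so the literal response_format decides the static return type.
function mockCreate(body: JsonParams): Promise<Transcription>;
function mockCreate(body: VerboseParams): Promise<TranscriptionVerbose>;
function mockCreate(body: TextParams): Promise<string>;
// Implementation signature (not visible to callers); returns canned values.
function mockCreate(
  body: JsonParams | VerboseParams | TextParams,
): Promise<Transcription | TranscriptionVerbose | string> {
  const text = 'hello world';
  switch (body.response_format) {
    case 'verbose_json':
      return Promise.resolve({ text, duration: 1.2, language: 'english' });
    case 'srt':
    case 'vtt':
    case 'text':
      return Promise.resolve(text);
    default:
      // 'json' or omitted
      return Promise.resolve({ text });
  }
}
```

Because each overload fixes the result type, accessing `duration` on the result type-checks only when `response_format: 'verbose_json'` is passed. The `create` overloads in the signature above narrow their results the same way at the type level.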

Import

import OpenAI from 'openai';
// Accessed via: client.audio.transcriptions.create(...)

I/O Contract

Inputs (TranscriptionCreateParamsBase)

Name	Type	Required	Description
file	`Uploadable`	Yes	Audio file to transcribe (flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm)
model	AudioModel	Yes	Model ID: `'gpt-4o-transcribe'`, `'gpt-4o-mini-transcribe'`, `'whisper-1'`, or `'gpt-4o-transcribe-diarize'`
language	`string`	No	Input audio language in ISO-639-1 format (e.g., `'en'`)
prompt	`string`	No	Optional text to guide the model's style or continue a segment
response_format	`'json' | 'text' | 'srt' | 'vtt' | 'verbose_json' | 'diarized_json'`	No	Output format; defaults to `'json'` when omitted
stream	`boolean`	No	Enable SSE streaming of transcription events (not supported for `'whisper-1'`, where it is ignored)
temperature	`number`	No	Sampling temperature between 0 and 1
timestamp_granularities	`Array<'word' | 'segment'>`	No	Granularity of timestamps (requires `response_format: 'verbose_json'`)
include	`Array<'logprobs'>`	No	Additional data to include in the response
chunking_strategy	`'auto' | VadConfig | null`	No	Audio chunking strategy using voice activity detection
known_speaker_names	`Array<string>`	No	Speaker name labels for diarization (up to 4)
known_speaker_references	`Array<string>`	No	Audio sample data URLs matching speaker names
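Several rows in the table carry constraints that are easy to check before uploading. The sketch below is a hypothetical client-side pre-flight check. `validateTranscriptionParams` and `ParamsToCheck` are illustrative names, not SDK exports. It encodes only the rules stated in the table: supported file extensions, the 0 to 1 temperature range, `timestamp_granularities` requiring `verbose_json`, at most four known speakers, and one data-URL reference per speaker name. It assumes the upload's file name is available as a string, and the API remains the final authority on what it accepts.

```typescript
// Extensions taken from the `file` row of the table.
const SUPPORTED_EXTENSIONS = ['flac', 'mp3', 'mp4', 'mpeg', 'mpga', 'm4a', 'ogg', 'wav', 'webm'];

// Hypothetical parameter shape: `fileName` stands in for the Uploadable.
interface ParamsToCheck {
  fileName: string;
  response_format?: string;
  temperature?: number;
  timestamp_granularities?: Array<'word' | 'segment'>;
  known_speaker_names?: string[];
  known_speaker_references?: string[];
}

// Returns a list of human-readable problems; an empty list means no rule was violated.
function validateTranscriptionParams(p: ParamsToCheck): string[] {
  const problems: string[] = [];

  const ext = (p.fileName.split('.').pop() ?? '').toLowerCase();
  if (SUPPORTED_EXTENSIONS.indexOf(ext) === -1) {
    problems.push(`unsupported file extension: ${ext}`);
  }
  if (p.temperature !== undefined && (p.temperature < 0 || p.temperature > 1)) {
    problems.push('temperature must be between 0 and 1');
  }
  if (p.timestamp_granularities && p.timestamp_granularities.length > 0 && p.response_format !== 'verbose_json') {
    problems.push("timestamp_granularities requires response_format: 'verbose_json'");
  }

  const names = p.known_speaker_names ?? [];
  const refs = p.known_speaker_references ?? [];
  if (names.length > 4) {
    problems.push('at most 4 known speakers are supported');
  }
  if (names.length !== refs.length) {
    problems.push('known_speaker_names and known_speaker_references must have the same length');
  }
  if (refs.some((ref) => !ref.startsWith('data:'))) {
    problems.push('known_speaker_references must be data URLs');
  }

  return problems;
}
```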

Outputs

Condition	Return Type	Description
Default / `response_format: 'json'`	`Transcription`	Object with `text`, optional `logprobs` and `usage`
`response_format: 'verbose_json'`	`TranscriptionVerbose`	Includes `duration`, `language`, `segments`, `words`
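`response_format: 'text' | 'srt' | 'vtt'`	`string`	Raw string: plain text, or SRT/WebVTT subtitle markup
`stream: true`	`Stream<TranscriptionStreamEvent>`	Stream of delta and done events, plus segment events for diarized transcription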

Usage Examples

Basic Usage

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI();
const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('speech.mp3'),
  model: 'gpt-4o-transcribe',
});

console.log(transcription.text);

Streaming Transcription

const stream = await client.audio.transcriptions.create({
  file: fs.createReadStream('interview.mp3'),
  model: 'gpt-4o-transcribe',
  stream: true,
});

for await (const event of stream) {
  if (event.type === 'transcript.text.delta') {
    process.stdout.write(event.delta);
  }
}

Verbose JSON with Timestamps

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('lecture.mp3'),
  model: 'whisper-1',
  response_format: 'verbose_json',
  timestamp_granularities: ['word', 'segment'],
});

for (const word of transcription.words ?? []) {
  console.log(`${word.start}s - ${word.end}s: ${word.word}`);
}
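If you need captions but have already requested `verbose_json`, you can turn the word timestamps into SRT cues locally. This is a minimal sketch. `WordTimestamp` mirrors only the `word`, `start`, and `end` fields used in the example above, and `toSrtTimestamp` and `wordsToSrt` are hypothetical helpers with an arbitrary grouping rule (a fixed number of words per cue). When the server's own cue boundaries are acceptable, requesting `response_format: 'srt'` directly is simpler.

```typescript
// Minimal local mirror of one entry in TranscriptionVerbose.words.
interface WordTimestamp {
  word: string;
  start: number; // seconds
  end: number; // seconds
}

// Left-pads a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = '0' + s;
  return s;
}

// Formats seconds as an SRT timestamp, HH:MM:SS,mmm.
function toSrtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Hypothetical helper: groups consecutive words into numbered SRT cues
// of at most `maxWords` words each.
function wordsToSrt(words: WordTimestamp[], maxWords = 7): string {
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const chunk = words.slice(i, i + maxWords);
    const first = chunk[0];
    const last = chunk[chunk.length - 1];
    const text = chunk.map((w) => w.word).join(' ');
    cues.push(`${cues.length + 1}\n${toSrtTimestamp(first.start)} --> ${toSrtTimestamp(last.end)}\n${text}`);
  }
  return cues.join('\n\n');
}
```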
