Implementation:Openai Openai node Transcriptions Resource
| Knowledge Sources | |
|---|---|
| Domains | SDK, Audio, Transcription |
| Last Updated | 2026-02-15 12:00 GMT |
Overview
The Transcriptions resource class provides the create method for transcribing audio files into text, supporting multiple output formats, streaming, and diarization.
Description
Transcriptions extends APIResource and exposes a single method, create, with multiple overload signatures that control the return type based on the response_format and stream parameters. When response_format is 'json' or omitted, it returns a Transcription object; for 'verbose_json', a TranscriptionVerbose with segments and word timestamps; for 'srt', 'vtt', or 'text', a plain string. When stream: true is set, it returns a Stream<TranscriptionStreamEvent> that yields real-time delta and done events.
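The overload mechanism described above can be sketched in miniature. In this self-contained example, mockCreate, MiniParams, and the Mini* interfaces are hypothetical stand-ins invented for illustration. They do not call the API and carry far fewer fields than the SDK types, but they show how the literal type of response_format selects the compile-time return type.

```typescript
// Hypothetical stand-ins for the SDK types (the real interfaces have more fields).
interface MiniTranscription {
  text: string;
}

interface MiniTranscriptionVerbose extends MiniTranscription {
  duration: number;
  language: string;
}

type TextFormat = 'srt' | 'vtt' | 'text';

interface MiniParams<F> {
  response_format?: F;
}

// Overload signatures: the literal type of response_format picks the return type.
function mockCreate(body: MiniParams<'json' | undefined>): MiniTranscription;
function mockCreate(body: MiniParams<'verbose_json'>): MiniTranscriptionVerbose;
function mockCreate(body: MiniParams<TextFormat>): string;
// Implementation signature; callers only ever see the overloads above.
function mockCreate(
  body: MiniParams<'json' | 'verbose_json' | TextFormat | undefined>,
): MiniTranscription | MiniTranscriptionVerbose | string {
  const text = 'hello world'; // canned result, no network call
  switch (body.response_format) {
    case 'verbose_json':
      return { text, duration: 1.5, language: 'english' };
    case 'srt':
    case 'vtt':
    case 'text':
      return text;
    default:
      return { text };
  }
}

const a = mockCreate({}); // typed as MiniTranscription
const b = mockCreate({ response_format: 'verbose_json' }); // typed as MiniTranscriptionVerbose
const c = mockCreate({ response_format: 'text' }); // typed as string
console.log(a.text, b.duration, c.toUpperCase());
```

Because the choice happens at the type level, accessing b.duration or calling c.toUpperCase() compiles without casts. The SDK's own overloads add a streaming variant on top of this pattern.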
The module defines a comprehensive type system for transcription results. The Transcription interface includes the transcribed text along with optional log probabilities and usage statistics (which can be token-based or duration-based). The TranscriptionVerbose interface adds duration, language, segments, and optional word-level timestamps. For diarized transcription (using gpt-4o-transcribe-diarize), TranscriptionDiarized provides speaker-annotated segments.
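Because usage can take two shapes, code that reads it has to narrow first. The sketch below assumes a discriminated union keyed on a type field ('tokens' or 'duration') with the field names input_tokens, output_tokens, total_tokens, and seconds. These names are modelled on the SDK's usage types but are an assumption here and should be checked against src/resources/audio/transcriptions.ts. TranscriptionLike and describeUsage are hypothetical helpers.

```typescript
// Assumed local mirrors of the two usage shapes (verify field names against the SDK).
interface TokenUsage {
  type: 'tokens';
  input_tokens: number;
  output_tokens: number;
  total_tokens: number;
}

interface DurationUsage {
  type: 'duration';
  seconds: number;
}

// Hypothetical minimal view of a Transcription result.
interface TranscriptionLike {
  text: string;
  usage?: TokenUsage | DurationUsage;
}

// Narrow on the `type` discriminant before reading shape-specific fields.
function describeUsage(t: TranscriptionLike): string {
  const usage = t.usage;
  if (!usage) return 'no usage reported';
  if (usage.type === 'tokens') {
    return `${usage.total_tokens} tokens (${usage.input_tokens} in, ${usage.output_tokens} out)`;
  }
  // Only the duration shape remains after the check above.
  return `${usage.seconds} s of audio`;
}

console.log(describeUsage({ text: 'hi', usage: { type: 'duration', seconds: 12 } }));
```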
The streaming event types include TranscriptionTextDeltaEvent (incremental text deltas with optional log probabilities), TranscriptionTextDoneEvent (the final complete text with usage), and TranscriptionTextSegmentEvent (diarized segments with speaker labels). The create method sends a multipart form request to /audio/transcriptions.
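One way to consume these three event kinds is a pure reducer that folds each event into an accumulated state. In the sketch below, the 'transcript.text.delta' type string matches the streaming example on this page, while 'transcript.text.done', 'transcript.text.segment', and the trimmed field lists are assumptions modelled on the event names above. StreamState and applyEvent are hypothetical helpers, and the demo runs offline on hand-written events.

```typescript
// Simplified local event shapes (type strings and fields are assumptions).
type StreamEvent =
  | { type: 'transcript.text.delta'; delta: string }
  | { type: 'transcript.text.done'; text: string }
  | { type: 'transcript.text.segment'; speaker: string; text: string; start: number; end: number };

interface StreamState {
  partial: string; // concatenation of every delta seen so far
  final?: string; // complete text, set by the done event
  segments: Array<{ speaker: string; text: string }>; // diarized segments, if any
}

// Pure reducer: returns a new state with one event folded in.
function applyEvent(state: StreamState, event: StreamEvent): StreamState {
  switch (event.type) {
    case 'transcript.text.delta':
      return { ...state, partial: state.partial + event.delta };
    case 'transcript.text.done':
      return { ...state, final: event.text };
    case 'transcript.text.segment':
      return {
        ...state,
        segments: [...state.segments, { speaker: event.speaker, text: event.text }],
      };
    default:
      return state;
  }
}

// Offline demo; with the SDK the same fold would run inside
// `for await (const event of stream)`.
const events: StreamEvent[] = [
  { type: 'transcript.text.delta', delta: 'Hello ' },
  { type: 'transcript.text.delta', delta: 'there.' },
  { type: 'transcript.text.done', text: 'Hello there.' },
];
const initial: StreamState = { partial: '', segments: [] };
const demo = events.reduce(applyEvent, initial);
console.log(demo.final ?? demo.partial);
```

Keeping the reducer free of I/O makes it easy to test without a network connection, and preferring final over partial means the complete text from the done event wins once it arrives.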
Usage
This resource is exposed as client.audio.transcriptions, and transcriptions are requested with client.audio.transcriptions.create(). The method accepts an audio file (as a ReadStream, File, or other Uploadable) and a model identifier. Supported models include gpt-4o-transcribe, gpt-4o-mini-transcribe, whisper-1, and gpt-4o-transcribe-diarize.
Code Reference
Source Location
- Repository: openai-node
- File: src/resources/audio/transcriptions.ts
- Lines: 1-823
Signature
```typescript
export class Transcriptions extends APIResource {
  // Non-streaming, JSON format (default)
  create(
    body: TranscriptionCreateParamsNonStreaming<'json' | undefined>,
    options?: RequestOptions,
  ): APIPromise<Transcription>;

  // Non-streaming, verbose JSON format
  create(
    body: TranscriptionCreateParamsNonStreaming<'verbose_json'>,
    options?: RequestOptions,
  ): APIPromise<TranscriptionVerbose>;

  // Non-streaming, plain text formats
  create(
    body: TranscriptionCreateParamsNonStreaming<'srt' | 'vtt' | 'text'>,
    options?: RequestOptions,
  ): APIPromise<string>;

  // Streaming
  create(
    body: TranscriptionCreateParamsStreaming,
    options?: RequestOptions,
  ): APIPromise<Stream<TranscriptionStreamEvent>>;
}
```
Import
```typescript
import OpenAI from 'openai';

// Accessed via: client.audio.transcriptions.create(...)
```
I/O Contract
Inputs (TranscriptionCreateParamsBase)
| Name | Type | Required | Description |
|---|---|---|---|
| file | Uploadable | Yes | Audio file to transcribe (flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm) |
| model | AudioModel | Yes | Model ID: 'gpt-4o-transcribe', 'gpt-4o-mini-transcribe', 'whisper-1', or 'gpt-4o-transcribe-diarize' |
| language | string | No | Input audio language in ISO-639-1 format (e.g., 'en') |
| prompt | string | No | Optional text to guide the model's style or to continue a previous audio segment |
| response_format | 'json' \| 'text' \| 'srt' \| 'vtt' \| 'verbose_json' \| 'diarized_json' | No | Output format; 'json' when omitted |
| stream | boolean | No | Enable SSE streaming of transcription events |
| temperature | number | No | Sampling temperature between 0 and 1 |
| timestamp_granularities | Array<'word' \| 'segment'> | No | Granularity of timestamps (requires 'verbose_json') |
| include | Array<'logprobs'> | No | Additional data to include in the response |
| chunking_strategy | 'auto' \| VadConfig \| null | No | Audio chunking strategy using voice activity detection |
| known_speaker_names | Array<string> | No | Speaker name labels for diarization (up to 4) |
| known_speaker_references | Array<string> | No | Audio sample data URLs matching the speaker names |
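Several constraints in the table can be checked locally before uploading anything. The following is a minimal sketch of such a pre-flight check; validateDraft, DraftParams, and the fileName field are hypothetical and not part of the SDK. Treating known_speaker_references as a one-to-one pairing with known_speaker_names is an assumption drawn from the "matching speaker names" wording, and the API itself remains the source of truth for every rule.

```typescript
// Hypothetical pre-flight checker (not part of the SDK).
const SUPPORTED_EXTENSIONS = new Set([
  'flac', 'mp3', 'mp4', 'mpeg', 'mpga', 'm4a', 'ogg', 'wav', 'webm',
]);

interface DraftParams {
  fileName: string; // hypothetical: local file name, used only for the extension check
  response_format?: 'json' | 'text' | 'srt' | 'vtt' | 'verbose_json' | 'diarized_json';
  temperature?: number;
  timestamp_granularities?: Array<'word' | 'segment'>;
  known_speaker_names?: string[];
  known_speaker_references?: string[];
}

// Returns human-readable problems; an empty array means no rule fired.
function validateDraft(p: DraftParams): string[] {
  const problems: string[] = [];

  // Extension check, case-insensitive; a name without a dot has no extension.
  const dot = p.fileName.lastIndexOf('.');
  const ext = dot >= 0 ? p.fileName.slice(dot + 1).toLowerCase() : '';
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    problems.push(`unsupported audio format: ${p.fileName}`);
  }

  if (p.temperature !== undefined && (p.temperature < 0 || p.temperature > 1)) {
    problems.push('temperature must be between 0 and 1');
  }

  if (p.timestamp_granularities?.length && p.response_format !== 'verbose_json') {
    problems.push("timestamp_granularities requires response_format: 'verbose_json'");
  }

  const names = p.known_speaker_names ?? [];
  const refs = p.known_speaker_references ?? [];
  if (names.length > 4) {
    problems.push('at most 4 known_speaker_names are allowed');
  }
  // Assumed one-to-one pairing between names and reference clips.
  if (names.length !== refs.length) {
    problems.push('known_speaker_references must pair one-to-one with known_speaker_names');
  }

  return problems;
}

console.log(validateDraft({ fileName: 'notes.txt', timestamp_granularities: ['word'] }));
```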
Outputs
| Condition | Return Type | Description |
|---|---|---|
| Default / response_format: 'json' | Transcription | Object with text, optional logprobs and usage |
| response_format: 'verbose_json' | TranscriptionVerbose | Includes duration, language, segments, and optional words |
| response_format: 'srt', 'vtt', or 'text' | string | Plain text output |
| stream: true | Stream<TranscriptionStreamEvent> | Stream of delta and done events (plus segment events for diarized transcription) |
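The 'srt' output arrives as a single string, so using individual cues requires parsing. Below is a minimal SRT parser, assuming the returned string follows the common SRT layout: a numeric index line, an "HH:MM:SS,mmm --> HH:MM:SS,mmm" timing line, one or more text lines, and a blank line between cues. parseSrt, srtTimeToSeconds, and Cue are hypothetical helpers, not SDK exports. WebVTT ('vtt') output uses a WEBVTT header and a '.' millisecond separator, so it is not handled here.

```typescript
// One parsed subtitle cue (hypothetical helper type).
interface Cue {
  index: number;
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Convert an SRT timestamp "HH:MM:SS,mmm" to seconds.
function srtTimeToSeconds(ts: string): number {
  const m = /^(\d+):(\d{2}):(\d{2}),(\d{3})$/.exec(ts.trim());
  if (!m) throw new Error(`bad SRT timestamp: ${ts}`);
  const [, h, min, s, ms] = m;
  return Number(h) * 3600 + Number(min) * 60 + Number(s) + Number(ms) / 1000;
}

// Split on blank lines, then read index, timing, and text from each block.
function parseSrt(srt: string): Cue[] {
  const cues: Cue[] = [];
  const blocks = srt.replace(/\r\n/g, '\n').trim().split(/\n\s*\n/);
  for (const block of blocks) {
    const lines = block.split('\n');
    // Skip anything that lacks a timing line.
    if (lines.length < 2 || !lines[1].includes('-->')) continue;
    const [startTs, endTs] = lines[1].split('-->');
    cues.push({
      index: Number(lines[0]),
      start: srtTimeToSeconds(startTs),
      end: srtTimeToSeconds(endTs),
      text: lines.slice(2).join('\n'),
    });
  }
  return cues;
}

const sample =
  '1\n00:00:00,000 --> 00:00:01,500\nHello there.\n\n' +
  '2\n00:00:01,500 --> 00:00:03,250\nHow are you?\n';
console.log(parseSrt(sample));
```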
Usage Examples
Basic Usage
```typescript
import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI();

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('speech.mp3'),
  model: 'gpt-4o-transcribe',
});

console.log(transcription.text);
```
Streaming Transcription
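```typescript
import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI();

const stream = await client.audio.transcriptions.create({
  file: fs.createReadStream('interview.mp3'),
  model: 'gpt-4o-transcribe',
  stream: true,
});

// Print incremental text as each delta event arrives.
for await (const event of stream) {
  if (event.type === 'transcript.text.delta') {
    process.stdout.write(event.delta);
  }
}
```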
Verbose JSON with Timestamps
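```typescript
import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI();

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('lecture.mp3'),
  model: 'whisper-1',
  response_format: 'verbose_json', // selects the TranscriptionVerbose overload
  timestamp_granularities: ['word', 'segment'],
});

// words is optional, so fall back to an empty array.
for (const word of transcription.words ?? []) {
  console.log(`${word.start}s - ${word.end}s: ${word.word}`);
}
```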
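Word-level timestamps can be regrouped into caption cues. The sketch below assumes only the word, start, and end fields already used in the loop above and renders consecutive words as SRT text. wordsToSrt, toSrtTimestamp, WordTiming, and the default of 7 words per cue are hypothetical choices made for illustration.

```typescript
// Minimal view of a timed word (same fields as the loop above reads).
interface WordTiming {
  word: string;
  start: number; // seconds
  end: number; // seconds
}

// Format seconds as an SRT timestamp "HH:MM:SS,mmm".
function toSrtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalS = Math.floor(totalMs / 1000);
  const s = totalS % 60;
  const m = Math.floor(totalS / 60) % 60;
  const h = Math.floor(totalS / 3600);
  const pad = (n: number, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Group consecutive words into cues of at most maxWords words each;
// each cue spans from its first word's start to its last word's end.
function wordsToSrt(words: WordTiming[], maxWords = 7): string {
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const chunk = words.slice(i, i + maxWords);
    const first = chunk[0];
    const last = chunk[chunk.length - 1];
    cues.push(
      `${cues.length + 1}\n${toSrtTimestamp(first.start)} --> ${toSrtTimestamp(last.end)}\n` +
        chunk.map((w) => w.word).join(' '),
    );
  }
  return cues.length ? cues.join('\n\n') + '\n' : '';
}

const demoWords: WordTiming[] = [
  { word: 'Hello', start: 0, end: 0.4 },
  { word: 'world', start: 0.5, end: 1.0 },
];
console.log(wordsToSrt(demoWords));
```

With a real result the call would be wordsToSrt(transcription.words ?? []). Requesting response_format: 'srt' directly is simpler when the server's own cue grouping is acceptable; building cues locally trades that simplicity for control over cue length.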