Implementation: openai-node Transcriptions Create
| Knowledge Sources | |
|---|---|
| Domains | Audio, Speech_Recognition |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Method in the openai-node SDK for transcribing audio files to text.
Description
The Transcriptions.create() method has 6 overloads supporting different combinations of response format and streaming mode. It uploads an audio file as a multipart form to /audio/transcriptions and returns the transcription in the requested format. In streaming mode it returns a Stream<TranscriptionStreamEvent> for real-time processing.
Usage
Use this method to transcribe audio files. Choose the response format based on your needs: json for structured data, text for plain text, srt/vtt for subtitles, or verbose_json for detailed timing information.
Code Reference
Source Location
- Repository: openai-node
- File: src/resources/audio/transcriptions.ts
- Lines: L12-63 (class with overloads), L634-742 (TranscriptionCreateParamsBase)
Signature
class Transcriptions extends APIResource {
// Non-streaming, JSON format
create(
body: TranscriptionCreateParamsNonStreaming & { response_format?: 'json' },
options?: RequestOptions,
): APIPromise<Transcription>;
// Non-streaming, verbose JSON
create(
body: TranscriptionCreateParamsNonStreaming & { response_format: 'verbose_json' },
options?: RequestOptions,
): APIPromise<TranscriptionVerbose>;
// Non-streaming, text/srt/vtt
create(
body: TranscriptionCreateParamsNonStreaming & { response_format: 'text' | 'srt' | 'vtt' },
options?: RequestOptions,
): APIPromise<string>;
// Streaming
create(
body: TranscriptionCreateParamsStreaming,
options?: RequestOptions,
): APIPromise<Stream<TranscriptionStreamEvent>>;
}
interface TranscriptionCreateParamsBase {
file: Uploadable;
model: string | AudioModel;
language?: string;
prompt?: string;
response_format?: 'json' | 'text' | 'srt' | 'vtt' | 'verbose_json';
stream?: boolean;
temperature?: number;
timestamp_granularities?: Array<'word' | 'segment'>;
}
Import
import OpenAI from 'openai';
// Access via: client.audio.transcriptions.create(...)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| file | Uploadable | Yes | Audio file to transcribe |
| model | string \| AudioModel | Yes | Model ('whisper-1', 'gpt-4o-transcribe', etc.) |
| language | string | No | ISO-639-1 language code |
| prompt | string | No | Context hint for recognition |
| response_format | string | No (default 'json') | Output format |
| stream | boolean | No | Enable streaming transcription |
| temperature | number | No | Sampling temperature |
| timestamp_granularities | Array<'word' \| 'segment'> | No | Timestamp detail level; requires response_format 'verbose_json' |
Outputs
| Name | Type | Description |
|---|---|---|
| (json) | Transcription | { text: string } |
| (verbose_json) | TranscriptionVerbose | { text, language, duration, words?, segments? } |
| (text/srt/vtt) | string | Plain text or subtitle format string |
| (streaming) | Stream<TranscriptionStreamEvent> | Real-time transcription events |
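The verbose_json output's word timings map naturally onto subtitle formats. The sketch below converts a `words` array (the `{ word, start, end }` shape from TranscriptionVerbose) into SRT cues; the `TimedWord` type, helper names, and the words-per-cue grouping are illustrative assumptions, not part of the SDK.

```typescript
// Sketch: turning verbose_json word timings into SRT cues.
// TimedWord mirrors the { word, start, end } entries in TranscriptionVerbose.words;
// srtTime and wordsToSrt are hypothetical helpers, not SDK functions.

interface TimedWord {
  word: string;
  start: number; // seconds
  end: number;   // seconds
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor((ms % 3_600_000) / 60_000);
  const s = Math.floor((ms % 60_000) / 1000);
  const pad = (n: number, w = 2) => String(n).padStart(w, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Group words into fixed-size cues and emit an SRT string.
function wordsToSrt(words: TimedWord[], wordsPerCue = 7): string {
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += wordsPerCue) {
    const chunk = words.slice(i, i + wordsPerCue);
    const text = chunk.map((w) => w.word).join(' ');
    const range = `${srtTime(chunk[0].start)} --> ${srtTime(chunk[chunk.length - 1].end)}`;
    cues.push(`${cues.length + 1}\n${range}\n${text}`);
  }
  return cues.join('\n\n');
}
```

For example, feeding `wordsToSrt` the `transcription.words` array from a verbose_json response with `timestamp_granularities: ['word']` yields a ready-to-save `.srt` string.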
Usage Examples
Basic Transcription
import OpenAI from 'openai';
import fs from 'fs';
const client = new OpenAI();
const transcription = await client.audio.transcriptions.create({
file: fs.createReadStream('audio.mp3'),
model: 'whisper-1',
});
console.log(transcription.text);
Verbose with Timestamps
const transcription = await client.audio.transcriptions.create({
file: fs.createReadStream('audio.mp3'),
model: 'whisper-1',
response_format: 'verbose_json',
timestamp_granularities: ['word', 'segment'],
});
console.log('Duration:', transcription.duration);
for (const word of transcription.words || []) {
console.log(`${word.start}-${word.end}: ${word.word}`);
}
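Streaming Consumption
The streaming overload (stream: true) yields TranscriptionStreamEvent objects rather than a single response. A minimal consumer might look like the sketch below; the 'transcript.text.delta' / 'transcript.text.done' event names follow the SDK's stream event union, while `collectTranscript` and the `fakeEvents` generator are illustrative stand-ins for a live stream obtained from `create()`.

```typescript
// Sketch of consuming a transcription stream. With the real SDK the iterable
// would come from:
//   const stream = await client.audio.transcriptions.create({
//     file, model: 'gpt-4o-transcribe', stream: true,
//   });
// collectTranscript is a hypothetical helper; fakeEvents simulates a stream.

type StreamEvent =
  | { type: 'transcript.text.delta'; delta: string }
  | { type: 'transcript.text.done'; text: string };

// Accumulate delta events; prefer the final `done` text when present.
async function collectTranscript(events: AsyncIterable<StreamEvent>): Promise<string> {
  let assembled = '';
  for await (const event of events) {
    if (event.type === 'transcript.text.delta') {
      assembled += event.delta;
    } else if (event.type === 'transcript.text.done') {
      return event.text; // authoritative final transcript
    }
  }
  return assembled;
}

// Stand-in for a live stream, for demonstration only.
async function* fakeEvents(): AsyncIterable<StreamEvent> {
  yield { type: 'transcript.text.delta', delta: 'Hello ' };
  yield { type: 'transcript.text.delta', delta: 'world.' };
  yield { type: 'transcript.text.done', text: 'Hello world.' };
}
```

Writing each delta to stdout as it arrives (instead of accumulating) gives live partial transcripts during long recordings.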