
Implementation: openai-node Transcriptions.create()

Domains: Audio, Speech_Recognition
Last Updated: 2026-02-15 00:00 GMT

Overview

A concrete tool, provided by the openai-node SDK, for transcribing audio files to text.

Description

The Transcriptions.create() method has 6 overloads supporting different combinations of response formats and streaming modes. It uploads an audio file via multipart form to /audio/transcriptions and returns the transcription in the requested format. Streaming mode returns a Stream<TranscriptionStreamEvent> for real-time processing.
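
In practice, the overloads mean the resolved type of the returned promise tracks response_format at compile time. A minimal sketch, assuming an audio.mp3 on disk:

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI();

// Default 'json' overload: resolves to Transcription ({ text: string }).
const asJson = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.mp3'),
  model: 'whisper-1',
});
console.log(asJson.text);

// 'text' | 'srt' | 'vtt' overload: resolves to a plain string.
const asText = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.mp3'),
  model: 'whisper-1',
  response_format: 'text',
});
console.log(asText.length);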

Usage

Use this method to transcribe audio files. Choose the response format based on your needs: json for structured data, text for plain text, srt/vtt for subtitles, or verbose_json for detailed timing information.
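
For instance, requesting srt yields a subtitle-formatted string that can be written straight to a .srt file. A minimal sketch, assuming an audio.mp3 on disk:

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI();

// With response_format 'srt', create() resolves to a plain string.
const srt = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.mp3'),
  model: 'whisper-1',
  response_format: 'srt',
});

fs.writeFileSync('audio.srt', srt);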

Code Reference

Source Location

  • Repository: openai-node
  • File: src/resources/audio/transcriptions.ts
  • Lines: L12-63 (class with overloads), L634-742 (TranscriptionCreateParamsBase)

Signature

class Transcriptions extends APIResource {
  // Non-streaming, JSON format
  create(
    body: TranscriptionCreateParamsNonStreaming & { response_format?: 'json' },
    options?: RequestOptions,
  ): APIPromise<Transcription>;

  // Non-streaming, verbose JSON
  create(
    body: TranscriptionCreateParamsNonStreaming & { response_format: 'verbose_json' },
    options?: RequestOptions,
  ): APIPromise<TranscriptionVerbose>;

  // Non-streaming, text/srt/vtt
  create(
    body: TranscriptionCreateParamsNonStreaming & { response_format: 'text' | 'srt' | 'vtt' },
    options?: RequestOptions,
  ): APIPromise<string>;

  // Streaming
  create(
    body: TranscriptionCreateParamsStreaming,
    options?: RequestOptions,
  ): APIPromise<Stream<TranscriptionStreamEvent>>;
}

interface TranscriptionCreateParamsBase {
  file: Uploadable;
  model: string | AudioModel;
  language?: string;
  prompt?: string;
  response_format?: 'json' | 'text' | 'srt' | 'vtt' | 'verbose_json';
  stream?: boolean;
  temperature?: number;
  timestamp_granularities?: Array<'word' | 'segment'>;
}

Import

import OpenAI from 'openai';
// Access via: client.audio.transcriptions.create(...)
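
The client reads OPENAI_API_KEY from the environment by default; passing the key explicitly is equivalent:

import OpenAI from 'openai';

// Equivalent to new OpenAI() when OPENAI_API_KEY is set in the environment.
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });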

I/O Contract

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| file | Uploadable | Yes | Audio file to transcribe |
| model | string \| AudioModel | Yes | Model ID ('whisper-1', 'gpt-4o-transcribe', etc.) |
| language | string | No | ISO-639-1 code of the input audio |
| prompt | string | No | Context hint to guide recognition |
| response_format | string | No | Output format (default 'json') |
| stream | boolean | No | Enable streaming transcription |
| temperature | number | No | Sampling temperature |
| timestamp_granularities | Array<'word' \| 'segment'> | No | Timestamp detail level (requires verbose_json) |
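
A sketch combining several of these inputs; the file path and prompt text are placeholders:

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI();

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('meeting.mp3'), // placeholder path
  model: 'whisper-1',
  language: 'en',                           // ISO-639-1 hint for the input audio
  prompt: 'Acme Corp, Q3 roadmap, OKRs',    // placeholder vocabulary hint
  temperature: 0,                           // favor deterministic output
});

console.log(transcription.text);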

Outputs

| Name | Type | Description |
| --- | --- | --- |
| (json) | Transcription | { text: string } |
| (verbose_json) | TranscriptionVerbose | { text, language, duration, words?, segments? } |
| (text/srt/vtt) | string | Plain text or subtitle-format string |
| (streaming) | Stream<TranscriptionStreamEvent> | Real-time transcription events |

Usage Examples

Basic Transcription

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI();

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.mp3'),
  model: 'whisper-1',
});

console.log(transcription.text);
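
In-Memory Audio with toFile

When the audio bytes are already in memory rather than on disk, the SDK's toFile helper wraps a Buffer or Uint8Array into an Uploadable. A minimal sketch, reading the bytes from an assumed audio.mp3:

import OpenAI, { toFile } from 'openai';
import fs from 'fs';

const client = new OpenAI();
const buffer = fs.readFileSync('audio.mp3'); // any Buffer/Uint8Array of audio bytes

const transcription = await client.audio.transcriptions.create({
  file: await toFile(buffer, 'audio.mp3'), // the filename gives the upload an extension
  model: 'whisper-1',
});

console.log(transcription.text);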

Verbose with Timestamps

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.mp3'),
  model: 'whisper-1',
  response_format: 'verbose_json',
  timestamp_granularities: ['word', 'segment'],
});

console.log('Duration:', transcription.duration);
for (const word of transcription.words || []) {
  console.log(`${word.start}-${word.end}: ${word.word}`);
}
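
Streaming Transcription

A sketch of the streaming overload. The model name and the event shapes ('transcript.text.delta' carrying delta, 'transcript.text.done' carrying text) reflect current SDK typings and should be treated as assumptions; whisper-1 does not support stream: true, so a streaming-capable model is used here:

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI();

const stream = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.mp3'),
  model: 'gpt-4o-transcribe', // assumption: a streaming-capable model (not whisper-1)
  stream: true,
});

for await (const event of stream) {
  if (event.type === 'transcript.text.delta') {
    process.stdout.write(event.delta); // incremental text as it arrives
  } else if (event.type === 'transcript.text.done') {
    console.log('\nFinal:', event.text);
  }
}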

Related Pages

  • Implements Principle
  • Requires Environment
