
Implementation: openai-node Transcriptions.create()

Domains: Audio, Speech_Recognition
Last Updated: 2026-02-15 00:00 GMT

Overview

A concrete tool, provided by the openai-node SDK, for transcribing audio files to text.

Description

The Transcriptions.create() method has 6 overloads supporting different combinations of response formats and streaming modes. It uploads an audio file via multipart form to /audio/transcriptions and returns the transcription in the requested format. Streaming mode returns a Stream<TranscriptionStreamEvent> for real-time processing.
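
In practice, the overloads mean the resolved type of the returned promise tracks response_format at compile time. A minimal sketch, assuming an audio.mp3 on disk:

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI();

// Default 'json' overload: resolves to Transcription ({ text: string }).
const asJson = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.mp3'),
  model: 'whisper-1',
});
console.log(asJson.text);

// 'text' | 'srt' | 'vtt' overload: resolves to a plain string.
const asText = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.mp3'),
  model: 'whisper-1',
  response_format: 'text',
});
console.log(asText.length);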

Usage

Use this method to transcribe audio files. Choose the response format based on your needs: json for structured data, text for plain text, srt/vtt for subtitles, or verbose_json for detailed timing information.
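
For instance, requesting srt yields a subtitle-formatted string that can be written straight to a .srt file. A minimal sketch, assuming an audio.mp3 on disk:

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI();

// With response_format 'srt', create() resolves to a plain string.
const srt = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.mp3'),
  model: 'whisper-1',
  response_format: 'srt',
});

fs.writeFileSync('audio.srt', srt);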

Code Reference

Source Location

  • Repository: openai-node
  • File: src/resources/audio/transcriptions.ts
  • Lines: L12-63 (class with overloads), L634-742 (TranscriptionCreateParamsBase)

Signature

class Transcriptions extends APIResource {
  // Non-streaming, JSON format
  create(
    body: TranscriptionCreateParamsNonStreaming & { response_format?: 'json' },
    options?: RequestOptions,
  ): APIPromise<Transcription>;

  // Non-streaming, verbose JSON
  create(
    body: TranscriptionCreateParamsNonStreaming & { response_format: 'verbose_json' },
    options?: RequestOptions,
  ): APIPromise<TranscriptionVerbose>;

  // Non-streaming, text/srt/vtt
  create(
    body: TranscriptionCreateParamsNonStreaming & { response_format: 'text' | 'srt' | 'vtt' },
    options?: RequestOptions,
  ): APIPromise<string>;

  // Streaming
  create(
    body: TranscriptionCreateParamsStreaming,
    options?: RequestOptions,
  ): APIPromise<Stream<TranscriptionStreamEvent>>;
}

interface TranscriptionCreateParamsBase {
  file: Uploadable;
  model: string | AudioModel;
  language?: string;
  prompt?: string;
  response_format?: 'json' | 'text' | 'srt' | 'vtt' | 'verbose_json';
  stream?: boolean;
  temperature?: number;
  timestamp_granularities?: Array<'word' | 'segment'>;
}

Import

import OpenAI from 'openai';
// Access via: client.audio.transcriptions.create(...)
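
The client reads OPENAI_API_KEY from the environment by default; passing the key explicitly is equivalent:

import OpenAI from 'openai';

// Equivalent to new OpenAI() when OPENAI_API_KEY is set in the environment.
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });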

I/O Contract

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| file | Uploadable | Yes | Audio file to transcribe |
| model | string \| AudioModel | Yes | Model ID ('whisper-1', 'gpt-4o-transcribe', etc.) |
| language | string | No | ISO-639-1 code of the input audio |
| prompt | string | No | Context hint to guide recognition |
| response_format | string | No | Output format (default 'json') |
| stream | boolean | No | Enable streaming transcription |
| temperature | number | No | Sampling temperature |
| timestamp_granularities | Array<'word' \| 'segment'> | No | Timestamp detail level (requires verbose_json) |
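
A sketch combining several of these inputs; the file path and prompt text are placeholders:

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI();

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('meeting.mp3'), // placeholder path
  model: 'whisper-1',
  language: 'en',                           // ISO-639-1 hint for the input audio
  prompt: 'Acme Corp, Q3 roadmap, OKRs',    // placeholder vocabulary hint
  temperature: 0,                           // favor deterministic output
});

console.log(transcription.text);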

Outputs

| Name | Type | Description |
| --- | --- | --- |
| (json) | Transcription | { text: string } |
| (verbose_json) | TranscriptionVerbose | { text, language, duration, words?, segments? } |
| (text/srt/vtt) | string | Plain text or subtitle-format string |
| (streaming) | Stream<TranscriptionStreamEvent> | Real-time transcription events |

Usage Examples

Basic Transcription

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI();

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.mp3'),
  model: 'whisper-1',
});

console.log(transcription.text);
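
In-Memory Audio with toFile

When the audio bytes are already in memory rather than on disk, the SDK's toFile helper wraps a Buffer or Uint8Array into an Uploadable. A minimal sketch, reading the bytes from an assumed audio.mp3:

import OpenAI, { toFile } from 'openai';
import fs from 'fs';

const client = new OpenAI();
const buffer = fs.readFileSync('audio.mp3'); // any Buffer/Uint8Array of audio bytes

const transcription = await client.audio.transcriptions.create({
  file: await toFile(buffer, 'audio.mp3'), // the filename gives the upload an extension
  model: 'whisper-1',
});

console.log(transcription.text);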

Verbose with Timestamps

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.mp3'),
  model: 'whisper-1',
  response_format: 'verbose_json',
  timestamp_granularities: ['word', 'segment'],
});

console.log('Duration:', transcription.duration);
for (const word of transcription.words || []) {
  console.log(`${word.start}-${word.end}: ${word.word}`);
}
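
Streaming Transcription

A sketch of the streaming overload. The model name and the event shapes ('transcript.text.delta' carrying delta, 'transcript.text.done' carrying text) reflect current SDK typings and should be treated as assumptions; whisper-1 does not support stream: true, so a streaming-capable model is used here:

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI();

const stream = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.mp3'),
  model: 'gpt-4o-transcribe', // assumption: a streaming-capable model (not whisper-1)
  stream: true,
});

for await (const event of stream) {
  if (event.type === 'transcript.text.delta') {
    process.stdout.write(event.delta); // incremental text as it arrives
  } else if (event.type === 'transcript.text.done') {
    console.log('\nFinal:', event.text);
  }
}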

Related Pages

  • Implements Principle
  • Requires Environment
