Principle: OpenAI Audio Transcription
| Knowledge Sources | |
|---|---|
| Domains | Audio, Speech_Recognition |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
A principle for converting spoken audio into text using automatic speech recognition models with optional language hints and timestamp granularity.
Description
Audio Transcription (speech-to-text) converts audio files into text. The system accepts common audio formats, processes them through a speech recognition model (`whisper-1` or `gpt-4o-transcribe`), and returns the transcribed text. It supports multiple output formats (plain text, JSON with metadata, SRT subtitles, VTT subtitles) and optional word-level or segment-level timestamps, which are returned with the `verbose_json` response format.
The SDK also supports streaming transcription for real-time processing of audio input.
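When consuming a transcription stream, the client typically accumulates incremental text fragments until a completion event arrives. The sketch below illustrates that accumulation pattern with a simulated stream; the event shape (dicts with `type`, `text` keys) is an illustrative assumption, not the SDK's exact wire format.

```python
def collect_transcript(events):
    """Accumulate incremental text deltas from a transcription stream.

    Assumes each event is a dict with a 'type' key: 'delta' events carry a
    text fragment, and a final 'done' event may carry the full text.
    (Hypothetical event shape for illustration only.)
    """
    parts = []
    for event in events:
        if event["type"] == "delta":
            parts.append(event["text"])
        elif event["type"] == "done":
            # Prefer the server's final text when it is provided
            return event.get("text", "".join(parts))
    return "".join(parts)

# Simulated stream standing in for a live streaming response
stream = [
    {"type": "delta", "text": "Hello, "},
    {"type": "delta", "text": "world."},
    {"type": "done"},
]
print(collect_transcript(stream))  # → Hello, world.
```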
Usage
Use this principle when your application needs to convert audio recordings to text. Common scenarios include meeting transcription, voice command processing, subtitle generation, and accessibility features.
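For the subtitle-generation scenario, the segments returned by a `verbose_json` transcription (each with `start`, `end`, and `text` fields) can be rendered as an SRT file. A minimal sketch, assuming that segment structure:

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def segments_to_srt(segments):
    """Render verbose_json-style segments as SRT subtitle blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}"
        )
    return "\n\n".join(blocks) + "\n"

segments = [
    {"start": 0.0, "end": 2.5, "text": " Welcome to the meeting."},
    {"start": 2.5, "end": 5.0, "text": " Let's review the agenda."},
]
print(segments_to_srt(segments))
```

Requesting `response_format: 'srt'` directly avoids this step; the manual conversion is useful when you also need the timestamps for other purposes (search, alignment) from a single request.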
Theoretical Basis
Audio transcription follows an Audio → Model → Text pipeline:
```
function transcribeAudio(audioFile, model, options):
    response = await api.post('/audio/transcriptions', multipart({
        file: audioFile,
        model: model,                                 // 'whisper-1' or 'gpt-4o-transcribe'
        language: options.language,                   // ISO-639-1 code (optional)
        prompt: options.prompt,                       // Context hint (optional)
        response_format: options.format,              // 'json' | 'text' | 'srt' | 'vtt' | 'verbose_json'
        temperature: options.temperature,
        timestamp_granularities: options.timestamps,  // ['word'] | ['segment'] | ['word', 'segment']
    }))
    return response // Transcription text or structured result
```
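The option values in the pipeline above can be checked client-side before uploading a potentially large audio file. A sketch of that validation, using the enumerations from the comments (the constraint that timestamps require `verbose_json` follows the documented API behavior):

```python
ALLOWED_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}
ALLOWED_GRANULARITIES = {"word", "segment"}

def validate_options(response_format="json", timestamp_granularities=None,
                     temperature=0.0):
    """Reject invalid transcription options before any upload happens."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if timestamp_granularities:
        bad = set(timestamp_granularities) - ALLOWED_GRANULARITIES
        if bad:
            raise ValueError(f"unsupported granularities: {sorted(bad)}")
        # Timestamps are only returned with the verbose_json format
        if response_format != "verbose_json":
            raise ValueError(
                "timestamp_granularities requires response_format='verbose_json'")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    return True

validate_options("verbose_json", ["word", "segment"])  # OK
```

Failing fast here saves a round trip: the server would reject the same combinations, but only after the audio has been transferred.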