Principle: OpenAI Audio Transcription
| Knowledge Sources | |
|---|---|
| Domains | Audio, Speech_Recognition |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
A principle for converting spoken audio into text using automatic speech recognition models with optional language hints and timestamp granularity.
Description
Audio Transcription (speech-to-text) converts audio files into text. The system accepts common audio formats, processes them through a speech recognition model (`whisper-1` or `gpt-4o-transcribe`), and returns the transcribed text. It supports multiple output formats (plain text, JSON with metadata, SRT subtitles, VTT subtitles) and optional word-level or segment-level timestamps, which are returned with the `verbose_json` response format.
The SDK also supports streaming transcription for real-time processing of audio input.
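When consuming a transcription stream, the client typically accumulates incremental text fragments until a completion event arrives. The sketch below illustrates that accumulation pattern with a simulated stream; the event shape (dicts with `type`, `text` keys) is an illustrative assumption, not the SDK's exact wire format.

```python
def collect_transcript(events):
    """Accumulate incremental text deltas from a transcription stream.

    Assumes each event is a dict with a 'type' key: 'delta' events carry a
    text fragment, and a final 'done' event may carry the full text.
    (Hypothetical event shape for illustration only.)
    """
    parts = []
    for event in events:
        if event["type"] == "delta":
            parts.append(event["text"])
        elif event["type"] == "done":
            # Prefer the server's final text when it is provided
            return event.get("text", "".join(parts))
    return "".join(parts)

# Simulated stream standing in for a live streaming response
stream = [
    {"type": "delta", "text": "Hello, "},
    {"type": "delta", "text": "world."},
    {"type": "done"},
]
print(collect_transcript(stream))  # → Hello, world.
```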
Usage
Use this principle when your application needs to convert audio recordings to text. Common scenarios include meeting transcription, voice command processing, subtitle generation, and accessibility features.
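For the subtitle-generation scenario, the segments returned by a `verbose_json` transcription (each with `start`, `end`, and `text` fields) can be rendered as an SRT file. A minimal sketch, assuming that segment structure:

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def segments_to_srt(segments):
    """Render verbose_json-style segments as SRT subtitle blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}"
        )
    return "\n\n".join(blocks) + "\n"

segments = [
    {"start": 0.0, "end": 2.5, "text": " Welcome to the meeting."},
    {"start": 2.5, "end": 5.0, "text": " Let's review the agenda."},
]
print(segments_to_srt(segments))
```

Requesting `response_format: 'srt'` directly avoids this step; the manual conversion is useful when you also need the timestamps for other purposes (search, alignment) from a single request.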
Theoretical Basis
Audio transcription follows an Audio → Model → Text pipeline:
```
function transcribeAudio(audioFile, model, options):
    response = await api.post('/audio/transcriptions', multipart({
        file: audioFile,
        model: model,                                 // 'whisper-1' or 'gpt-4o-transcribe'
        language: options.language,                   // ISO-639-1 code (optional)
        prompt: options.prompt,                       // Context hint (optional)
        response_format: options.format,              // 'json' | 'text' | 'srt' | 'vtt' | 'verbose_json'
        temperature: options.temperature,
        timestamp_granularities: options.timestamps,  // ['word'] | ['segment'] | ['word', 'segment']
    }))
    return response // Transcription text or structured result
```
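The option values in the pipeline above can be checked client-side before uploading a potentially large audio file. A sketch of that validation, using the enumerations from the comments (the constraint that timestamps require `verbose_json` follows the documented API behavior):

```python
ALLOWED_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}
ALLOWED_GRANULARITIES = {"word", "segment"}

def validate_options(response_format="json", timestamp_granularities=None,
                     temperature=0.0):
    """Reject invalid transcription options before any upload happens."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if timestamp_granularities:
        bad = set(timestamp_granularities) - ALLOWED_GRANULARITIES
        if bad:
            raise ValueError(f"unsupported granularities: {sorted(bad)}")
        # Timestamps are only returned with the verbose_json format
        if response_format != "verbose_json":
            raise ValueError(
                "timestamp_granularities requires response_format='verbose_json'")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    return True

validate_options("verbose_json", ["word", "segment"])  # OK
```

Failing fast here saves a round trip: the server would reject the same combinations, but only after the audio has been transferred.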