
Implementation:Openai Openai python Transcriptions Create

From Leeroopedia
Knowledge Sources
Domains Audio, Speech_Recognition
Last Updated 2026-02-15 00:00 GMT

Overview

A concrete tool, provided by the OpenAI Python SDK, for transcribing audio files to text with language detection, timestamps, and streaming.

Description

The Transcriptions resource provides a create() method that accepts audio files and returns transcribed text. It supports multiple output formats (JSON, verbose JSON with timestamps, SRT, VTT, diarized JSON with speakers), streaming transcription, and configurable chunking strategies with VAD.

Usage

Call client.audio.transcriptions.create() with an audio file and model selection. Supported audio formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.
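Before uploading, a client-side guard against unsupported extensions lets a caller fail fast instead of waiting for an API error. A minimal sketch based on the format list above; `is_supported_audio` is a hypothetical helper, not part of the SDK:

```python
from pathlib import Path

# Extensions accepted by the transcription endpoint (from the list above).
SUPPORTED_AUDIO_EXTENSIONS = {
    "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm",
}


def is_supported_audio(path: str) -> bool:
    """Return True if the file's extension is one the API accepts."""
    return Path(path).suffix.lstrip(".").lower() in SUPPORTED_AUDIO_EXTENSIONS
```

Note that this only checks the filename; the API validates the actual container/codec server-side.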

Code Reference

Source Location

  • Repository: openai-python
  • File: src/openai/resources/audio/transcriptions.py
  • Lines: L429-487 (sync impl), L872-930 (async impl)

Signature

class Transcriptions(SyncAPIResource):
    def create(
        self,
        *,
        file: FileTypes,
        model: Union[str, AudioModel],
        language: str | NotGiven = NOT_GIVEN,
        prompt: str | NotGiven = NOT_GIVEN,
        response_format: AudioResponseFormat | NotGiven = NOT_GIVEN,
        temperature: float | NotGiven = NOT_GIVEN,
        stream: Optional[Literal[False]] | Literal[True] = False,
        chunking_strategy: ChunkingStrategy | NotGiven = NOT_GIVEN,
        include: List[TranscriptionInclude] | NotGiven = NOT_GIVEN,
    ) -> Transcription | TranscriptionVerbose | TranscriptionDiarized | str | Stream[TranscriptionStreamEvent]:
        """
        Transcribes audio into text.

        Args:
            file: Audio file (flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm).
            model: Model ID (gpt-4o-transcribe, whisper-1, etc.).
            language: ISO-639-1 language code.
            prompt: Optional context/vocabulary hint.
            response_format: json, text, srt, verbose_json, vtt, diarized_json.
            temperature: Sampling temperature.
            stream: Enable streaming transcription.
        """

Import

from openai import OpenAI
# Access via client.audio.transcriptions.create()

I/O Contract

Inputs

Name             Type                    Required  Description
file             FileTypes               Yes       Audio file (path, bytes, or file-like)
model            Union[str, AudioModel]  Yes       Model (gpt-4o-transcribe, whisper-1)
language         str                     No        ISO-639-1 language code hint
prompt           str                     No        Context/vocabulary hint for accuracy
response_format  AudioResponseFormat     No        Output format (json, text, srt, verbose_json, vtt, diarized_json)
temperature      float                   No        Sampling temperature
stream           bool                    No        Enable streaming transcription

Outputs

Name                           Type                              Description
transcription (json)           Transcription                     Object with .text field
transcription (verbose_json)   TranscriptionVerbose              Object with .text, .segments (timestamps), .words
transcription (diarized_json)  TranscriptionDiarized             Object with .text and speaker labels
transcription (text/srt/vtt)   str                               Plain text, SRT subtitles, or VTT subtitles
stream                         Stream[TranscriptionStreamEvent]  Streaming transcription events
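With response_format="srt" the call returns a plain string of subtitle blocks rather than an object. A minimal sketch of splitting that string back into structured entries; `parse_srt` is a hypothetical helper that assumes well-formed blocks (index line, timing line, then text):

```python
import re


def parse_srt(srt_text: str) -> list[dict]:
    """Split SRT subtitle text into {index, start, end, text} entries."""
    entries = []
    # SRT blocks are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        start, end = lines[1].split(" --> ")
        entries.append({
            "index": int(lines[0]),
            "start": start.strip(),
            "end": end.strip(),
            "text": " ".join(lines[2:]),
        })
    return entries
```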

Usage Examples

Basic Transcription

from openai import OpenAI

client = OpenAI()
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcription.text)

With Timestamps

with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
    )
for segment in transcription.segments:
    print(f"[{segment.start:.1f}s - {segment.end:.1f}s] {segment.text}")
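Verbose segments report start/end as floating-point seconds; rendering them as SRT-style timestamps takes a little care with the millisecond arithmetic. A minimal sketch; `srt_timestamp` is a hypothetical helper, not part of the SDK:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> '00:00:03,500'."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"
```

Rounding to integer milliseconds first avoids float drift when splitting into hour/minute/second fields.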

Streaming Transcription

with open("audio.mp3", "rb") as audio_file:
    stream = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
        stream=True,
    )
    for event in stream:
        # Partial text arrives as delta events; a final event carries the full text.
        if event.type == "transcript.text.delta":
            print(event.delta, end="", flush=True)
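Accumulating the streamed pieces into a full transcript is a common follow-up. A minimal sketch, assuming delta events expose a `.type` of "transcript.text.delta" and a `.delta` string; `MockDeltaEvent` is a hypothetical stand-in for the SDK's event objects so the logic can be shown without a live API call:

```python
from dataclasses import dataclass


@dataclass
class MockDeltaEvent:
    """Stand-in for the SDK's streaming transcription events (hypothetical)."""
    type: str
    delta: str


def collect_transcript(events) -> str:
    """Concatenate the text of delta events, in arrival order."""
    return "".join(
        e.delta for e in events if e.type == "transcript.text.delta"
    )
```

In practice the same loop works over the real `Stream[TranscriptionStreamEvent]`, since iteration yields events in order.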
