Implementation: openai-python Transcriptions Create
| Knowledge Sources | |
|---|---|
| Domains | Audio, Speech_Recognition |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Concrete tool for transcribing audio files to text with language detection, timestamps, and streaming provided by the OpenAI Python SDK.
Description
The Transcriptions resource provides a create() method that accepts audio files and returns transcribed text. It supports multiple output formats (JSON, verbose JSON with timestamps, SRT, VTT, diarized JSON with speakers), streaming transcription, and configurable chunking strategies with VAD.
Usage
Call client.audio.transcriptions.create() with an audio file and model selection. Supported audio formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.
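The supported-format list above can be validated locally before making a network call. A minimal sketch (the helper name and the check itself are ours, not part of the SDK):

```python
from pathlib import Path

# Audio formats accepted by transcriptions.create(), per the list above
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}

def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one the endpoint accepts."""
    return Path(path).suffix.lstrip(".").lower() in SUPPORTED_FORMATS

print(is_supported_audio("meeting.m4a"))  # True
print(is_supported_audio("slides.pdf"))   # False
```

Checking the extension client-side gives a faster failure than uploading an unsupported file, though the server remains the authority on what it will accept.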
Code Reference
Source Location
- Repository: openai-python
- File: src/openai/resources/audio/transcriptions.py
- Lines: L429-487 (sync impl), L872-930 (async impl)
Signature
class Transcriptions(SyncAPIResource):
    def create(
        self,
        *,
        file: FileTypes,
        model: Union[str, AudioModel],
        language: str | NotGiven = NOT_GIVEN,
        prompt: str | NotGiven = NOT_GIVEN,
        response_format: AudioResponseFormat | NotGiven = NOT_GIVEN,
        temperature: float | NotGiven = NOT_GIVEN,
        stream: Optional[Literal[False]] | Literal[True] = False,
        chunking_strategy: ChunkingStrategy | NotGiven = NOT_GIVEN,
        include: List[TranscriptionInclude] | NotGiven = NOT_GIVEN,
    ) -> Transcription | TranscriptionVerbose | TranscriptionDiarized | str | Stream[TranscriptionStreamEvent]:
        """
        Transcribes audio into text.

        Args:
            file: Audio file (flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm).
            model: Model ID (gpt-4o-transcribe, whisper-1, etc.).
            language: ISO-639-1 language code.
            prompt: Optional context/vocabulary hint.
            response_format: json, text, srt, verbose_json, vtt, diarized_json.
            temperature: Sampling temperature.
            stream: Enable streaming transcription.
        """
Import
from openai import OpenAI
# Access via client.audio.transcriptions.create()
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| file | FileTypes | Yes | Audio file (path, bytes, or file-like) |
| model | Union[str, AudioModel] | Yes | Model (gpt-4o-transcribe, whisper-1) |
| language | str | No | ISO-639-1 language code hint |
| prompt | str | No | Context/vocabulary hint for accuracy |
| response_format | AudioResponseFormat | No | Output format (json, text, srt, verbose_json, vtt, diarized_json) |
| temperature | float | No | Sampling temperature |
| stream | bool | No | Enable streaming transcription |
Outputs
| Name | Type | Description |
|---|---|---|
| transcription (json) | Transcription | Object with .text field |
| transcription (verbose_json) | TranscriptionVerbose | Object with .text, .segments (timestamps), .words |
| transcription (diarized_json) | TranscriptionDiarized | Object with .text and speaker labels |
| transcription (text/srt/vtt) | str | Plain text, SRT subtitles, or VTT subtitles |
| stream | Stream[TranscriptionStreamEvent] | Streaming transcription events |
Usage Examples
Basic Transcription
from openai import OpenAI
client = OpenAI()
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcription.text)
With Timestamps
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
    )
for segment in transcription.segments:
    print(f"[{segment.start:.1f}s - {segment.end:.1f}s] {segment.text}")
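Parsing SRT Output
With response_format="srt" the call returns a plain string in SubRip format. A minimal sketch of splitting such a payload into cues; the parser and the sample text below are illustrative, not part of the SDK:

```python
def parse_srt(srt: str) -> list[tuple[str, str, str]]:
    """Split an SRT payload into (start, end, text) cues."""
    cues = []
    for block in srt.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) >= 3:
            # Line 0 is the cue index; line 1 holds "start --> end"
            start, _, end = lines[1].partition(" --> ")
            cues.append((start.strip(), end.strip(), " ".join(lines[2:])))
    return cues

# Simulated SRT payload, shaped like what the endpoint returns
sample = """1
00:00:00,000 --> 00:00:02,500
Hello and welcome.

2
00:00:02,500 --> 00:00:05,000
Let's get started."""

for start, end, text in parse_srt(sample):
    print(f"{start} -> {end}: {text}")
```

The same approach works for VTT output, whose cue blocks use a `WEBVTT` header and dots instead of commas in timestamps.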
Streaming Transcription
with open("audio.mp3", "rb") as audio_file:
    stream = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
        stream=True,
    )
    for event in stream:
        # Delta events carry incremental text; the final event carries the full transcript
        if event.type == "transcript.text.delta":
            print(event.delta, end="", flush=True)
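Delta events arrive as text fragments that concatenate into the final transcript. A local sketch of accumulating them; the event objects here are simulated stand-ins, not real SDK types, though the `transcript.text.delta` / `transcript.text.done` type strings follow the streaming event names in OpenAI's API:

```python
from dataclasses import dataclass

@dataclass
class FakeEvent:
    """Simulated stand-in for a streaming transcription event."""
    type: str
    delta: str = ""
    text: str = ""

# Simulated stream: delta fragments followed by a done event
events = [
    FakeEvent("transcript.text.delta", delta="Hello "),
    FakeEvent("transcript.text.delta", delta="world."),
    FakeEvent("transcript.text.done", text="Hello world."),
]

chunks = []
for event in events:
    if event.type == "transcript.text.delta":
        chunks.append(event.delta)

print("".join(chunks))  # Hello world.
```

Accumulating deltas yourself is useful when you want to display partial text as it arrives but also keep the full transcript without waiting for the done event.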