Implementation: openai-python Transcriptions Create
| Knowledge Sources | |
|---|---|
| Domains | Audio, Speech_Recognition |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Concrete tool for transcribing audio files to text with language detection, timestamps, and streaming provided by the OpenAI Python SDK.
Description
The Transcriptions resource provides a create() method that accepts audio files and returns transcribed text. It supports multiple output formats (JSON, verbose JSON with timestamps, SRT, VTT, diarized JSON with speakers), streaming transcription, and configurable chunking strategies with VAD.
Usage
Call client.audio.transcriptions.create() with an audio file and model selection. Supported audio formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.
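The supported-format list above can be validated locally before making a network call. A minimal sketch (the helper name and the check itself are ours, not part of the SDK):

```python
from pathlib import Path

# Audio formats accepted by transcriptions.create(), per the list above
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}

def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one the endpoint accepts."""
    return Path(path).suffix.lstrip(".").lower() in SUPPORTED_FORMATS

print(is_supported_audio("meeting.m4a"))  # True
print(is_supported_audio("slides.pdf"))   # False
```

Checking the extension client-side gives a faster failure than uploading an unsupported file, though the server remains the authority on what it will accept.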
Code Reference
Source Location
- Repository: openai-python
- File: src/openai/resources/audio/transcriptions.py
- Lines: L429-487 (sync impl), L872-930 (async impl)
Signature
class Transcriptions(SyncAPIResource):
    def create(
        self,
        *,
        file: FileTypes,
        model: Union[str, AudioModel],
        language: str | NotGiven = NOT_GIVEN,
        prompt: str | NotGiven = NOT_GIVEN,
        response_format: AudioResponseFormat | NotGiven = NOT_GIVEN,
        temperature: float | NotGiven = NOT_GIVEN,
        stream: Optional[Literal[False]] | Literal[True] = False,
        chunking_strategy: ChunkingStrategy | NotGiven = NOT_GIVEN,
        include: List[TranscriptionInclude] | NotGiven = NOT_GIVEN,
    ) -> Transcription | TranscriptionVerbose | TranscriptionDiarized | str | Stream[TranscriptionStreamEvent]:
        """
        Transcribes audio into text.

        Args:
            file: Audio file (flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm).
            model: Model ID (gpt-4o-transcribe, whisper-1, etc.).
            language: ISO-639-1 language code.
            prompt: Optional context/vocabulary hint.
            response_format: json, text, srt, verbose_json, vtt, diarized_json.
            temperature: Sampling temperature.
            stream: Enable streaming transcription.
        """
Import
from openai import OpenAI
# Access via client.audio.transcriptions.create()
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| file | FileTypes | Yes | Audio file (path, bytes, or file-like) |
| model | Union[str, AudioModel] | Yes | Model (gpt-4o-transcribe, whisper-1) |
| language | str | No | ISO-639-1 language code hint |
| prompt | str | No | Context/vocabulary hint for accuracy |
| response_format | AudioResponseFormat | No | Output format (json, text, srt, verbose_json, vtt, diarized_json) |
| temperature | float | No | Sampling temperature |
| stream | bool | No | Enable streaming transcription |
Outputs
| Name | Type | Description |
|---|---|---|
| transcription (json) | Transcription | Object with .text field |
| transcription (verbose_json) | TranscriptionVerbose | Object with .text, .segments (timestamps), .words |
| transcription (diarized_json) | TranscriptionDiarized | Object with .text and speaker labels |
| transcription (text/srt/vtt) | str | Plain text, SRT subtitles, or VTT subtitles |
| stream | Stream[TranscriptionStreamEvent] | Streaming transcription events |
Usage Examples
Basic Transcription
from openai import OpenAI
client = OpenAI()
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcription.text)
With Timestamps
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
    )
for segment in transcription.segments:
    print(f"[{segment.start:.1f}s - {segment.end:.1f}s] {segment.text}")
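Parsing SRT Output
With response_format="srt" the call returns a plain string in SubRip format. A minimal sketch of splitting such a payload into cues; the parser and the sample text below are illustrative, not part of the SDK:

```python
def parse_srt(srt: str) -> list[tuple[str, str, str]]:
    """Split an SRT payload into (start, end, text) cues."""
    cues = []
    for block in srt.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) >= 3:
            # Line 0 is the cue index; line 1 holds "start --> end"
            start, _, end = lines[1].partition(" --> ")
            cues.append((start.strip(), end.strip(), " ".join(lines[2:])))
    return cues

# Simulated SRT payload, shaped like what the endpoint returns
sample = """1
00:00:00,000 --> 00:00:02,500
Hello and welcome.

2
00:00:02,500 --> 00:00:05,000
Let's get started."""

for start, end, text in parse_srt(sample):
    print(f"{start} -> {end}: {text}")
```

The same approach works for VTT output, whose cue blocks use a `WEBVTT` header and dots instead of commas in timestamps.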
Streaming Transcription
with open("audio.mp3", "rb") as audio_file:
    stream = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
        stream=True,
    )
    for event in stream:
        # Delta events carry incremental text; the final event carries the full transcript
        if event.type == "transcript.text.delta":
            print(event.delta, end="", flush=True)
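Delta events arrive as text fragments that concatenate into the final transcript. A local sketch of accumulating them; the event objects here are simulated stand-ins, not real SDK types, though the `transcript.text.delta` / `transcript.text.done` type strings follow the streaming event names in OpenAI's API:

```python
from dataclasses import dataclass

@dataclass
class FakeEvent:
    """Simulated stand-in for a streaming transcription event."""
    type: str
    delta: str = ""
    text: str = ""

# Simulated stream: delta fragments followed by a done event
events = [
    FakeEvent("transcript.text.delta", delta="Hello "),
    FakeEvent("transcript.text.delta", delta="world."),
    FakeEvent("transcript.text.done", text="Hello world."),
]

chunks = []
for event in events:
    if event.type == "transcript.text.delta":
        chunks.append(event.delta)

print("".join(chunks))  # Hello world.
```

Accumulating deltas yourself is useful when you want to display partial text as it arrives but also keep the full transcript without waiting for the done event.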