

Implementation:Togethercomputer Together python Transcriptions Create

From Leeroopedia
Knowledge Sources
Domains Audio, Speech_To_Text
Last Updated 2026-02-15 16:00 GMT

Overview

Concrete tool, provided by the Together Python SDK, for transcribing audio files into text.

Description

The Transcriptions class provides speech-to-text functionality. It accepts audio as a local path, a URL, or a file object, in formats including flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm. It supports specifying the input language, choosing the response format (JSON, or verbose JSON with timestamps), controlling the sampling temperature, requesting word- or segment-level timestamps, and speaker diarization.
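Because the accepted formats are listed explicitly, a caller may want to reject obviously unsupported inputs before uploading anything. The sketch below is a hypothetical pre-flight helper (`has_supported_extension` is not part of the SDK). It inspects only the file name or URL path, so it cannot detect mislabeled files or nameless in-memory streams; the server remains the authority on what it accepts.

```python
from pathlib import Path
from typing import BinaryIO, Union
from urllib.parse import urlparse

# Formats listed in the description; the service may accept others.
SUPPORTED_EXTENSIONS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def has_supported_extension(file: Union[str, Path, BinaryIO]) -> bool:
    """Hypothetical check: does the input's name end in a supported extension?

    Accepts a local path, an http(s) URL, or an open file object
    (whose .name attribute is inspected, if it has one).
    """
    if isinstance(file, Path):
        name = file.name
    elif isinstance(file, str):
        parsed = urlparse(file)
        # For URLs, look only at the path component, ignoring any query string.
        name = parsed.path if parsed.scheme in ("http", "https") else file
    else:
        name = getattr(file, "name", "")
    if not isinstance(name, str):
        return False  # e.g. file objects opened from a raw file descriptor
    return Path(name).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS


print(has_supported_extension("https://example.com/audio.mp3?token=abc"))
print(has_supported_extension("notes.txt"))
```

A `False` result is a reason to double-check the input, not proof that the API would reject it.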

Usage

Use this class when you need to convert audio to text. It is not imported directly; call it through a client instance as client.audio.transcriptions.create().

Code Reference

Source Location

Signature

class Transcriptions:
    def create(
        self,
        *,
        file: Union[str, BinaryIO, Path],
        model: str = "openai/whisper-large-v3",
        language: Optional[str] = None,
        prompt: Optional[str] = None,
        response_format: Union[str, AudioTranscriptionResponseFormat] = "json",
        temperature: float = 0.0,
        timestamp_granularities: Optional[Union[str, AudioTimestampGranularities]] = None,
        diarize: bool = False,
    ) -> Union[AudioTranscriptionResponse, AudioTranscriptionVerboseResponse]: ...

Import

from together import Together

client = Together()  # reads the API key from the TOGETHER_API_KEY environment variable
# Access via client.audio.transcriptions

I/O Contract

Inputs

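Name | Type | Required | Description
file | Union[str, BinaryIO, Path] | Yes | Audio file path, URL, or file object
model | str | No | Model ID (default: "openai/whisper-large-v3")
language | Optional[str] | No | ISO-639-1 language code (e.g. "en"); setting it can improve accuracy
prompt | Optional[str] | No | Optional text to guide the transcription
response_format | Union[str, AudioTranscriptionResponseFormat] | No | "json" or "verbose_json" (default: "json")
temperature | float | No | Sampling temperature from 0 to 1 (default: 0.0)
timestamp_granularities | Optional[Union[str, AudioTimestampGranularities]] | No | "word" or "segment"; requires response_format="verbose_json"
diarize | bool | No | Enable speaker diarization (default: False)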
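The inputs table implies constraints that are easy to violate, notably that timestamp_granularities only makes sense with verbose_json. A minimal sketch of a client-side check follows; `build_transcription_kwargs` is a hypothetical helper, not an SDK function, and the SDK or API may validate these values differently. It returns a dict that could be unpacked into client.audio.transcriptions.create(**kwargs).

```python
from typing import Any, Dict, Optional

VALID_RESPONSE_FORMATS = {"json", "verbose_json"}
VALID_GRANULARITIES = {"word", "segment"}


def build_transcription_kwargs(
    file: Any,
    *,
    model: str = "openai/whisper-large-v3",
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: str = "json",
    temperature: float = 0.0,
    timestamp_granularities: Optional[str] = None,
    diarize: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: validate inputs per the table, then return kwargs."""
    if response_format not in VALID_RESPONSE_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(VALID_RESPONSE_FORMATS)}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    if timestamp_granularities is not None:
        if timestamp_granularities not in VALID_GRANULARITIES:
            raise ValueError(f"timestamp_granularities must be one of {sorted(VALID_GRANULARITIES)}")
        if response_format != "verbose_json":
            raise ValueError("timestamp_granularities requires response_format='verbose_json'")

    kwargs: Dict[str, Any] = {
        "file": file,
        "model": model,
        "response_format": response_format,
        "temperature": temperature,
        "diarize": diarize,
    }
    # Forward optional parameters only when the caller actually set them.
    if language is not None:
        kwargs["language"] = language
    if prompt is not None:
        kwargs["prompt"] = prompt
    if timestamp_granularities is not None:
        kwargs["timestamp_granularities"] = timestamp_granularities
    return kwargs


kwargs = build_transcription_kwargs(
    "meeting.wav", response_format="verbose_json", timestamp_granularities="word"
)
# result = client.audio.transcriptions.create(**kwargs)
```

Calling it with timestamp_granularities="word" but the default response_format raises a ValueError before any network request is made.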

Outputs

Name | Type | Description
return value (response_format="json") | AudioTranscriptionResponse | Plain text transcription
return value (response_format="verbose_json") | AudioTranscriptionVerboseResponse | Text plus segments, words, timestamps, and speaker information
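Since create() returns one of two types depending on response_format, code that handles both can rely on .text and treat the verbose-only fields as optional. In this sketch the attribute names `segments` and `speaker_segments` are assumptions based on the outputs table and the diarization example below, and `describe_transcription` is a hypothetical helper. SimpleNamespace objects stand in for real SDK responses so the snippet runs offline.

```python
from types import SimpleNamespace
from typing import Any, List, Optional


def describe_transcription(result: Any) -> str:
    """Summarize either response type without assuming verbose-only fields exist."""
    # Both response types are assumed to carry the transcript in .text.
    text: str = result.text
    # Verbose-only fields (names assumed); getattr returns None when absent.
    segments: Optional[List[Any]] = getattr(result, "segments", None)
    speakers: Optional[List[Any]] = getattr(result, "speaker_segments", None)

    parts = [f"{len(text.split())} words"]
    if segments:
        parts.append(f"{len(segments)} segments")
    if speakers:
        # Count distinct speakers, not speaker segments.
        distinct = {s.speaker_id for s in speakers}
        parts.append(f"{len(distinct)} speakers")
    return ", ".join(parts)


# Stand-in objects so the sketch runs offline; real ones come from create().
simple_result = SimpleNamespace(text="Hello there.")
verbose_result = SimpleNamespace(
    text="Hi. Hello.",
    segments=[SimpleNamespace(text="Hi."), SimpleNamespace(text="Hello.")],
    speaker_segments=[
        SimpleNamespace(speaker_id="A", text="Hi."),
        SimpleNamespace(speaker_id="B", text="Hello."),
    ],
)
print(describe_transcription(simple_result))   # 2 words
print(describe_transcription(verbose_result))  # 2 words, 2 segments, 2 speakers
```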

Usage Examples

from together import Together

client = Together()

# Simple transcription
result = client.audio.transcriptions.create(
    file="interview.mp3",
    model="openai/whisper-large-v3",
)
print(result.text)

# Verbose with timestamps and diarization
result = client.audio.transcriptions.create(
    file="meeting.wav",
    model="openai/whisper-large-v3",
    response_format="verbose_json",
    timestamp_granularities="word",
    diarize=True,
)
# Fall back to an empty list if no speaker segments were returned
for segment in result.speaker_segments or []:
    print(f"Speaker {segment.speaker_id}: {segment.text}")

# From a URL (model falls back to the default, "openai/whisper-large-v3")
result = client.audio.transcriptions.create(
    file="https://example.com/audio.mp3",
)
print(result.text)
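The diarization example prints one line per segment, so a single speaker talking across several consecutive segments shows up as repeated lines. A small post-processing step can merge those into speaker turns. This is a hypothetical helper, assuming each segment exposes speaker_id and text as in that example; the stand-in segments and speaker labels are illustrative, not real API output.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Optional, Tuple


def speaker_turns(speaker_segments: Optional[Iterable[Any]]) -> List[Tuple[Any, str]]:
    """Merge consecutive segments from the same speaker into one turn."""
    turns: List[Tuple[Any, str]] = []
    for seg in speaker_segments or []:
        text = seg.text.strip()
        if turns and turns[-1][0] == seg.speaker_id:
            # Same speaker as the previous segment: extend the current turn.
            turns[-1] = (seg.speaker_id, f"{turns[-1][1]} {text}".strip())
        else:
            turns.append((seg.speaker_id, text))
    return turns


def format_transcript(turns: List[Tuple[Any, str]]) -> str:
    """Render turns in the same 'Speaker X: text' style as the example."""
    return "\n".join(f"Speaker {speaker}: {text}" for speaker, text in turns)


# Stand-in segments for illustration; with the SDK you would pass
# result.speaker_segments from a diarize=True, verbose_json request.
segments = [
    SimpleNamespace(speaker_id="A", text="Good morning."),
    SimpleNamespace(speaker_id="A", text="Shall we start?"),
    SimpleNamespace(speaker_id="B", text="Yes, let's."),
]
print(format_transcript(speaker_turns(segments)))
```

Merging only adjacent segments preserves the conversation order; grouping all of a speaker's text together regardless of position would lose it.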
