Implementation:Togethercomputer Together python Transcriptions Create

Knowledge Sources	Together Python
Domains	Audio, Speech_To_Text
Last Updated	2026-02-15 16:00 GMT

Overview

A concrete tool, provided by the Together Python SDK, for transcribing audio files into text.

Description

The Transcriptions class provides speech-to-text functionality. It accepts audio input as a local path, a URL, or a file object, in formats including flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm. It supports language specification, a choice of response format (JSON, or verbose JSON with timestamps), temperature control, timestamp granularity (word or segment level), and speaker diarization.
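As an illustration of the format list above, a caller might check a file's extension before submitting it. This is a minimal sketch, not part of the SDK: the helper name has_supported_extension and the SUPPORTED_EXTENSIONS set are hypothetical, and the check only inspects the file name, not the actual audio content.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats listed in the description above (hypothetical constant, not an SDK name).
SUPPORTED_EXTENSIONS = {
    "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm",
}


def has_supported_extension(source: str) -> bool:
    """Return True if a local path or URL ends in one of the listed extensions."""
    # For URLs, look only at the path component so query strings are ignored.
    parsed = urlparse(source)
    path = parsed.path if parsed.scheme in ("http", "https") else source
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_EXTENSIONS
```

A pre-check like this can give a clearer local error than a failed request, but the service remains the authority on what it accepts.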

Usage

Use this class when you need to convert audio to text. It is not imported directly; access it through a client instance as client.audio.transcriptions.create().

Code Reference

Source Location

Repository: Together Python
File: src/together/resources/audio/transcriptions.py
Lines: 1-296

Signature

class Transcriptions:
    def create(
        self,
        *,
        file: Union[str, BinaryIO, Path],
        model: str = "openai/whisper-large-v3",
        language: Optional[str] = None,
        prompt: Optional[str] = None,
        response_format: Union[str, AudioTranscriptionResponseFormat] = "json",
        temperature: float = 0.0,
        timestamp_granularities: Optional[Union[str, AudioTimestampGranularities]] = None,
        diarize: bool = False,
    ) -> Union[AudioTranscriptionResponse, AudioTranscriptionVerboseResponse]: ...

Import

from together import Together

client = Together()
# Access via client.audio.transcriptions

I/O Contract

Inputs

Name	Type	Required	Description
file	Union[str, BinaryIO, Path]	Yes	Audio file path, URL, or file object
model	str	No	Model ID (default: "openai/whisper-large-v3")
language	str	No	ISO-639-1 language code for better accuracy
prompt	str	No	Optional text to guide the transcription (default: None)
response_format	str	No	"json" or "verbose_json" (default: "json")
temperature	float	No	Sampling temperature 0-1 (default: 0.0)
timestamp_granularities	str	No	"word" or "segment" (requires verbose_json)
diarize	bool	No	Enable speaker diarization (default: False)
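The constraints in the table above (temperature between 0 and 1, timestamp granularities only with verbose_json) can be checked on the caller's side before a request is made. The following is a sketch under assumptions: build_transcription_kwargs is a hypothetical helper, and whether the SDK or the service enforces the same rules itself is not stated on this page.

```python
from typing import Any, Dict, Optional


def build_transcription_kwargs(
    file: str,
    response_format: str = "json",
    temperature: float = 0.0,
    timestamp_granularities: Optional[str] = None,
    diarize: bool = False,
) -> Dict[str, Any]:
    """Validate arguments against the input table and return keyword arguments."""
    if response_format not in ("json", "verbose_json"):
        raise ValueError("response_format must be 'json' or 'verbose_json'")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")

    kwargs: Dict[str, Any] = {
        "file": file,
        "response_format": response_format,
        "temperature": temperature,
        "diarize": diarize,
    }
    if timestamp_granularities is not None:
        if timestamp_granularities not in ("word", "segment"):
            raise ValueError("timestamp_granularities must be 'word' or 'segment'")
        # Per the table, timestamp granularities require the verbose format.
        if response_format != "verbose_json":
            raise ValueError("timestamp_granularities requires verbose_json")
        kwargs["timestamp_granularities"] = timestamp_granularities
    return kwargs
```

The returned dictionary could then be unpacked into the call, for example client.audio.transcriptions.create(model="openai/whisper-large-v3", **kwargs).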

Outputs

Name	Type	Description
returns (json)	AudioTranscriptionResponse	Simple text transcription
returns (verbose_json)	AudioTranscriptionVerboseResponse	Text with segments, words, timestamps, speaker info
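Because the return type depends on response_format, downstream code may need to handle both shapes. The sketch below assumes only the attribute names that appear in this page's usage examples (text, speaker_segments, speaker_id); the helper speaker_lines is hypothetical, and it uses duck typing rather than importing the SDK's response classes.

```python
from typing import Any, List


def speaker_lines(result: Any) -> List[str]:
    """Format a transcription result as printable lines.

    Falls back to the plain text when no speaker segments are present,
    which covers the simple "json" response shape.
    """
    # Assumption: speaker_segments may be missing or None on some responses.
    segments = getattr(result, "speaker_segments", None) or []
    if not segments:
        return [result.text]
    return [f"Speaker {seg.speaker_id}: {seg.text}" for seg in segments]
```

Calling speaker_lines(result) on either response type then yields a list of strings that can be printed or written to a file.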

Usage Examples

from together import Together

client = Together()

# Simple transcription
result = client.audio.transcriptions.create(
    file="interview.mp3",
    model="openai/whisper-large-v3",
)
print(result.text)

# Verbose with timestamps and diarization
result = client.audio.transcriptions.create(
    file="meeting.wav",
    model="openai/whisper-large-v3",
    response_format="verbose_json",
    timestamp_granularities="word",
    diarize=True,
)
# speaker_segments may be absent on some responses; fall back to an empty list
for segment in result.speaker_segments or []:
    print(f"Speaker {segment.speaker_id}: {segment.text}")

# From URL
result = client.audio.transcriptions.create(
    file="https://example.com/audio.mp3",
)
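The examples above pass a path and a URL; the file parameter is also documented as accepting a file object. A minimal sketch of that case follows. The wrapper transcribe_from_file_object is hypothetical, and it takes the client as an argument so that it can be exercised without network access.

```python
from pathlib import Path
from typing import Any


def transcribe_from_file_object(
    client: Any,
    path: Path,
    model: str = "openai/whisper-large-v3",
) -> str:
    """Open `path` in binary mode and pass the open handle as `file`."""
    # The handle stays open for the duration of the call and is closed afterwards.
    with path.open("rb") as audio:
        result = client.audio.transcriptions.create(file=audio, model=model)
    return result.text
```

With a real client this would be called as transcribe_from_file_object(Together(), Path("interview.mp3")).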
