Implementation:Togethercomputer Together python Transcriptions Create
| Knowledge Sources | |
|---|---|
| Domains | Audio, Speech_To_Text |
| Last Updated | 2026-02-15 16:00 GMT |
Overview
A concrete tool, provided by the Together Python SDK, for transcribing audio files into text.
Description
The Transcriptions class provides speech-to-text functionality. It accepts audio as a local path, a URL, or a file object, in flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm format. It also supports language specification, a choice of response format (JSON, or verbose JSON with timestamps), temperature control, word- or segment-level timestamp granularity, and speaker diarization.
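Since the accepted formats are listed explicitly, a caller may want to reject unsupported files before making a request. The sketch below is a minimal client-side check; `has_supported_extension` and `SUPPORTED_EXTENSIONS` are hypothetical names that are not part of the Together SDK, and the check inspects only the file extension, not the actual audio encoding.

```python
from pathlib import Path
from urllib.parse import urlparse

# Extensions copied from the format list in the description above.
SUPPORTED_EXTENSIONS = {
    ".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".ogg", ".wav", ".webm",
}


def has_supported_extension(source: str) -> bool:
    """Return True if a local path or URL ends in a listed audio extension.

    Hypothetical helper, not part of the Together SDK.
    """
    parsed = urlparse(source)
    # For URLs, look only at the path component so query strings are ignored.
    path = parsed.path if parsed.scheme in ("http", "https") else source
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

A URL without a recognizable extension fails this check even if the server returns valid audio, so treat a negative result as a warning, not as proof that the request would be rejected.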
Usage
Use this class when you need to convert audio to text. Access it through a client instance as client.audio.transcriptions.create().
Code Reference
Source Location
- Repository: Together Python
- File: src/together/resources/audio/transcriptions.py
- Lines: 1-296
Signature
class Transcriptions:
    def create(
        self,
        *,
        file: Union[str, BinaryIO, Path],
        model: str = "openai/whisper-large-v3",
        language: Optional[str] = None,
        prompt: Optional[str] = None,
        response_format: Union[str, AudioTranscriptionResponseFormat] = "json",
        temperature: float = 0.0,
        timestamp_granularities: Optional[Union[str, AudioTimestampGranularities]] = None,
        diarize: bool = False,
    ) -> Union[AudioTranscriptionResponse, AudioTranscriptionVerboseResponse]: ...
Import
from together import Together
client = Together()
# Access via client.audio.transcriptions
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| file | Union[str, BinaryIO, Path] | Yes | Audio file path, URL, or file object |
| model | str | No | Model ID (default: "openai/whisper-large-v3") |
| language | str | No | ISO-639-1 language code for better accuracy |
| prompt | str | No | Optional text to guide the transcription (default: None) |
| response_format | str | No | "json" or "verbose_json" (default: "json") |
| temperature | float | No | Sampling temperature 0-1 (default: 0.0) |
| timestamp_granularities | str | No | "word" or "segment" (requires response_format="verbose_json"; default: None) |
| diarize | bool | No | Enable speaker diarization (default: False) |
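The table notes that timestamp_granularities only applies with the verbose_json response format. The following sketch shows one way to check that combination before calling create(). `build_transcription_kwargs` is a hypothetical helper, not an SDK function, and the checks mirror only what the table states. Whether the SDK itself raises on an invalid combination is not covered here.

```python
from typing import Any, Dict, Optional

# Allowed values as listed in the inputs table.
RESPONSE_FORMATS = {"json", "verbose_json"}
TIMESTAMP_GRANULARITIES = {"word", "segment"}


def build_transcription_kwargs(
    file: Any,
    *,
    response_format: str = "json",
    timestamp_granularities: Optional[str] = None,
    language: Optional[str] = None,
    diarize: bool = False,
) -> Dict[str, Any]:
    """Validate option combinations and return keyword arguments for create().

    Hypothetical helper, not part of the Together SDK.
    """
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if timestamp_granularities is not None:
        if timestamp_granularities not in TIMESTAMP_GRANULARITIES:
            raise ValueError(
                f"unsupported timestamp_granularities: {timestamp_granularities!r}"
            )
        if response_format != "verbose_json":
            raise ValueError("timestamp_granularities requires verbose_json")

    # Only include optional arguments that differ from their defaults.
    kwargs: Dict[str, Any] = {"file": file, "response_format": response_format}
    if timestamp_granularities is not None:
        kwargs["timestamp_granularities"] = timestamp_granularities
    if language is not None:
        kwargs["language"] = language
    if diarize:
        kwargs["diarize"] = True
    return kwargs
```

The returned dictionary can then be unpacked into the call, for example client.audio.transcriptions.create(**kwargs).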
Outputs
| Name | Type | Description |
|---|---|---|
| returns (json) | AudioTranscriptionResponse | Simple text transcription |
| returns (verbose_json) | AudioTranscriptionVerboseResponse | Text with segments, words, timestamps, speaker info |
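Because the return type depends on response_format, code that consumes the result may need to handle both shapes. The sketch below falls back to the plain text when no speaker information is present. `render_transcript` is a hypothetical helper, and the attribute names (`text`, `speaker_segments`, `speaker_id`) are taken from the usage examples on this page. The `SimpleNamespace` objects are stand-ins for real responses so the snippet runs without an API call.

```python
from types import SimpleNamespace
from typing import Any


def render_transcript(result: Any) -> str:
    """Return speaker-labelled lines if available, else the plain text.

    Hypothetical helper, not part of the Together SDK.
    """
    # Simple responses have no speaker_segments attribute; verbose ones may
    # carry an empty or missing value when diarization was not requested.
    segments = getattr(result, "speaker_segments", None)
    if not segments:
        return result.text
    return "\n".join(f"Speaker {seg.speaker_id}: {seg.text}" for seg in segments)


# Stand-in objects that mimic the two response shapes.
simple = SimpleNamespace(text="Hello there.")
verbose = SimpleNamespace(
    text="Hello there. Hi.",
    speaker_segments=[
        SimpleNamespace(speaker_id="0", text="Hello there."),
        SimpleNamespace(speaker_id="1", text="Hi."),
    ],
)

print(render_transcript(simple))
print(render_transcript(verbose))
```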
Usage Examples
from together import Together

client = Together()

# Simple transcription
result = client.audio.transcriptions.create(
    file="interview.mp3",
    model="openai/whisper-large-v3",
)
print(result.text)

# Verbose with timestamps and diarization
result = client.audio.transcriptions.create(
    file="meeting.wav",
    model="openai/whisper-large-v3",
    response_format="verbose_json",
    timestamp_granularities="word",
    diarize=True,
)
# Guard in case no speaker segments are returned
for segment in result.speaker_segments or []:
    print(f"Speaker {segment.speaker_id}: {segment.text}")

# From URL
result = client.audio.transcriptions.create(
    file="https://example.com/audio.mp3",
)
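The signature also accepts a file object (BinaryIO), which the examples above do not show. Here is a minimal sketch, assuming the handle must stay open for the duration of the call. `transcribe_file_object` is a hypothetical wrapper, not an SDK function, and `client` is expected to be a `Together()` instance.

```python
from pathlib import Path
from typing import Any, Union


def transcribe_file_object(client: Any, path: Union[str, Path], **options: Any) -> Any:
    """Open `path` in binary mode and pass the handle as `file`.

    Hypothetical helper, not part of the Together SDK. Extra keyword
    arguments are forwarded unchanged to create().
    """
    # Binary mode matches the BinaryIO annotation in the signature.
    with open(path, "rb") as audio:
        # The call happens inside the `with` block so the handle is still open.
        return client.audio.transcriptions.create(file=audio, **options)
```

With a real client this would be called as transcribe_file_object(client, "interview.mp3", language="en").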