Implementation:Togethercomputer Together python Speech Create

Knowledge Sources	Together Python
Domains	Audio, Text_To_Speech
Last Updated	2026-02-15 16:00 GMT

Overview

Concrete tool, provided by the Together Python SDK, for generating speech audio from text input.

Description

The Speech class provides text-to-speech functionality via the Together API. It converts input text to audio using a specified model and voice, and supports multiple output formats (WAV, MP3, RAW), languages, audio encodings, and configurable sample rates. Both streaming and non-streaming modes are supported. The response includes a stream_to_file() helper for saving audio directly to disk.

Usage

Use this class when you need to convert text to spoken audio. It is reached through a client instance rather than imported on its own: call client.audio.speech.create().

Code Reference

Source Location

Repository: Together Python
File: src/together/resources/audio/speech.py
Lines: 1-159

Signature

class Speech:
    def create(
        self,
        *,
        model: str,
        input: str,
        voice: str | None = None,
        response_format: str = "wav",
        language: str = "en",
        response_encoding: str = "pcm_f32le",
        sample_rate: int | None = None,
        stream: bool = False,
    ) -> AudioSpeechStreamResponse: ...

Import

from together import Together

client = Together()
# Access via client.audio.speech

I/O Contract

Inputs

Name	Type	Required	Description
model	str	Yes	TTS model name (e.g., "cartesia/sonic")
input	str	Yes	Text to convert to speech
voice	str	No	Voice name to use for generation
response_format	str	No	Audio format: "wav", "mp3", or "raw" (default: "wav")
language	str	No	Language code (default: "en")
response_encoding	str	No	Audio encoding (default: "pcm_f32le")
sample_rate	int	No	Output sample rate (auto-detected per model if not set)
stream	bool	No	Enable streaming mode (default: False)
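
As a rough illustration of the contract above, the following sketch checks a subset of the documented inputs before any request is made. The helper name build_speech_kwargs and the ALLOWED_FORMATS set are hypothetical and not part of the SDK; the allowed formats and defaults are taken from the table, and the sketch only builds a dictionary without contacting the API.

```python
from typing import Any, Dict, Optional

# Formats listed in the Inputs table; this set is an illustration, not an SDK constant.
ALLOWED_FORMATS = {"wav", "mp3", "raw"}


def build_speech_kwargs(
    model: str,
    text: str,
    *,
    voice: Optional[str] = None,
    response_format: str = "wav",
    language: str = "en",
    sample_rate: Optional[int] = None,
    stream: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: validate inputs and return kwargs for create()."""
    if not model:
        raise ValueError("model is required")
    if not text:
        raise ValueError("input text is required")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if sample_rate is not None and sample_rate <= 0:
        raise ValueError("sample_rate must be a positive integer")

    kwargs: Dict[str, Any] = {
        "model": model,
        "input": text,
        "response_format": response_format,
        "language": language,
        "stream": stream,
    }
    # Optional values are left out when unset so the SDK defaults apply.
    if voice is not None:
        kwargs["voice"] = voice
    if sample_rate is not None:
        kwargs["sample_rate"] = sample_rate
    return kwargs
```

The resulting dictionary could then be passed on as client.audio.speech.create(**kwargs). Whether the API itself rejects other values is not stated here, so the checks reflect only what the table documents.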

Outputs

Name	Type	Description
returns	AudioSpeechStreamResponse	Audio data wrapper with stream_to_file() method

Usage Examples

from together import Together

client = Together()

# Generate speech and save to file
audio = client.audio.speech.create(
    model="cartesia/sonic",
    input="Welcome to the Together AI platform.",
    voice="laidback woman",
    response_format="wav",
    language="en",
)
audio.stream_to_file("welcome.wav")

# Streaming mode
audio_stream = client.audio.speech.create(
    model="cartesia/sonic",
    input="This is streamed audio.",
    stream=True,
)
audio_stream.stream_to_file("streamed.wav")
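
The two calls above follow the same pattern: create a response, then write it to disk. A minimal sketch of a wrapper for that pattern is shown below. The function synthesize_to_file is hypothetical and not part of the SDK; it relies only on client.audio.speech.create() and stream_to_file() as used in the examples, and it assumes stream_to_file() accepts a string path as it does there.

```python
from pathlib import Path
from typing import Any


def synthesize_to_file(client: Any, text: str, path: str, **options: Any) -> Path:
    """Hypothetical wrapper: generate speech for `text` and save it to `path`.

    `client` is expected to expose audio.speech.create(), as a Together()
    instance does; `options` are forwarded unchanged (model, voice, stream, ...).
    """
    target = Path(path)
    # Create the parent directory so the write does not fail on a missing folder.
    target.parent.mkdir(parents=True, exist_ok=True)

    response = client.audio.speech.create(input=text, **options)
    response.stream_to_file(str(target))
    return target
```

With a real client this might be called as synthesize_to_file(client, "Welcome.", "out/welcome.wav", model="cartesia/sonic", voice="laidback woman"). Taking the client as a parameter keeps the wrapper easy to exercise with a stand-in object in tests.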
