Implementation:Ggml org Llama cpp TTS

Knowledge Sources	Ggml_org_Llama_cpp
Domains	Text_To_Speech, Audio
Last Updated	2026-02-15 00:00 GMT

Overview

Native C++ text-to-speech tool that generates audio WAV files from text using OuteTTS models with WavTokenizer audio decoding.

Description

Loads an OuteTTS LLM model (v0.2 or v0.3) and a WavTokenizer decoder model. The input text is wrapped in an OuteTTS-format prompt together with speaker voice data, and the LLM generates audio code tokens from it. The WavTokenizer decoder turns these tokens into audio embeddings, which are converted to a waveform by an inverse STFT (magnitude/phase reconstruction, Hann windowing, overlap-add synthesis). The tool also renders the audio spectrogram in the terminal using xterm256 colors and writes the result as a standard WAV file.
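The Hann windowing and overlap-add step of the inverse STFT can be illustrated with a minimal sketch. This is not the code from tts.cpp: the names `hann_window` and `overlap_add`, the periodic window form, and the normalization by the summed squared window are assumptions chosen for illustration. The sketch also assumes the frames have already been transformed back to the time domain.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Periodic Hann window: w[i] = 0.5 * (1 - cos(2*pi*i/n)).
std::vector<float> hann_window(std::size_t n) {
    const double pi = 3.14159265358979323846;
    std::vector<float> w(n);
    for (std::size_t i = 0; i < n; ++i) {
        w[i] = static_cast<float>(0.5 * (1.0 - std::cos(2.0 * pi * i / n)));
    }
    return w;
}

// Overlap-add synthesis (hypothetical helper).
// Each frame must hold n_win time-domain samples; frame f starts at f * n_hop.
// Frames are weighted by the window, summed, and divided by the summed
// squared window so that overlapping regions are not amplified.
std::vector<float> overlap_add(const std::vector<std::vector<float>> & frames,
                               std::size_t n_win, std::size_t n_hop) {
    if (frames.empty()) {
        return {};
    }
    const std::size_t n_out = (frames.size() - 1) * n_hop + n_win;
    const std::vector<float> w = hann_window(n_win);

    std::vector<float> audio(n_out, 0.0f);
    std::vector<float> env(n_out, 0.0f);
    for (std::size_t f = 0; f < frames.size(); ++f) {
        for (std::size_t i = 0; i < n_win; ++i) {
            audio[f * n_hop + i] += frames[f][i] * w[i];
            env[f * n_hop + i]   += w[i] * w[i];
        }
    }
    // Skip positions where the window envelope is (near) zero.
    for (std::size_t i = 0; i < n_out; ++i) {
        if (env[i] > 1e-8f) {
            audio[i] /= env[i];
        }
    }
    return audio;
}
```

With `n` frames the output holds `(n - 1) * n_hop + n_win` samples. If every frame is a constant signal that was already multiplied by the same window during analysis, the interior of the output reconstructs that constant.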

Usage

Use this tool to generate speech audio from text input entirely within llama.cpp, without requiring Python dependencies. Supports automatic model downloading with `--tts-oute-default` for quick setup.

Code Reference

Source Location

Repository: Ggml_org_Llama_cpp
File: tools/tts/tts.cpp
Lines: 1-1093

Signature

// Main entry point
int main(int argc, char ** argv);

// WAV file header structure
struct wav_header {
    char riff[4] = {'R', 'I', 'F', 'F'};
    uint32_t chunk_size;
    char wave[4] = {'W', 'A', 'V', 'E'};
    char fmt[4]  = {'f', 'm', 't', ' '};
    // ... PCM format fields
};

// OuteTTS version enum
enum outetts_version { OUTETTS_V0_2, OUTETTS_V0_3 };

// Terminal color utilities
static int rgb2xterm256(int r, int g, int b);
static std::string set_xterm256_foreground(int r, int g, int b);
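
The terminal color utilities map a 24-bit RGB color to the xterm 256-color palette. The sketch below shows one common way to do this, picking the nearer of the 6x6x6 color cube (indices 16-231) and the grayscale ramp (indices 232-255). It is not copied from tts.cpp; the `_sketch` names are hypothetical, and the actual implementation may choose colors differently.

```cpp
#include <cstdlib>
#include <string>

// Index (0..5) of the xterm color-cube level closest to v (0..255).
static int nearest_cube_level(int v) {
    static const int levels[6] = {0, 95, 135, 175, 215, 255};
    int best = 0;
    for (int i = 1; i < 6; ++i) {
        if (std::abs(levels[i] - v) < std::abs(levels[best] - v)) {
            best = i;
        }
    }
    return best;
}

// Hypothetical stand-in for rgb2xterm256: returns a palette index in 16..255.
int rgb_to_xterm256_sketch(int r, int g, int b) {
    static const int levels[6] = {0, 95, 135, 175, 215, 255};
    const int ri = nearest_cube_level(r);
    const int gi = nearest_cube_level(g);
    const int bi = nearest_cube_level(b);

    // Grayscale ramp: entry k (0..23) has the value 8 + 10 * k.
    int k = ((r + g + b) / 3 - 8 + 5) / 10;
    if (k < 0)  k = 0;
    if (k > 23) k = 23;
    const int gv = 8 + 10 * k;

    auto sq = [](int x) { return x * x; };
    const int d_cube = sq(levels[ri] - r) + sq(levels[gi] - g) + sq(levels[bi] - b);
    const int d_gray = sq(gv - r) + sq(gv - g) + sq(gv - b);

    return d_gray < d_cube ? 232 + k : 16 + 36 * ri + 6 * gi + bi;
}

// Hypothetical stand-in for set_xterm256_foreground: the ANSI escape
// sequence "ESC[38;5;<index>m" selects a foreground color by palette index.
std::string xterm256_foreground_sketch(int r, int g, int b) {
    return "\033[38;5;" + std::to_string(rgb_to_xterm256_sketch(r, g, b)) + "m";
}
```

Printing the returned string before a character and `"\033[0m"` after it colors that character on terminals that support 256 colors.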

Import

#include "arg.h"
#include "common.h"
#include "sampling.h"
#include "log.h"
#include "llama.h"
#include <nlohmann/json.hpp>

I/O Contract

Inputs

Name	Type	Required	Description
-m, --model	string	Yes	Path to the OuteTTS LLM model file (not needed when --tts-oute-default is given)
--tts-oute-default	flag	No	Automatically download and use default OuteTTS models
-p, --prompt	string	Yes	Text to convert to speech
--tts-voice	string	No	Path to speaker voice JSON data for voice cloning
-o, --output	string	No	Output WAV file path (default: output.wav)
--tts-wavtokenizer	string	No	Path to WavTokenizer decoder model

Outputs

Name	Type	Description
WAV file	file	Standard PCM WAV audio file containing the generated speech
spectrogram	terminal	Visual spectrogram display using xterm256 colors (to stderr)
return code	int	0 on success, non-zero on failure
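
To make the WAV output concrete, the sketch below writes float samples as a mono 16-bit PCM WAV file with the canonical 44-byte header. It is an illustration under stated assumptions, not the writer used in tts.cpp: the function name `write_wav16_sketch`, the mono 16-bit layout, and the caller-supplied sample rate are all hypothetical. Fields are written byte by byte in little-endian order so the result does not depend on struct padding or host endianness.

```cpp
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

static void put_u16(std::ofstream & f, uint16_t v) {
    const char b[2] = { static_cast<char>(v & 0xff), static_cast<char>(v >> 8) };
    f.write(b, 2);
}

static void put_u32(std::ofstream & f, uint32_t v) {
    put_u16(f, static_cast<uint16_t>(v & 0xffff));
    put_u16(f, static_cast<uint16_t>(v >> 16));
}

// Hypothetical helper: writes samples in [-1, 1] as mono 16-bit PCM.
bool write_wav16_sketch(const std::string & path,
                        const std::vector<float> & samples,
                        uint32_t sample_rate) {
    std::ofstream f(path, std::ios::binary);
    if (!f) {
        return false;
    }
    const uint16_t channels = 1;
    const uint16_t bits     = 16;
    const uint32_t data_size = static_cast<uint32_t>(samples.size() * (bits / 8));

    f.write("RIFF", 4);
    put_u32(f, 36 + data_size);                     // chunk size: file size minus 8
    f.write("WAVE", 4);
    f.write("fmt ", 4);
    put_u32(f, 16);                                 // fmt chunk size for PCM
    put_u16(f, 1);                                  // audio format 1 = PCM
    put_u16(f, channels);
    put_u32(f, sample_rate);
    put_u32(f, sample_rate * channels * (bits / 8)); // byte rate
    put_u16(f, channels * (bits / 8));               // block align
    put_u16(f, bits);
    f.write("data", 4);
    put_u32(f, data_size);

    for (float s : samples) {
        const float c = std::max(-1.0f, std::min(1.0f, s)); // clip to [-1, 1]
        put_u16(f, static_cast<uint16_t>(static_cast<int16_t>(c * 32767.0f)));
    }
    return f.good();
}
```

The resulting file is 44 bytes of header followed by two bytes per sample.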

Usage Examples

# Generate speech with default models (auto-download)
./tts --tts-oute-default -p "Hello, this is a test of text to speech."

# Generate speech with specific models
./tts -m outetts-v0.3.gguf --tts-wavtokenizer wavtokenizer.gguf \
    -p "The quick brown fox jumps over the lazy dog." -o speech.wav

# Use a custom speaker voice
./tts -m outetts-v0.3.gguf --tts-wavtokenizer wavtokenizer.gguf \
    --tts-voice speaker.json -p "Custom voice synthesis."

Related Pages

Principle:Ggml_org_Llama_cpp_Text_To_Speech
