Implementation:Ggml org Llama cpp TTS

From Leeroopedia
Knowledge Sources
Domains: Text_To_Speech, Audio
Last Updated: 2026-02-15 00:00 GMT

Overview

Native C++ text-to-speech tool that generates audio WAV files from text using OuteTTS models with WavTokenizer audio decoding.

Description

Loads an OuteTTS LLM (v0.2 or v0.3) and a WavTokenizer decoder (vocoder) model. The input text is wrapped in an OuteTTS-format prompt together with speaker voice data, and the LLM generates audio code tokens from that prompt. The WavTokenizer decoder turns these codes into audio embeddings (per-frame magnitude and phase), which are converted to a waveform by an inverse STFT: the complex spectrum is rebuilt from magnitude and phase, each frame is inverse-transformed and Hann-windowed, and the frames are combined by overlap-add. The waveform is written as a standard WAV file, and terminal output is colored using xterm256 escape codes.
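The inverse-STFT stage described above can be sketched as follows. This is an illustrative sketch, not the tool's code: it assumes the decoder output has already been split into per-frame log-magnitudes and phases (as Vocos-style ISTFT heads produce), uses a naive O(N²) inverse DFT instead of an FFT, and treats `n_fft` and `hop` as free parameters rather than the values llama.cpp uses. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Periodic Hann window of length n: w[i] = 0.5 - 0.5 * cos(2*pi*i / n).
static std::vector<float> hann_periodic(std::size_t n) {
    const double pi = std::acos(-1.0);
    std::vector<float> w(n);
    for (std::size_t i = 0; i < n; ++i) {
        w[i] = float(0.5 - 0.5 * std::cos(2.0 * pi * double(i) / double(n)));
    }
    return w;
}

// Naive inverse real DFT: n_fft/2 + 1 complex bins (non-negative frequencies)
// in, n_fft real samples out. O(n_fft^2); clarity over speed.
static std::vector<float> inverse_rdft(const std::vector<float> & re,
                                       const std::vector<float> & im,
                                       std::size_t n_fft) {
    const double pi = std::acos(-1.0);
    const std::size_t n_bins = n_fft / 2 + 1;
    std::vector<float> out(n_fft);
    for (std::size_t t = 0; t < n_fft; ++t) {
        double acc = 0.0;
        for (std::size_t k = 0; k < n_bins; ++k) {
            // Bins other than DC and Nyquist also stand for their negative-frequency
            // conjugate, so they contribute twice.
            const double weight = (k == 0 || 2 * k == n_fft) ? 1.0 : 2.0;
            const double ang = 2.0 * pi * double(k) * double(t) / double(n_fft);
            acc += weight * (re[k] * std::cos(ang) - im[k] * std::sin(ang));
        }
        out[t] = float(acc / double(n_fft));
    }
    return out;
}

// Inverse STFT from per-frame log-magnitudes and phases (n_fft/2 + 1 bins each):
// rebuild the complex spectrum, inverse-transform each frame, apply the Hann
// synthesis window, overlap-add with stride `hop`, then divide by the summed
// squared window, which recovers a Hann-analysed signal wherever coverage is non-zero.
static std::vector<float> istft(const std::vector<std::vector<float>> & log_mag,
                                const std::vector<std::vector<float>> & phase,
                                std::size_t n_fft, std::size_t hop) {
    assert(log_mag.size() == phase.size() && hop > 0 && n_fft > 0);
    if (log_mag.empty()) return {};
    const std::size_t n_bins = n_fft / 2 + 1;
    const std::vector<float> win = hann_periodic(n_fft);
    const std::size_t n_out = (log_mag.size() - 1) * hop + n_fft;
    std::vector<float> audio(n_out, 0.0f), envelope(n_out, 0.0f);
    std::vector<float> re(n_bins), im(n_bins);

    for (std::size_t f = 0; f < log_mag.size(); ++f) {
        assert(log_mag[f].size() == n_bins && phase[f].size() == n_bins);
        for (std::size_t k = 0; k < n_bins; ++k) {
            const float mag = std::exp(log_mag[f][k]);  // assumed log-magnitude layout
            re[k] = mag * std::cos(phase[f][k]);
            im[k] = mag * std::sin(phase[f][k]);
        }
        const std::vector<float> frame = inverse_rdft(re, im, n_fft);
        for (std::size_t t = 0; t < n_fft; ++t) {
            audio[f * hop + t]    += frame[t] * win[t];
            envelope[f * hop + t] += win[t] * win[t];
        }
    }
    for (std::size_t i = 0; i < n_out; ++i) {
        // Samples with (near-)zero window coverage are left unnormalized.
        if (envelope[i] > 1e-8f) audio[i] /= envelope[i];
    }
    return audio;
}
```

A quick sanity check is to run a forward STFT with the same Hann window and hop, pass the resulting log-magnitudes and phases through `istft`, and compare: interior samples come back unchanged up to floating-point error.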

Usage

Use this tool to generate speech audio from text input entirely within llama.cpp, without requiring Python dependencies. Supports automatic model downloading with `--tts-oute-default` for quick setup.

Code Reference

Source Location

Signature

// Main entry point
int main(int argc, char ** argv);

// WAV file header structure
struct wav_header {
    char riff[4] = {'R', 'I', 'F', 'F'};
    uint32_t chunk_size;
    char wave[4] = {'W', 'A', 'V', 'E'};
    char fmt[4]  = {'f', 'm', 't', ' '};
    // ... PCM format fields
};

// OuteTTS version enum
enum outetts_version { OUTETTS_V0_2, OUTETTS_V0_3 };

// Terminal color utilities
static int rgb2xterm256(int r, int g, int b);
static std::string set_xterm256_foreground(int r, int g, int b);
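The signature lists `rgb2xterm256` and `set_xterm256_foreground` without their bodies. One common way to implement such helpers is to snap each channel to xterm's 6x6x6 color cube, compare the result against the 24-step grayscale ramp, and emit the `ESC[38;5;<n>m` escape sequence. The sketch below does this under hypothetical names (`nearest_xterm256`, `xterm256_fg`); it is not claimed to reproduce the tool's exact mapping.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Channel intensities of the 6x6x6 color cube at xterm indices 16..231.
static const int k_cube_levels[6] = {0, 95, 135, 175, 215, 255};

// Index (0..5) of the cube level closest to a channel value in 0..255.
static int nearest_cube_level(int v) {
    int best = 0;
    for (int i = 1; i < 6; ++i) {
        if (std::abs(k_cube_levels[i] - v) < std::abs(k_cube_levels[best] - v)) best = i;
    }
    return best;
}

static int squared_distance(int r1, int g1, int b1, int r2, int g2, int b2) {
    return (r1 - r2) * (r1 - r2) + (g1 - g2) * (g1 - g2) + (b1 - b2) * (b1 - b2);
}

// Hypothetical helper: nearest xterm-256 palette index for an RGB color,
// choosing between the color cube (16..231) and the grayscale ramp (232..255).
static int nearest_xterm256(int r, int g, int b) {
    assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
    const int ri = nearest_cube_level(r), gi = nearest_cube_level(g), bi = nearest_cube_level(b);
    const int cube_index = 16 + 36 * ri + 6 * gi + bi;
    const int cube_dist  = squared_distance(r, g, b, k_cube_levels[ri], k_cube_levels[gi], k_cube_levels[bi]);

    // Grayscale ramp: index 232 + i has intensity 8 + 10 * i, for i in 0..23.
    const int avg = (r + g + b) / 3;
    int gray_i = (avg - 3) / 10;  // rounds (avg - 8) / 10 to the nearest step for avg >= 3
    if (gray_i < 0) gray_i = 0;
    if (gray_i > 23) gray_i = 23;
    const int gray_v    = 8 + 10 * gray_i;
    const int gray_dist = squared_distance(r, g, b, gray_v, gray_v, gray_v);

    return gray_dist < cube_dist ? 232 + gray_i : cube_index;
}

// ANSI SGR sequence that sets the foreground to xterm-256 palette index `idx`.
static std::string xterm256_fg(int idx) {
    return "\033[38;5;" + std::to_string(idx) + "m";
}
```

For example, `xterm256_fg(nearest_xterm256(255, 0, 0)) + "text" + "\033[0m"` renders "text" in red on a 256-color terminal; mid-gray such as (128, 128, 128) lands on the grayscale ramp rather than the cube because the ramp has an exact match.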

Import

#include "arg.h"
#include "common.h"
#include "sampling.h"
#include "log.h"
#include "llama.h"
#include <nlohmann/json.hpp>

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| -m, --model | string | Yes, unless --tts-oute-default is given | Path to the OuteTTS LLM model file |
| --tts-oute-default | flag | No | Automatically download and use the default OuteTTS models |
| -p, --prompt | string | Yes | Text to convert to speech |
| --tts-speaker-file | string | No | Path to a speaker voice JSON file for voice cloning |
| -o, --output | string | No | Output WAV file path (default: output.wav) |
| -mv, --model-vocoder | string | Yes, unless --tts-oute-default is given | Path to the WavTokenizer decoder (vocoder) model |
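As far as we understand the OuteTTS prompt format, the `-p` text listed above is not passed to the model verbatim: words are lowercased, stripped of punctuation, and joined with a separator token. A simplified sketch of that normalization follows, assuming English input and the separator tokens `<|text_sep|>` (v0.2) and `<|space|>` (v0.3). `normalize_for_outetts` is a hypothetical name, number-to-word expansion is deliberately omitted, and the surrounding prompt tokens and speaker data are not shown.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: simplified OuteTTS-style text normalization.
// Lowercases ASCII letters, treats whitespace and the separators - _ / , . \ as
// word boundaries, drops every other character (digits included, since the
// number-to-words step is omitted here), and joins the words with `sep`
// (assumed "<|text_sep|>" for OuteTTS v0.2 and "<|space|>" for v0.3).
static std::string normalize_for_outetts(const std::string & text, const std::string & sep) {
    assert(!sep.empty());
    std::string out;
    bool at_boundary = false;  // a word boundary was seen since the last emitted letter
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            if (at_boundary && !out.empty()) out += sep;  // never emit a leading separator
            at_boundary = false;
            out += static_cast<char>(std::tolower(c));
        } else if (std::isspace(c) || c == '-' || c == '_' || c == '/' || c == ',' || c == '.' || c == '\\') {
            at_boundary = true;
        }
        // Any other character (apostrophes, '!', digits, non-ASCII bytes) is dropped
        // without splitting the surrounding word.
    }
    return out;
}
```

For example, `normalize_for_outetts("Hello, World!", "<|text_sep|>")` returns `hello<|text_sep|>world`, and `"Don't stop"` becomes `dont<|text_sep|>stop`.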

Outputs

| Name | Type | Description |
|---|---|---|
| WAV file | file | 16-bit mono PCM WAV file containing the generated speech |
| console output | terminal | Generation progress, colored with xterm256 escape codes (to stderr) |
| return code | int | 0 on success, non-zero on failure |
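A minimal sketch of encoding float samples into a 16-bit mono PCM WAV byte buffer, following the canonical 44-byte RIFF/WAVE header layout, is shown below. `encode_wav16` and its helpers are hypothetical names; samples are assumed to lie in [-1, 1] and are clamped, and every field is written little-endian byte by byte.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>
#include <vector>

// Append little-endian integers regardless of host byte order.
static void put_u16(std::vector<std::uint8_t> & buf, std::uint16_t v) {
    buf.push_back(static_cast<std::uint8_t>(v & 0xFF));
    buf.push_back(static_cast<std::uint8_t>(v >> 8));
}
static void put_u32(std::vector<std::uint8_t> & buf, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) buf.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xFF));
}
// Append a four-character chunk tag such as "RIFF" or "fmt ".
static void put_tag(std::vector<std::uint8_t> & buf, const char * tag) {
    buf.insert(buf.end(), tag, tag + 4);
}

// Encode mono float samples as a 16-bit PCM WAV: 44-byte header followed by sample data.
static std::vector<std::uint8_t> encode_wav16(const std::vector<float> & samples, std::uint32_t sample_rate) {
    assert(sample_rate > 0);
    const std::uint16_t num_channels = 1, bits_per_sample = 16;
    const auto block_align = static_cast<std::uint16_t>(num_channels * bits_per_sample / 8);  // bytes per frame
    const auto data_size   = static_cast<std::uint32_t>(samples.size()) * block_align;

    std::vector<std::uint8_t> buf;
    buf.reserve(44 + data_size);
    put_tag(buf, "RIFF");
    put_u32(buf, 36 + data_size);             // RIFF chunk size: everything after this field
    put_tag(buf, "WAVE");
    put_tag(buf, "fmt ");
    put_u32(buf, 16);                         // fmt chunk size for PCM
    put_u16(buf, 1);                          // audio format 1 = PCM
    put_u16(buf, num_channels);
    put_u32(buf, sample_rate);
    put_u32(buf, sample_rate * block_align);  // byte rate
    put_u16(buf, block_align);
    put_u16(buf, bits_per_sample);
    put_tag(buf, "data");
    put_u32(buf, data_size);
    for (float s : samples) {
        const float clamped = std::clamp(s, -1.0f, 1.0f);
        const auto q = static_cast<std::int16_t>(std::lround(clamped * 32767.0f));
        put_u16(buf, static_cast<std::uint16_t>(q));  // two's-complement bit pattern
    }
    return buf;
}
```

Writing the header field by field, rather than writing a struct such as `wav_header` in a single call, avoids depending on the compiler's struct padding and the host's byte order. The returned bytes can be saved with `std::ofstream out(path, std::ios::binary); out.write(reinterpret_cast<const char *>(bytes.data()), bytes.size());`.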

Usage Examples

# Generate speech with default models (auto-download)
./llama-tts --tts-oute-default -p "Hello, this is a test of text to speech."

# Generate speech with specific models
./llama-tts -m outetts-v0.3.gguf -mv wavtokenizer.gguf \
    -p "The quick brown fox jumps over the lazy dog." -o speech.wav

# Use a custom speaker voice
./llama-tts -m outetts-v0.3.gguf -mv wavtokenizer.gguf \
    --tts-speaker-file speaker.json -p "Custom voice synthesis."
