

Implementation:Neuml Txtai TextToAudio

From Leeroopedia


Knowledge Sources
Domains Audio, Audio Generation, NLP
Last Updated 2026-02-10 01:00 GMT

Overview

A txtai pipeline that generates audio from text descriptions.

Description

TextToAudio is a pipeline that generates audio waveforms from text descriptions using Hugging Face "text-to-audio" models. Unlike TextToSpeech, which synthesizes speech, this pipeline generates general audio such as music or sound effects from text prompts. It extends the HFPipeline base class and wraps the Hugging Face text-to-audio pipeline task. The pipeline accepts either a single string or a list of strings, and it can optionally resample its output to a configurable target sample rate. The maximum number of new tokens to generate, which bounds the length of the audio, is passed to the model through the pipeline's forward_params argument.
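The input handling and forward_params flow described above can be sketched roughly as follows. This is a minimal illustration, not txtai's source: `fake_hf_pipeline` and `generate` are hypothetical names, and the stub only mimics the shape of the Hugging Face text-to-audio output (one dict per input with "audio" and "sampling_rate" keys) so the example runs without downloading a model.

```python
from typing import Callable, List, Tuple, Union

import numpy as np


# Hypothetical stand-in for the Hugging Face "text-to-audio" pipeline.
# Like the real pipeline, it returns one dict per input text.
def fake_hf_pipeline(texts: List[str], forward_params: dict) -> List[dict]:
    # Illustrative only: pretend each generated token yields 10 samples at 32 kHz
    samples = forward_params["max_new_tokens"] * 10
    return [{"audio": np.zeros(samples, dtype=np.float32), "sampling_rate": 32000} for _ in texts]


def generate(
    pipeline: Callable[..., List[dict]],
    text: Union[str, List[str]],
    maxlength: int = 512,
) -> Union[Tuple[np.ndarray, int], List[Tuple[np.ndarray, int]]]:
    # Normalize a single string into a one-element batch
    texts = [text] if isinstance(text, str) else text

    # maxlength is forwarded to generation as max_new_tokens
    outputs = pipeline(texts, forward_params={"max_new_tokens": maxlength})

    # Convert each output dict into an (audio, sample rate) tuple
    results = [(out["audio"], out["sampling_rate"]) for out in outputs]

    # A string input returns a single tuple; a list input returns a list of tuples
    return results[0] if isinstance(text, str) else results
```

For example, `generate(fake_hf_pipeline, "a dog barking", maxlength=100)` returns one `(audio, 32000)` tuple, while passing a list returns a list of tuples, mirroring the single/batch contract in the Outputs table.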

Usage

Use TextToAudio when you need to generate non-speech audio from text descriptions: sound effects ("a dog barking"), music from text prompts, or ambient audio. This differs from TextToSpeech, which specifically synthesizes human speech from written text.

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/pipeline/audio/texttoaudio.py

Signature

class TextToAudio(HFPipeline):
    def __init__(self, path=None, quantize=False, gpu=True, model=None, rate=None, **kwargs): ...
    def __call__(self, text, maxlength=512): ...

Import

from txtai.pipeline.audio.texttoaudio import TextToAudio

I/O Contract

Inputs

Name Type Required Description
path str No Model path or Hugging Face repo id for the text-to-audio model.
quantize bool No Enable model quantization. Defaults to False.
gpu bool No Use GPU acceleration if available. Defaults to True.
model object No Optional pre-loaded model instance.
rate int No Target sample rate for output audio. When None, uses model's native sample rate.
text str or list Yes Text description or list of descriptions to generate audio from.
maxlength int No Maximum number of new tokens to generate, which bounds the audio length. Defaults to 512.
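When rate is set and differs from the model's native rate, the output is resampled so that the clip keeps its duration. The sketch below illustrates that idea with NumPy linear interpolation. This is a swapped-in technique chosen for brevity, not the resampler txtai uses; `resample_linear` is a hypothetical helper, and it assumes a 1-D mono waveform.

```python
import numpy as np


def resample_linear(audio: np.ndarray, rate: int, target: int) -> np.ndarray:
    # Assumes a 1-D mono waveform; no work is needed when the rates already match
    if rate == target:
        return audio

    # Number of output samples that preserves the clip duration
    samples = round(len(audio) * target / rate)

    # Timestamps (in seconds) of the input and output samples
    src_times = np.arange(len(audio)) / rate
    dst_times = np.arange(samples) / target

    # Linearly interpolate the waveform at the new timestamps, keeping the input dtype
    return np.interp(dst_times, src_times, audio).astype(audio.dtype)
```

For example, one second of audio at 32000 Hz resampled to 16000 Hz yields 16000 samples. Linear interpolation applies no low-pass filter, so downsampling can alias. A filtered resampler such as `scipy.signal.resample_poly` is the safer choice in practice.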

Outputs

Name Type Description
result tuple(numpy.ndarray, int) A tuple of (audio waveform as NumPy array, sample rate) for a single text input.
results list of tuple(numpy.ndarray, int) A list of (audio, sample rate) tuples when input is a list.
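A common next step is to persist an (audio, sample rate) tuple as a WAV file. The sketch below uses only the standard library `wave` module plus NumPy. `to_wav_bytes` is a hypothetical helper, not part of txtai. It assumes the waveform is mono floating-point audio in the range [-1.0, 1.0], stored as either `(N,)` or `(1, N)`, and it encodes the audio as 16-bit PCM.

```python
import io
import wave

import numpy as np


def to_wav_bytes(audio: np.ndarray, rate: int) -> bytes:
    # Assumes mono float audio in [-1.0, 1.0]; a (1, N) array is flattened to (N,)
    samples = np.clip(np.asarray(audio, dtype=np.float32).flatten(), -1.0, 1.0)

    # Scale to little-endian signed 16-bit PCM
    pcm = (samples * 32767).astype("<i2")

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)     # mono
        wav.setsampwidth(2)     # 2 bytes per sample = 16-bit
        wav.setframerate(rate)  # sample rate returned alongside the audio
        wav.writeframes(pcm.tobytes())
    return buffer.getvalue()
```

Usage: `open("out.wav", "wb").write(to_wav_bytes(audio, rate))`. Multi-channel output would need interleaving rather than flattening, so this sketch handles mono audio only.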

Usage Examples

from txtai.pipeline import TextToAudio

# Create a TextToAudio pipeline
audio_gen = TextToAudio()

# Generate audio from a text description
audio, rate = audio_gen("A cat purring softly")

# Generate audio with a custom max length
audio, rate = audio_gen("Thunder rolling in the distance", maxlength=1024)

# Batch generate audio from multiple descriptions (returns a list of (audio, rate) tuples)
results = audio_gen(["Birds chirping in the morning", "Rain on a tin roof"])

# Use a specific Transformers-compatible model and resample the output to 16 kHz
audio_gen_custom = TextToAudio(path="facebook/musicgen-small", rate=16000)
audio, rate = audio_gen_custom("Ocean waves crashing on a beach")
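A string input returns a single tuple and a list input returns a list, so downstream code that accepts either form can normalize it first. The following is a minimal sketch. `as_list` and `durations` are hypothetical helpers, not part of txtai. Reading the sample count from the last array axis is an assumption that covers both `(N,)` and channel-first `(C, N)` waveforms.

```python
from typing import List, Tuple, Union

import numpy as np

Result = Tuple[np.ndarray, int]


def as_list(output: Union[Result, List[Result]]) -> List[Result]:
    # A single-string call returns one (audio, rate) tuple; wrap it in a list
    if isinstance(output, tuple):
        return [output]
    return list(output)


def durations(output: Union[Result, List[Result]]) -> List[float]:
    # Duration in seconds = sample count (last axis) / sample rate
    return [audio.shape[-1] / rate for audio, rate in as_list(output)]
```

With this helper, `durations(audio_gen("A cat purring softly"))` and `durations(results)` both return a list of clip lengths in seconds.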
