Implementation:Neuml Txtai TextToAudio

Knowledge Sources	Neuml_Txtai
Domains	Audio, Audio Generation, NLP
Last Updated	2026-02-10 01:00 GMT

Overview

Concrete tool, provided by txtai, for generating audio from text descriptions.

Description

TextToAudio is a pipeline that generates audio waveforms from text descriptions using Hugging Face "text-to-audio" models. Unlike TextToSpeech, which produces speech, this pipeline generates general audio such as music or sound effects from text prompts. It extends the HFPipeline base class and wraps the Hugging Face text-to-audio pipeline task. The pipeline accepts either a single string or a list of strings, and can optionally resample its output to a configurable target sample rate. The maxlength argument is forwarded through the forward_params mechanism as the maximum number of new tokens, which bounds the length of the generated audio.
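The input handling described above can be sketched without loading a model. This is a minimal illustration, not the library's code: fake_pipeline is a hypothetical stand-in for the Hugging Face pipeline, and the 32000 Hz rate and silent output are assumptions chosen only to make the flow visible.

```python
def fake_pipeline(texts, forward_params=None):
    """Hypothetical stand-in for the Hugging Face text-to-audio pipeline."""
    tokens = forward_params["max_new_tokens"]
    # One silent "waveform" per prompt; its length is tied to the token budget (assumption)
    return [{"audio": [0.0] * tokens, "sampling_rate": 32000} for _ in texts]


def generate(text, maxlength=512):
    # Normalize input so a single string and a list share one code path
    texts = [text] if isinstance(text, str) else text

    # The length limit travels to the model through forward_params
    outputs = fake_pipeline(texts, forward_params={"max_new_tokens": maxlength})
    results = [(x["audio"], x["sampling_rate"]) for x in outputs]

    # Unwrap the list when the caller passed a single string
    return results[0] if isinstance(text, str) else results


audio, rate = generate("a dog barking", maxlength=4)
batch = generate(["a dog barking", "rain on a roof"], maxlength=4)
```

A single string yields one (audio, rate) tuple, while a list yields a list of tuples in the same order as the prompts.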

Usage

Use TextToAudio when you need to generate non-speech audio from text descriptions: sound effects ("a dog barking"), music from text prompts, or ambient audio. This differs from TextToSpeech, which specifically synthesizes human speech from written text.

Code Reference

Source Location

Repository: Neuml_Txtai
File: src/python/txtai/pipeline/audio/texttoaudio.py

Signature

class TextToAudio(HFPipeline):
    def __init__(self, path=None, quantize=False, gpu=True, model=None, rate=None, **kwargs)
    def __call__(self, text, maxlength=512)

Import

from txtai.pipeline.audio.texttoaudio import TextToAudio

I/O Contract

Inputs

Name	Type	Required	Description
path	str	No	Model path or Hugging Face repo id for the text-to-audio model.
quantize	bool	No	Enable model quantization. Defaults to False.
gpu	bool	No	Use GPU acceleration if available. Defaults to True.
model	object	No	Optional pre-loaded model instance.
rate	int	No	Target sample rate for output audio. When None, the model's native sample rate is used.
text	str or list	Yes	Text description or list of descriptions to generate audio from.
maxlength	int	No	Maximum number of new tokens to generate, which bounds the audio length. Defaults to 512.
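The effect of the rate parameter can be illustrated with a small sketch. The resampler below uses plain linear interpolation as a simple substitute; it is not the resampling method txtai uses, and the function names resample_linear and convert are hypothetical.

```python
def resample_linear(audio, source_rate, target_rate):
    """Resample a mono signal by linear interpolation (illustrative only)."""
    if source_rate == target_rate or len(audio) < 2:
        return list(audio)

    # Output length scales with the ratio of the two rates
    count = max(1, round(len(audio) * target_rate / source_rate))
    step = source_rate / target_rate

    output = []
    for i in range(count):
        # Fractional position in the source signal, clamped to the last sample
        position = min(i * step, len(audio) - 1)
        left = int(position)
        right = min(left + 1, len(audio) - 1)
        fraction = position - left
        output.append(audio[left] * (1 - fraction) + audio[right] * fraction)
    return output


def convert(audio, native_rate, rate=None):
    # rate=None keeps the model's native sample rate
    if rate is None:
        return audio, native_rate
    return resample_linear(audio, native_rate, rate), rate


native = convert([0.0, 2.0, 4.0, 6.0], 4)
halved = convert([0.0, 2.0, 4.0, 6.0], 4, rate=2)
```

With rate left as None the samples pass through untouched; with a target rate the returned rate always equals the requested one.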

Outputs

Name	Type	Description
result	tuple(numpy.ndarray, int)	A tuple of (audio waveform as NumPy array, sample rate) for a single text input.
results	list of tuple(numpy.ndarray, int)	A list of (audio, sample rate) tuples when input is a list.
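A returned (audio, sample rate) tuple can be saved for playback. The sketch below uses only the Python standard library and assumes mono samples that are floats in the range [-1.0, 1.0]; actual model output may need a different scaling or channel layout. The helper name to_wav_bytes is hypothetical.

```python
import io
import struct
import wave


def to_wav_bytes(audio, rate):
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono (assumption)
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)

        # Clip each sample to [-1, 1], then scale to the signed 16-bit range
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, float(sample))) * 32767))
            for sample in audio
        )
        wav.writeframes(frames)
    return buffer.getvalue()


data = to_wav_bytes([0.0, 0.5, -1.0, 2.0], 16000)
```

Because the helper iterates over samples and converts each with float(), it accepts a one-dimensional NumPy array as well as a plain list. The bytes can be written to a .wav file with a normal binary file write.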

Usage Examples

from txtai.pipeline import TextToAudio

# Create a TextToAudio pipeline
audio_gen = TextToAudio()

# Generate audio from a text description
audio, rate = audio_gen("A cat purring softly")

# Generate audio with a custom max length
audio, rate = audio_gen("Thunder rolling in the distance", maxlength=1024)

# Batch generate audio from multiple descriptions
results = audio_gen(["Birds chirping in the morning", "Rain on a tin roof"])

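# Use a specific model with a target sample rate
# (the model must support the Hugging Face text-to-audio task; facebook/musicgen-small is one such model)
audio_gen_custom = TextToAudio(path="facebook/musicgen-small", rate=16000)
audio, rate = audio_gen_custom("Ocean waves crashing on a beach")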
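Batch results come back as a list of (audio, sample rate) tuples in prompt order, so they can be paired with their prompts directly. The sketch below uses placeholder data in place of real pipeline output, and the helper name summarize is hypothetical.

```python
def summarize(prompts, results):
    """Pair each prompt with the duration in seconds of its generated clip."""
    return [(prompt, len(audio) / rate) for prompt, (audio, rate) in zip(prompts, results)]


# Placeholder data standing in for real pipeline output (assumption)
prompts = ["Birds chirping in the morning", "Rain on a tin roof"]
results = [([0.0] * 32000, 16000), ([0.0] * 8000, 16000)]

summary = summarize(prompts, results)
```

Dividing the sample count by the sample rate gives the clip duration, which is a quick way to check how a given maxlength translates into seconds for a particular model.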
