Implementation:Neuml Txtai TextToAudio
| Knowledge Sources | |
|---|---|
| Domains | Audio, Audio Generation, NLP |
| Last Updated | 2026-02-10 01:00 GMT |
Overview
Concrete tool, provided by txtai, for generating audio from text descriptions.
Description
TextToAudio is a pipeline that generates audio waveforms from text descriptions using Hugging Face "text-to-audio" models. Unlike TextToSpeech, which synthesizes speech, this pipeline generates general audio such as music or sound effects from text prompts. It extends the HFPipeline base class and wraps the Hugging Face "text-to-audio" pipeline task. The pipeline accepts either a single string or a list of strings, can optionally resample its output to a configurable target sample rate, and passes `maxlength` to the underlying pipeline as `forward_params={"max_new_tokens": maxlength}`, which bounds the number of generated tokens and therefore the length of the audio.
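The single-versus-batch handling and the `forward_params` call described above can be sketched without loading a model. This is a minimal sketch, not txtai's source: `fake_hf_pipeline` is a hypothetical stand-in for the Hugging Face pipeline, the 10 samples per token and the 32 kHz rate are made-up values, and flattening each waveform to one dimension is an assumption about the post-processing step.

```python
import numpy as np


def fake_hf_pipeline(texts, forward_params=None):
    # Hypothetical stand-in for a Hugging Face "text-to-audio" pipeline: returns one
    # dict per prompt holding a (channels, samples) waveform and its sampling rate.
    # The 10 samples per token and the 32000 Hz rate are made-up illustration values.
    tokens = forward_params["max_new_tokens"]
    return [
        {"audio": np.zeros((1, tokens * 10), dtype=np.float32), "sampling_rate": 32000}
        for _ in texts
    ]


def text_to_audio(pipeline, text, maxlength=512):
    # Normalize a single string into a one-element batch
    texts = [text] if isinstance(text, str) else text

    # maxlength is forwarded to generation as max_new_tokens
    outputs = pipeline(texts, forward_params={"max_new_tokens": maxlength})

    # Assumed post-processing: flatten each waveform to 1-D and pair it with its rate
    results = [(out["audio"].T.flatten(), out["sampling_rate"]) for out in outputs]

    # Single string in -> single (audio, rate) tuple out; list in -> list out
    return results[0] if isinstance(text, str) else results
```

Calling `text_to_audio(fake_hf_pipeline, "a dog barking", maxlength=100)` returns one `(audio, rate)` tuple, while passing a list returns a list of tuples, matching the contract in the Outputs table.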
Usage
Use TextToAudio when you need to generate non-speech audio from text descriptions: creating sound effects from descriptions ("a dog barking"), generating music from text prompts, or producing ambient audio. This differs from TextToSpeech, which specifically synthesizes human speech from written text.
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/pipeline/audio/texttoaudio.py
Signature
```python
class TextToAudio(HFPipeline):
    def __init__(self, path=None, quantize=False, gpu=True, model=None, rate=None, **kwargs)
    def __call__(self, text, maxlength=512)
```
Import
```python
from txtai.pipeline.audio.texttoaudio import TextToAudio
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str | No | Model path or Hugging Face repo id for the text-to-audio model. |
| quantize | bool | No | Enable model quantization. Defaults to False. |
| gpu | bool | No | Use GPU acceleration if available. Defaults to True. |
| model | object | No | Optional pre-loaded model instance. |
| rate | int | No | Target sample rate (Hz) for output audio. When None, the model's native sample rate is kept. |
| text | str or list | Yes | Text description or list of descriptions to generate audio from. |
| maxlength | int | No | Maximum number of new tokens to generate (passed as max_new_tokens), which bounds the audio length. Defaults to 512. |
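When `rate` is set, the output is resampled from the model's native rate to the target rate. The sketch below illustrates that contract with linear interpolation via `np.interp`; `resample_linear` and `apply_rate` are hypothetical helpers, and txtai's internal resampling routine may use a different, higher-quality method.

```python
import numpy as np


def resample_linear(audio, rate, target):
    # Hypothetical helper: linear-interpolation resampling, a simple stand-in
    # that is not necessarily the method txtai uses internally
    if rate == target:
        return audio
    # Keep the clip duration constant: samples_out = duration * target
    count = int(round(len(audio) / rate * target))
    # Timestamps (seconds) of the original and of the resampled sample points
    source_times = np.arange(len(audio)) / rate
    target_times = np.arange(count) / target
    return np.interp(target_times, source_times, audio).astype(audio.dtype)


def apply_rate(audio, rate, target=None):
    # Documented contract: target=None keeps the model's native sample rate
    if target is None:
        return audio, rate
    return resample_linear(audio, rate, target), target
```

For example, one second of audio at 32,000 Hz resampled with `target=16000` yields 16,000 samples and a reported rate of 16,000.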
Outputs
| Name | Type | Description |
|---|---|---|
| result | tuple(numpy.ndarray, int) | A tuple of (audio waveform as NumPy array, sample rate) for a single text input. |
| results | list of tuple(numpy.ndarray, int) | A list of (audio, sample rate) tuples when input is a list. |
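Because each result pairs a waveform with its sample rate, the clip duration in seconds is the sample count divided by the rate (for example, 48,000 samples at 16,000 Hz is 3.0 seconds). A minimal sketch using a hypothetical `durations` helper that accepts either output shape from the table above:

```python
import numpy as np


def durations(result):
    # Hypothetical helper: a single (audio, rate) tuple becomes a one-element list,
    # a list of tuples (batch output) is used as-is
    pairs = [result] if isinstance(result, tuple) else result
    # Duration in seconds = number of samples / samples per second
    return [len(audio) / rate for audio, rate in pairs]
```

`durations((np.zeros(48000), 16000))` gives `[3.0]`, and a batch of two clips gives one duration per clip.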
Usage Examples
```python
from txtai.pipeline import TextToAudio

# Create a TextToAudio pipeline
audio_gen = TextToAudio()

# Generate audio from a text description; returns (waveform, sample rate)
audio, rate = audio_gen("A cat purring softly")

# Generate longer audio by raising the token limit
audio, rate = audio_gen("Thunder rolling in the distance", maxlength=1024)

# Batch generate audio from multiple descriptions; returns a list of (audio, rate) tuples
results = audio_gen(["Birds chirping in the morning", "Rain on a tin roof"])
for audio, rate in results:
    print(audio.shape, rate)

# Use a specific Transformers-compatible model and resample output to 16 kHz
audio_gen_custom = TextToAudio(path="facebook/musicgen-small", rate=16000)
audio, rate = audio_gen_custom("Ocean waves crashing on a beach")
```
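To listen to or store the output, the waveform can be written to a WAV file. The sketch below uses Python's standard `wave` module and assumes a 1-D floating-point waveform in the range [-1, 1]; `to_wav_bytes` is a hypothetical helper, not part of txtai.

```python
import io
import wave

import numpy as np


def to_wav_bytes(audio, rate):
    # Hypothetical helper: clip float samples to [-1, 1] and scale to 16-bit signed PCM
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # mono, assuming a 1-D waveform
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav.setframerate(rate)   # sample rate returned by the pipeline
        wav.writeframes(pcm.tobytes())
    return buffer.getvalue()
```

Writing the returned bytes to a file opened in binary mode (for example `ocean.wav`) saves a generated clip in a format most audio players can open.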