Implementation:Neuml Txtai Audio Signal Processing
| Knowledge Sources | |
|---|---|
| Domains | Audio, Signal Processing, Utilities |
| Last Updated | 2026-02-10 01:00 GMT |
Overview
A concrete tool for audio signal processing, provided by txtai's Signal utility class.
Description
Signal is a utility class that provides static methods for common audio signal processing operations. It includes methods for converting stereo audio to mono, resampling audio to a target sample rate using scipy, converting between 16-bit integer and 32-bit float representations, mixing two audio segments together with scaling, computing signal energy via FFT for frequency analysis, and trimming leading and trailing silence from audio based on energy thresholds. This class is used extensively by other audio pipelines in txtai as a shared signal processing foundation.
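The channel and sample-format conversions described above can be illustrated with a few lines of NumPy. This is a minimal sketch, not txtai's implementation: the helper names `to_mono`, `to_float32`, and `to_int16` are hypothetical, the `(samples, channels)` layout is assumed, and the scale factor of 32768 is an assumption about how int16 maps onto the float range.

```python
import numpy as np

def to_mono(audio):
    """Average the channels of a (samples, channels) array; pass 1-D audio through."""
    return audio.mean(axis=1) if audio.ndim > 1 else audio

def to_float32(audio):
    """Scale int16 samples to float32 in [-1.0, 1.0) (assumed scale: 32768)."""
    return audio.astype(np.float32) / 32768.0

def to_int16(audio):
    """Scale float samples back to int16, clipping values that would overflow."""
    return np.clip(audio * 32768.0, -32768, 32767).astype(np.int16)

# Two stereo frames: each row is one sample, each column one channel
stereo = np.array([[0, 16384], [-16384, -16384]], dtype=np.int16)
mono = to_mono(stereo)

# Round trip between the two sample formats
floats = to_float32(np.array([0, 16384, -32768], dtype=np.int16))
restored = to_int16(floats)
```

Clipping before the integer cast matters because a float sample of exactly 1.0 would otherwise scale to 32768, which does not fit in int16.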
Usage
Use Signal when you need low-level audio signal processing operations such as resampling, format conversion, audio mixing, silence trimming, or frequency energy analysis. It is primarily used internally by other txtai audio pipelines (TextToSpeech, AudioMixer, AudioStream, Microphone, Transcription) but can also be called directly for custom audio processing workflows.
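For intuition about what resampling does to the sample count, here is a NumPy-only sketch. It swaps in plain linear interpolation rather than the scipy-based resampling that Signal uses, so it is illustrative only and not band-limited; the function name `resample_linear` is hypothetical.

```python
import numpy as np

def resample_linear(audio, rate, target):
    """Resample 1-D audio by linear interpolation (illustrative, not band-limited)."""
    if rate == target:
        return audio

    # Number of samples needed to cover the same duration at the target rate
    samples = round(len(audio) * target / rate)

    # Position of each new sample, expressed in units of the old sample index
    positions = np.arange(samples) * (rate / target)
    return np.interp(positions, np.arange(len(audio)), audio).astype(audio.dtype)

# One second of a 440 Hz tone at 44100 Hz, resampled to 16000 Hz
one_second = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100).astype(np.float32)
resampled = resample_linear(one_second, 44100, 16000)
```

The duration is preserved while the number of samples scales by `target / rate`, which is the property downstream pipelines generally depend on.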
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/pipeline/audio/signal.py
Signature
class Signal:
    @staticmethod
    def mono(audio)

    @staticmethod
    def resample(audio, rate, target)

    @staticmethod
    def float32(audio)

    @staticmethod
    def int16(audio)

    @staticmethod
    def mix(audio1, audio2, scale1=1, scale2=1)

    @staticmethod
    def energy(audio, rate)

    @staticmethod
    def trim(audio, rate, threshold=1, leading=True, trailing=True)
Import
from txtai.pipeline.audio.signal import Signal
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
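| audio | numpy.ndarray | Yes | Input audio data as a NumPy array (used by every method except mix). |
| audio1 | numpy.ndarray | Yes* | First audio segment to mix. Required by mix. |
| audio2 | numpy.ndarray | Yes* | Second audio segment to mix. Required by mix. |
| rate | int | Yes* | Current sample rate of the audio in Hz. Required by resample, energy, and trim. |
| target | int | Yes* | Target sample rate in Hz for resampling. Required by resample. |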
| scale1 | float | No | Scaling factor for first audio in mix. Defaults to 1. |
| scale2 | float | No | Scaling factor for second audio in mix. Defaults to 1. |
| threshold | float | No | Energy threshold for silence detection in trim. Defaults to 1. |
| leading | bool | No | Whether to trim leading silence. Defaults to True. |
| trailing | bool | No | Whether to trim trailing silence. Defaults to True. |
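To show how `threshold`, `leading`, and `trailing` interact, the following sketch trims silence chunk by chunk. It is written under stated assumptions and is not the library's code: energy is taken here as the sum of squared samples in the time domain rather than an FFT-based measure, and the 20 ms chunk size (`chunk_ms`) and the names `chunk_energy` and `trim_silence` are hypothetical.

```python
import numpy as np

def chunk_energy(chunk):
    """Energy of a chunk, taken here as the sum of squared samples (an assumption)."""
    return float(np.sum(chunk.astype(np.float64) ** 2))

def trim_silence(audio, rate, threshold=1, leading=True, trailing=True, chunk_ms=20):
    """Drop whole chunks from either end while their energy stays below threshold."""
    n = max(1, int(rate * chunk_ms / 1000))
    start, end = 0, len(audio)

    # Advance the start while the leading chunk is quieter than the threshold
    while leading and start < end and chunk_energy(audio[start:start + n]) < threshold:
        start += n

    # Pull back the end while the trailing chunk is quieter than the threshold
    while trailing and end > start and chunk_energy(audio[max(start, end - n):end]) < threshold:
        end -= n

    return audio[start:max(start, end)]

rate = 16000
silence = np.zeros(rate // 2, dtype=np.float32)  # 0.5 s of silence
tone = np.sin(2 * np.pi * 440 * np.arange(rate) / rate).astype(np.float32)
padded = np.concatenate([silence, tone, silence])

trimmed = trim_silence(padded, rate)                       # both ends removed
leading_only = trim_silence(padded, rate, trailing=False)  # trailing silence kept
```

A higher `threshold` treats quieter passages as silence and trims more aggressively, while setting `leading` or `trailing` to False leaves that end of the audio untouched.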
Outputs
| Name | Type | Description |
|---|---|---|
| mono | numpy.ndarray | Single-channel audio data. |
| resample | numpy.ndarray | Audio resampled to target sample rate. |
| float32 | numpy.ndarray | Audio converted from int16 to float32 format. |
| int16 | numpy.ndarray | Audio converted from float32 to int16 format. |
| mix | numpy.ndarray | Two audio segments mixed into one, with the shorter segment tiled to match the longer. |
| energy | dict | Dictionary mapping frequency (float) to energy value (float) for the input audio. |
| trim | numpy.ndarray | Audio with leading and/or trailing silence removed. |
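The tiling behavior noted for mix can be sketched as follows for 1-D inputs. This is an illustration of the idea only, with a hypothetical function name `mix_tiled`; it makes no claim about the exact shape or dtype that Signal.mix returns.

```python
import numpy as np

def mix_tiled(audio1, audio2, scale1=1, scale2=1):
    """Mix two 1-D signals, repeating the shorter one to cover the longer one."""
    audio1, audio2 = audio1 * scale1, audio2 * scale2
    large, small = (audio1, audio2) if len(audio1) >= len(audio2) else (audio2, audio1)

    # Repeat the shorter signal enough times, then cut it to the longer length
    repeats = -(-len(large) // len(small))  # ceiling division
    small = np.tile(small, repeats)[:len(large)]

    return large + small

voice = np.array([1.0, 1.0, 1.0, 1.0, 1.0], dtype=np.float32)
beat = np.array([2.0, 4.0], dtype=np.float32)

# beat is halved to [1, 2], tiled to [1, 2, 1, 2, 1], then added to voice
mixed = mix_tiled(voice, beat, scale1=1.0, scale2=0.5)
```

Tiling rather than zero-padding means a short loop, such as background music, keeps playing for the full length of the longer segment.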
Usage Examples
import numpy as np

from txtai.pipeline.audio.signal import Signal

# Convert stereo to mono
stereo_audio = np.random.randn(22050, 2)
mono_audio = Signal.mono(stereo_audio)

# Resample audio from 44100 Hz to 16000 Hz
audio = np.random.randn(44100).astype(np.float32)
resampled = Signal.resample(audio, 44100, 16000)

# Convert int16 audio to float32
int_audio = np.array([0, 16384, -16384], dtype=np.int16)
float_audio = Signal.float32(int_audio)

# Convert float32 audio back to int16
int_audio = Signal.int16(float_audio)

# Mix two audio segments with scaling
audio1 = np.random.randn(22050).astype(np.float32)
audio2 = np.random.randn(11025).astype(np.float32)
mixed = Signal.mix(audio1, audio2, scale1=1.0, scale2=0.5)

# Calculate signal energy per frequency
energy_map = Signal.energy(audio1, 22050)

# Trim silence from audio
trimmed = Signal.trim(audio1, 22050, threshold=1.0)
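To make the frequency-to-energy mapping returned by energy more concrete, here is a NumPy-only sketch that builds a similar dictionary with `numpy.fft`. The function name `energy_by_frequency` is hypothetical, and the choices to skip the DC bin and to use the squared FFT magnitude without further scaling are assumptions, so absolute values may differ from what Signal.energy reports.

```python
import numpy as np

def energy_by_frequency(audio, rate):
    """Map each positive FFT bin frequency (Hz) to its squared magnitude."""
    frequencies = np.fft.rfftfreq(len(audio), d=1.0 / rate)
    magnitudes = np.abs(np.fft.rfft(audio))

    # Skip bin 0, the DC component (constant offset) of the signal
    return {float(f): float(m ** 2) for f, m in zip(frequencies[1:], magnitudes[1:])}

rate = 8000
t = np.arange(rate) / rate            # one second of sample times
tone = np.sin(2 * np.pi * 440 * t)    # pure 440 Hz sine wave

energy_map = energy_by_frequency(tone, rate)

# Frequency carrying the most energy (about 440 Hz for this tone)
peak = max(energy_map, key=energy_map.get)
```

Summing the dictionary's values gives a single energy figure for a chunk of audio, which is the kind of quantity a silence threshold can be compared against.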