Implementation:Neuml Txtai Transcription

Knowledge Sources	Neuml_Txtai
Domains	Audio, Speech Recognition, NLP
Last Updated	2026-02-10 01:00 GMT

Overview

A concrete txtai pipeline for transcribing audio to text.

Description

Transcription is a pipeline that transcribes audio files or raw audio data to text using Hugging Face automatic speech recognition (ASR) models. It extends the HFPipeline base class and wraps the automatic-speech-recognition task.

The pipeline accepts several input forms: file paths, file-like objects, NumPy arrays (with the sample rate supplied through the rate parameter), and (audio, rate) tuples. Audio is automatically converted to mono and resampled to the model's expected sample rate. Long recordings are handled by splitting the audio into segments of a configurable duration, set in seconds by the chunk parameter.

Two output modes are available. With join=True (the default), the pipeline returns one transcribed text string per input. With join=False, it returns per-chunk results, each carrying the chunk's text together with its raw audio data and sample rate. Text normalization converts all-uppercase model output to capitalized case.
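The chunking and text-normalization steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not txtai's actual code: split_chunks and clean are hypothetical helper names, and the sketch assumes mono audio held in a sliceable sequence of samples.

```python
def split_chunks(samples, rate, seconds=10):
    """Split a 1-D sequence of samples into consecutive segments of at most `seconds` seconds.

    Hypothetical helper: illustrates fixed-duration chunking only.
    """
    size = int(rate * seconds)
    if size <= 0:
        raise ValueError("rate * seconds must be positive")

    # Step through the samples one chunk at a time; the last chunk may be shorter
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def clean(text):
    """Capitalize text only when it is entirely uppercase; otherwise leave it unchanged.

    Hypothetical helper: mirrors the normalization rule described in the text.
    """
    text = text.strip()
    return text.capitalize() if text.isupper() else text


if __name__ == "__main__":
    # 25 seconds of silence at 16 kHz splits into 10 s, 10 s, and 5 s chunks
    silence = [0.0] * (16000 * 25)
    print([len(chunk) / 16000 for chunk in split_chunks(silence, 16000)])
    print(clean("THE QUICK BROWN FOX"))
```

Slicing keeps the sketch independent of the container type, so the same function works on a Python list or a one-dimensional NumPy array.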

Usage

Use Transcription when you need to convert audio recordings into text. This is useful for speech-to-text applications, meeting transcription, voice command processing, podcast indexing, or any workflow that requires extracting text from audio content. It pairs naturally with the Microphone pipeline for real-time voice input.

Code Reference

Source Location

Repository: Neuml_Txtai
File: src/python/txtai/pipeline/audio/transcription.py

Signature

class Transcription(HFPipeline):
    def __init__(self, path=None, quantize=False, gpu=True, model=None, **kwargs)
    def __call__(self, audio, rate=None, chunk=10, join=True, **kwargs)

Import

from txtai.pipeline.audio.transcription import Transcription

I/O Contract

Inputs

Name	Type	Required	Description
path	str	No	Model path or Hugging Face repo id for the ASR model.
quantize	bool	No	Enable model quantization. Defaults to False.
gpu	bool	No	Use GPU acceleration if available. Defaults to True.
model	object	No	Optional pre-loaded model instance.
audio	str, tuple, numpy.ndarray, file-like, or list	Yes	Audio input: a file path, (audio_data, rate) tuple, NumPy array, file-like object, or a list of any of these.
rate	int	No	Sample rate of the input audio. Only required when audio is a raw NumPy array without an accompanying sample rate.
chunk	int	No	Duration in seconds to split audio into for processing. Defaults to 10.
join	bool	No	If True (default), combines all chunk transcriptions into a single text string. If False, returns per-chunk results with raw audio data.
kwargs	dict	No	Additional keyword arguments passed to the model's generate method.
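Because audio may be a single item or a list, calling code sometimes wants one uniform result shape. The sketch below wraps any Transcription-like callable so that it always returns a list with one entry per input. transcribe_all is a hypothetical helper, not part of txtai, and it assumes that anything other than a Python list counts as a single input.

```python
def transcribe_all(transcribe, audio, **kwargs):
    """Call a Transcription-like callable and always return a list, one entry per input.

    Hypothetical helper. Assumes the callable returns a single result for a
    single input and a list of results for a list of inputs.
    """
    single = not isinstance(audio, list)
    results = transcribe(audio, **kwargs)

    # Wrap the lone result so callers can always iterate
    return [results] if single else results


if __name__ == "__main__":
    # Stand-in callable used only to demonstrate the wrapper without loading a model
    def fake_transcribe(audio, **kwargs):
        if isinstance(audio, list):
            return [f"text for {item}" for item in audio]
        return f"text for {audio}"

    print(transcribe_all(fake_transcribe, "audio.wav"))
    print(transcribe_all(fake_transcribe, ["audio1.wav", "audio2.wav"]))
```

With a real pipeline, the first argument would be a Transcription instance, and keyword arguments such as chunk or join would be forwarded unchanged.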

Outputs

Name	Type	Description
result	str	Transcribed text when a single audio input is provided and join=True.
results	list of str	List of transcribed text strings when a list of audio inputs is provided and join=True.
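results	list of dict	When a single audio input is provided and join=False, one dict per chunk. Each dict contains "text" (str), "raw" (numpy.ndarray), and "rate" (int) keys.
results	list of list of dict	When a list of audio inputs is provided and join=False, one list of chunk dicts per input, with the same "text", "raw", and "rate" keys.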
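The "raw" and "rate" values returned with join=False make it possible to estimate where each chunk falls in the recording. The following is a minimal sketch, assuming the chunk dicts for one input arrive in order, cover the audio contiguously, and hold mono samples at the stated rate. segment_timeline is a hypothetical helper, not a txtai function.

```python
def segment_timeline(segments):
    """Return (start_seconds, end_seconds, text) tuples for the chunk dicts of one input.

    Hypothetical helper. Assumes chunks are ordered and contiguous, and that
    len(segment["raw"]) counts mono samples at segment["rate"].
    """
    timeline, start = [], 0.0
    for segment in segments:
        # Duration in seconds is sample count divided by sample rate
        duration = len(segment["raw"]) / segment["rate"]
        timeline.append((start, start + duration, segment["text"]))
        start += duration

    return timeline


if __name__ == "__main__":
    # Fabricated chunk dicts used only to illustrate the shape of the data
    example = [
        {"text": "First chunk", "raw": [0.0] * 8000, "rate": 16000},
        {"text": "Second chunk", "raw": [0.0] * 16000, "rate": 16000},
    ]
    for begin, end, text in segment_timeline(example):
        print(f"{begin:.1f}s to {end:.1f}s: {text}")
```

The offsets are only as accurate as the contiguity assumption, so treat them as approximate when a pipeline drops or overlaps audio between chunks.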

Usage Examples

from txtai.pipeline import Transcription

# Create a transcription pipeline
transcribe = Transcription()

# Transcribe an audio file
text = transcribe("audio.wav")

# Transcribe a NumPy array with sample rate
import numpy as np
audio_data = np.random.randn(16000).astype(np.float32)
text = transcribe(audio_data, rate=16000)

# Transcribe with a tuple of (audio, rate)
text = transcribe((audio_data, 16000))

# Batch transcribe multiple audio files
texts = transcribe(["audio1.wav", "audio2.wav"])

# Get per-chunk results with raw audio data
# Passing a list of inputs returns one list of chunk dicts per input
chunks = transcribe(["long_audio.wav"], chunk=5, join=False)
for chunk_list in chunks:
    for chunk in chunk_list:
        print(chunk["text"], chunk["rate"])

# Use a specific model
transcribe_whisper = Transcription(path="openai/whisper-base")
text = transcribe_whisper("audio.wav")
