Implementation:Neuml Txtai Transcription

From Leeroopedia


Knowledge Sources
Domains Audio, Speech Recognition, NLP
Last Updated 2026-02-10 01:00 GMT

Overview

Concrete tool for transcribing audio to text provided by txtai.

Description

Transcription is a pipeline that transcribes audio files or raw audio data to text using Hugging Face automatic speech recognition (ASR) models. It extends the HFPipeline base class and wraps the automatic-speech-recognition task.

The pipeline accepts several input forms: file paths, file-like objects, NumPy arrays with a separately supplied sample rate, and (audio, rate) tuples. Audio is automatically converted to mono and resampled to the model's expected sample rate. Long recordings are handled through chunked processing, which splits the audio into segments of a configurable duration.

Two processing modes are available. The standard mode (join=True) returns the transcribed text for each input. The batch mode (join=False) returns per-chunk results, each carrying the chunk's text together with its original raw audio data and sample rate. Text normalization converts all-uppercase output to capitalized case.
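The chunking and text normalization steps described above can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not txtai's implementation: the helper names split_chunks and normalize_text are hypothetical, a plain list stands in for the audio array, each chunk is assumed to hold rate * chunk samples, and the whitespace trimming is an assumption of the sketch.

```python
def split_chunks(samples, rate, chunk=10):
    """Split samples into segments of at most `chunk` seconds (hypothetical helper)."""
    size = rate * chunk  # number of samples in one full segment
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def normalize_text(text):
    """Trim whitespace (assumed) and turn ALL-CAPS output into capitalized case."""
    text = text.strip()
    return text.capitalize() if text.isupper() else text


# 25 seconds of silent audio at 16 kHz, split into 10-second chunks
samples = [0.0] * (16000 * 25)
segments = split_chunks(samples, rate=16000, chunk=10)
print([len(s) / 16000 for s in segments])      # [10.0, 10.0, 5.0]

print(normalize_text("  HELLO WORLD "))        # Hello world
print(normalize_text("Already mixed case"))    # Already mixed case
```

The last segment is shorter than the others whenever the recording length is not a multiple of the chunk duration, so downstream code should not assume equal-length chunks.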

Usage

Use Transcription when you need to convert audio recordings into text. This is useful for speech-to-text applications, meeting transcription, voice command processing, podcast indexing, or any workflow that requires extracting text from audio content. It pairs naturally with the Microphone pipeline for real-time voice input.

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/pipeline/audio/transcription.py

Signature

class Transcription(HFPipeline):
    def __init__(self, path=None, quantize=False, gpu=True, model=None, **kwargs)
    def __call__(self, audio, rate=None, chunk=10, join=True, **kwargs)

Import

from txtai.pipeline.audio.transcription import Transcription

I/O Contract

Inputs

Name | Type | Required | Description
path | str | No | Model path or Hugging Face repo id for the ASR model.
quantize | bool | No | Enable model quantization. Defaults to False.
gpu | bool | No | Use GPU acceleration if available. Defaults to True.
model | object | No | Optional pre-loaded model instance.
audio | str, tuple, numpy.ndarray, file-like, or list | Yes | Audio input: a file path, (audio_data, rate) tuple, NumPy array, file-like object, or a list of any of these.
rate | int | No | Sample rate of the input audio. Only required when audio is a raw NumPy array without an accompanying sample rate.
chunk | int | No | Duration in seconds of each segment the audio is split into for processing. Defaults to 10.
join | bool | No | If True (default), combines all chunk transcriptions into a single text string. If False, returns per-chunk results with raw audio data.
kwargs | dict | No | Additional keyword arguments passed to the model's generate method.
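The input rules in the table can be sketched as a small dispatch routine. This is an illustration only and does not call txtai: is_single_input and resolve_rate are hypothetical helpers, array.array stands in for numpy.ndarray so the sketch needs only the standard library, and raising ValueError when a raw array has no rate is a choice made for this sketch rather than documented library behavior.

```python
from array import array
import io


def is_single_input(audio):
    """True for one audio item; a list is treated as a batch (hypothetical helper)."""
    return isinstance(audio, (str, tuple, array)) or hasattr(audio, "read")


def resolve_rate(audio, rate=None):
    """Return the sample rate known before decoding, or None if the file supplies it."""
    if isinstance(audio, str) or hasattr(audio, "read"):
        return None       # rate is read from the audio file itself
    if isinstance(audio, tuple):
        return audio[1]   # an (audio_data, rate) tuple carries its own rate
    if rate is None:
        raise ValueError("rate is required for raw samples without a sample rate")
    return rate           # raw samples use the rate argument


samples = array("f", [0.0] * 16000)
inputs = ["speech.wav", io.BytesIO(b""), (samples, 16000), samples]

print([is_single_input(x) for x in inputs])          # [True, True, True, True]
print(is_single_input(inputs))                       # False
print([resolve_rate(x, rate=8000) for x in inputs])  # [None, None, 16000, 8000]
```

Note that the rate argument only matters for the last case: file inputs and tuples already carry their own sample rate.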

Outputs

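Name | Type | Description
result | str | Transcribed text when a single audio input is provided and join=True.
results | list of str | List of transcribed text strings when a list of audio inputs is provided and join=True.
chunks | list of dict | When a single audio input is provided and join=False, one dict per chunk. Each dict contains "text" (str), "raw" (numpy.ndarray), and "rate" (int) keys.
results | list of list of dict | When a list of audio inputs is provided and join=False, one list of chunk dicts per input, with the same keys as above.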
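Per-chunk results can be post-processed without further calls to the pipeline. The sketch below assumes only the documented dict keys ("text", "raw", "rate"); summarize_chunks is a hypothetical helper, and the chunks list is hand-written sample data with plain lists standing in for NumPy arrays, not real transcription output.

```python
def summarize_chunks(chunks):
    """Return (text, duration in seconds) per chunk and the joined transcript."""
    rows = [(c["text"], len(c["raw"]) / c["rate"]) for c in chunks]
    transcript = " ".join(text for text, _ in rows if text)
    return rows, transcript


# Hand-written stand-in for the chunk dicts of one audio input
chunks = [
    {"text": "Hello there", "raw": [0.0] * 80000, "rate": 16000},
    {"text": "how are you", "raw": [0.0] * 40000, "rate": 16000},
]

rows, transcript = summarize_chunks(chunks)
print(rows)        # [('Hello there', 5.0), ('how are you', 2.5)]
print(transcript)  # Hello there how are you
```

Because each chunk keeps its own raw samples and rate, the duration of a segment is simply the sample count divided by the rate, which is enough to build rough timestamps by accumulating durations.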

Usage Examples

from txtai.pipeline import Transcription

# Create a transcription pipeline
transcribe = Transcription()

# Transcribe an audio file
text = transcribe("audio.wav")

# Transcribe a NumPy array with an explicit sample rate
# (one second of random noise as placeholder input; substitute real samples)
import numpy as np
audio_data = np.random.randn(16000).astype(np.float32)
text = transcribe(audio_data, rate=16000)

# Transcribe with a tuple of (audio, rate)
text = transcribe((audio_data, 16000))

# Batch transcribe multiple audio files
texts = transcribe(["audio1.wav", "audio2.wav"])

# Get per-chunk results with raw audio data
# A single input with join=False yields one list of chunk dicts
chunks = transcribe("long_audio.wav", chunk=5, join=False)
for chunk in chunks:
    print(chunk["text"], chunk["rate"])

# Use a specific model
transcribe_whisper = Transcription(path="openai/whisper-base")
text = transcribe_whisper("audio.wav")
