Implementation:EvolvingLMMs Lab Lmms eval Open ASR Utils

From Leeroopedia
Revision as of 12:31, 16 February 2026 by Admin (Auto-imported from implementations/EvolvingLMMs_Lab_Lmms_eval_Open_ASR_Utils.md)

Task utility functions for the Open ASR (Automatic Speech Recognition) benchmark, which evaluates speech recognition models using the Word Error Rate (WER) metric.

Location

/tmp/kapso_repo_sslb_59s/lmms_eval/tasks/open_asr/utils.py

Overview

Provides audio path extraction, result handling, and WER computation for ASR tasks. Supports multiple languages (English, Chinese, and Yue, i.e. Cantonese) with language-specific text normalization and tokenization.

Core Functions

Document Processing

openasr_doc_to_audio(doc)
Extracts the audio file path from the document
Parameters: doc - Document dictionary
Process: Tries keys in order: audio, file, path, audio_path
Returns: List containing the audio file path
Raises: KeyError if no audio field is found

openasr_doc_to_text(doc, lmms_eval_specific_kwargs)
Constructs a fixed ASR prompt
Parameters: doc, lmms_eval_specific_kwargs (supplies pre_prompt and post_prompt)
Returns: "{pre_prompt}Please recognize the speech and only output the recognized content:{post_prompt}"

openasr_doc_to_target(doc)
Extracts the ground truth transcription
Process: Tries keys in order: text, transcript, gt
Returns: Ground truth string
Raises: KeyError if no target field is found
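
The key-fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper first_present and the unprefixed function names are hypothetical, and it assumes a key counts as "found" when it is present in the dictionary and that pre_prompt and post_prompt are read from lmms_eval_specific_kwargs by those names.

```python
AUDIO_KEYS = ("audio", "file", "path", "audio_path")
TARGET_KEYS = ("text", "transcript", "gt")


def first_present(doc, keys):
    """Hypothetical helper: return the value of the first key found in doc."""
    for key in keys:
        if key in doc:
            return doc[key]
    raise KeyError(f"none of {keys} found in document")


def doc_to_audio(doc):
    # The audio path is wrapped in a list, as the page describes.
    return [first_present(doc, AUDIO_KEYS)]


def doc_to_target(doc):
    return first_present(doc, TARGET_KEYS)


def doc_to_text(doc, lmms_eval_specific_kwargs):
    # Assumption: the prompt pieces are stored under these two keys.
    pre = lmms_eval_specific_kwargs["pre_prompt"]
    post = lmms_eval_specific_kwargs["post_prompt"]
    return f"{pre}Please recognize the speech and only output the recognized content:{post}"
```

With this sketch, a document such as {"path": "a.wav", "transcript": "hello"} yields ["a.wav"] as the audio list and "hello" as the target, because the earlier keys in each tuple are absent.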

Result Processing

openasr_process_result(doc, result)
Packages prediction with ground truth for WER computation
Parameters:
  • doc - Document
  • result - Model prediction list
Returns: Dictionary with wer entry containing gt and pred
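
As a rough sketch of the return shape, assuming the first element of the prediction list is the transcription and that the ground truth is looked up with the same key order as openasr_doc_to_target (both assumptions; the function name is hypothetical):

```python
def process_result_sketch(doc, result, target_keys=("text", "transcript", "gt")):
    """Pair the model prediction with the reference under the "wer" key."""
    # Assumption: the model returns a list whose first item is the transcript.
    pred = result[0]
    # Assumption: ground truth uses the first matching target key.
    gt = next(doc[key] for key in target_keys if key in doc)
    return {"wer": {"gt": gt, "pred": pred}}
```

The outer key matches the metric name, so the aggregation step receives one {"gt": ..., "pred": ...} dictionary per document.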

Text Normalization

remove_sp(text, language)
Removes special tokens and normalizes spacing
Parameters:
  • text - Input text
  • language - Language code ("zh", "en", etc.)
Process:
  1. Removes tokens matching <|.*|>
  2. Collapses consecutive spaces to single space
  3. Removes space before punctuation
  4. Left-strips whitespace
  5. For Chinese: removes all spaces
Returns: Normalized text string
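
The five steps can be sketched with the standard re module. The regular expressions below are assumptions chosen to match the description (in particular, the special-token pattern is written non-greedy and each token is replaced by a space); the real function may differ in detail.

```python
import re

PUNCS = "!,.?;:"


def remove_sp_sketch(text, language):
    # 1. Drop special tokens such as <|en|>; non-greedy so that two tokens
    #    on one line do not swallow the text between them (an assumption).
    text = re.sub(r"<\|.*?\|>", " ", text)
    # 2. Collapse runs of whitespace to a single space.
    text = re.sub(r"\s+", " ", text)
    # 3. Remove the space that precedes a punctuation character.
    text = re.sub(f" ?([{re.escape(PUNCS)}])", r"\1", text)
    # 4. Strip leading spaces.
    text = text.lstrip(" ")
    # 5. Chinese text is written without spaces, so remove them all.
    if language == "zh":
        text = re.sub(r"\s+", "", text)
    return text
```

For example, remove_sp_sketch("<|en|> hello , world", "en") gives "hello, world".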

EvaluationTokenizer Class

Language-aware tokenizer using sacreBLEU tokenizers.

Initialization

EvaluationTokenizer(
    tokenizer_type="13a",
    lowercase=False,
    punctuation_removal=False,
    character_tokenization=False
)

Parameters

  • tokenizer_type: One of "none", "13a", "intl", "zh", "ja-mecab", "char"
  • lowercase: Apply lowercasing
  • punctuation_removal: Remove punctuation tokens
  • character_tokenization: Tokenize to character level

Constants

  • SPACE = chr(32)
  • SPACE_ESCAPE = chr(9601)

Methods

remove_punctuation(sent) (classmethod)
Removes tokens that are purely punctuation
Parameters: sent - Space-separated tokens
Returns: String with punctuation-only tokens removed

tokenize(sent)
Applies tokenization pipeline
Process:
  1. Apply sacreBLEU tokenizer
  2. Optionally remove punctuation
  3. Optionally tokenize to characters
  4. Optionally lowercase
Returns: Tokenized string
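
The post-processing steps of the pipeline can be sketched with the standard library alone. In this sketch the sacreBLEU step is replaced by an identity function, standing in for the "none" tokenizer type, and punctuation is detected through Unicode categories beginning with "P"; both are assumptions about the real class, and tokenize_sketch is a hypothetical name.

```python
import unicodedata

SPACE = chr(32)
SPACE_ESCAPE = chr(9601)  # the "▁" character


def remove_punctuation(sent):
    """Drop tokens whose characters are all Unicode punctuation."""
    return SPACE.join(
        tok
        for tok in sent.split(SPACE)
        if not all(unicodedata.category(ch).startswith("P") for ch in tok)
    )


def tokenize_sketch(sent, lowercase=False, punctuation_removal=False,
                    character_tokenization=False):
    tokenized = sent  # stand-in for the sacreBLEU tokenizer call
    if punctuation_removal:
        tokenized = remove_punctuation(tokenized)
    if character_tokenization:
        # Escape real spaces first so word boundaries survive the split.
        tokenized = SPACE.join(tokenized.replace(SPACE, SPACE_ESCAPE))
    if lowercase:
        tokenized = tokenized.lower()
    return tokenized
```

A token such as "don't" is kept because only some of its characters are punctuation, whereas a standalone "," or "!" is dropped.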

WER Computation

compute_wer(refs, hyps, language)

Computes Word Error Rate using edit distance.

Parameters:

  • refs - List of reference transcriptions
  • hyps - List of hypothesis transcriptions
  • language - Language code

Process:

  1. For each ref-hyp pair:
    1. Apply language-specific normalization:
      • yue: Convert to simplified Chinese via zhconv
      • en: Apply english_normalizer
      • zh: Apply chinese_normalizer
      • Other: Apply basic_normalizer
    2. Tokenize with EvaluationTokenizer (tokenizer_type="none", lowercase=True, punctuation_removal=True)
    3. For Chinese/Yue: character-level tokenization
    4. Compute edit distance between token sequences
  2. Return total distance / total reference length

Returns: WER as a fraction (0 for a perfect match; can exceed 1 when the hypotheses contain many insertions)
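
The corpus-level ratio can be illustrated with a small self-contained sketch. It omits the language-specific normalizers, splits on whitespace instead of using EvaluationTokenizer, and uses a pure-Python Levenshtein routine as a stand-in for the editdistance package; corpus_wer and its char_level flag are hypothetical names.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def corpus_wer(refs, hyps, char_level=False):
    """Sum the distances over all pairs, then divide by total reference length."""
    total_dist = 0
    total_len = 0
    for ref, hyp in zip(refs, hyps):
        if char_level:  # e.g. Chinese or Yue: one token per character
            ref_tokens = list(ref.replace(" ", ""))
            hyp_tokens = list(hyp.replace(" ", ""))
        else:
            ref_tokens, hyp_tokens = ref.split(), hyp.split()
        total_dist += edit_distance(ref_tokens, hyp_tokens)
        total_len += len(ref_tokens)
    return total_dist / total_len
```

Because errors and lengths are summed before dividing, long utterances weigh more than short ones; this is not the same as averaging per-utterance WER values.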

openasr_wer(results, args)

Aggregates WER across all results.

Parameters:

  • results - List of result dictionaries with gt and pred
  • args - Arguments (currently unused; language hardcoded to "en")

Process:

  1. Extract ground truth and predictions
  2. Apply remove_sp normalization
  3. Compute WER via compute_wer
  4. Return WER × 100

Returns: WER as a percentage (the compute_wer fraction × 100, so values above 100 are possible)
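
A minimal sketch of the aggregation step, with the WER and cleanup functions passed in as parameters so the example stays self-contained. The name aggregate_wer and its wer_fn and normalize_fn parameters are hypothetical; the real function calls compute_wer and remove_sp directly and fixes the language to "en".

```python
def aggregate_wer(results, wer_fn, normalize_fn=lambda text, language: text,
                  language="en"):
    """Collect references and predictions, clean them, and scale WER to percent."""
    refs = [normalize_fn(item["gt"], language) for item in results]
    hyps = [normalize_fn(item["pred"], language) for item in results]
    return wer_fn(refs, hyps, language) * 100
```

Passing the functions in also makes it easy to test the scaling and data flow without installing the normalizer dependencies.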

Note: Contains commented legacy code for multi-source dataset evaluation.

Global Normalizers

Initialized at module level:

  • english_normalizer: EnglishTextNormalizer()
  • chinese_normalizer: TextNorm(...) with custom config
  • basic_normalizer: BasicTextNormalizer()

Chinese normalizer configuration:

  • All flags set to False (no banjiao conversion, case changes, filler/erhua removal)
  • Empty cc_mode

Dependencies

  • os, re, unicodedata
  • editdistance as ed
  • zhconv - Chinese variant conversion
  • lmms_eval.tasks.librispeech.cn_tn.TextNorm
  • lmms_eval.tasks.librispeech.whisper_normalizer.basic.BasicTextNormalizer
  • lmms_eval.tasks.librispeech.whisper_normalizer.english.EnglishTextNormalizer
  • sacrebleu.tokenizers - Various tokenizer implementations

Constants

  • PUNCS = "!,.?;:" - Punctuation characters for normalization
  • dir_name - Absolute path to module directory
