Implementation:EvolvingLMMs Lab Lmms eval Open ASR Utils

From Leeroopedia

Task utility functions for the Open ASR (Automatic Speech Recognition) benchmark, which evaluates speech recognition models using Word Error Rate (WER) metrics.

Location

lmms_eval/tasks/open_asr/utils.py (relative to the repository root)

Overview

Provides audio-path extraction, prompt construction, result packaging, and WER computation for ASR tasks. Supports multiple languages (English, Chinese, and Cantonese, identified by the codes en, zh, and yue) through language-specific text normalization and tokenization.

Core Functions

Document Processing

openasr_doc_to_audio(doc)
Extracts audio file path from document
Parameters: doc - Document dictionary
Process: Tries keys in order: audio, file, path, audio_path
Returns: List containing audio file path
Raises: KeyError if no audio field found
openasr_doc_to_text(doc, lmms_eval_specific_kwargs)
Constructs fixed ASR prompt
Parameters: doc, lmms_eval_specific_kwargs (with prompts)
Returns: "{pre_prompt}Please recognize the speech and only output the recognized content:{post_prompt}"
openasr_doc_to_target(doc)
Extracts ground truth transcription
Process: Tries keys in order: text, transcript, gt
Returns: Ground truth string
Raises: KeyError if no target field found
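
The key-fallback behaviour described above can be illustrated with a minimal sketch. This is not the module's actual code: the helper first_present is hypothetical, testing with `key in doc` (rather than truthiness) is an assumption, and reading pre_prompt and post_prompt with empty-string defaults is likewise assumed from the return template.

```python
from typing import Any, Dict, List, Sequence

# Key orders taken from the descriptions above.
AUDIO_KEYS = ("audio", "file", "path", "audio_path")
TARGET_KEYS = ("text", "transcript", "gt")


def first_present(doc: Dict[str, Any], keys: Sequence[str]) -> Any:
    """Hypothetical helper: return the value of the first key found in doc."""
    for key in keys:
        if key in doc:  # assumption: presence check, not truthiness
            return doc[key]
    raise KeyError(f"none of {list(keys)} found in document")


def doc_to_audio(doc: Dict[str, Any]) -> List[Any]:
    # The audio reference is wrapped in a list, as documented.
    return [first_present(doc, AUDIO_KEYS)]


def doc_to_target(doc: Dict[str, Any]) -> str:
    return first_present(doc, TARGET_KEYS)


def doc_to_text(doc: Dict[str, Any], lmms_eval_specific_kwargs: Dict[str, str]) -> str:
    # Assumption: missing prompt pieces default to the empty string.
    pre = lmms_eval_specific_kwargs.get("pre_prompt", "")
    post = lmms_eval_specific_kwargs.get("post_prompt", "")
    return f"{pre}Please recognize the speech and only output the recognized content:{post}"
```

Because the keys are tried in a fixed order, a document that carries both file and audio_path resolves to the file value.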

Result Processing

openasr_process_result(doc, result)
Packages prediction with ground truth for WER computation
Parameters:
  • doc - Document
  • result - Model prediction list
Returns: Dictionary with wer entry containing gt and pred
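
A rough sketch of the packaging step follows. It assumes the prediction is the first element of result and that the ground truth is looked up with the same key order as openasr_doc_to_target; both points are assumptions, and the function name process_result is illustrative.

```python
from typing import Any, Dict, List

TARGET_KEYS = ("text", "transcript", "gt")


def process_result(doc: Dict[str, Any], result: List[str]) -> Dict[str, Dict[str, str]]:
    """Pair the model prediction with the reference under the "wer" key."""
    # Assumption: the first element of the result list is the transcription.
    pred = result[0] if result else ""

    # Assumption: same fallback order as the target extractor.
    for key in TARGET_KEYS:
        if key in doc:
            gt = doc[key]
            break
    else:
        raise KeyError(f"none of {list(TARGET_KEYS)} found in document")

    return {"wer": {"gt": gt, "pred": pred}}
```

The dictionary key ("wer") names the metric, so the values collected under it are what an aggregation function for that metric would later receive.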

Text Normalization

remove_sp(text, language)
Removes special tokens and normalizes spacing
Parameters:
  • text - Input text
  • language - Language code ("zh", "en", etc.)
Process:
  1. Removes tokens matching <|.*|>
  2. Collapses consecutive spaces to single space
  3. Removes space before punctuation
  4. Left-strips whitespace
  5. For Chinese: removes all spaces
Returns: Normalized text string
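
The five steps above can be sketched as follows. This is an approximation under stated assumptions: the special-token pattern is written non-greedily and replaced with a space, and the punctuation set is taken from the PUNCS constant listed on this page; the real implementation may differ in either detail.

```python
import re

PUNCS = "!,.?;:"  # punctuation set, as listed under Constants


def remove_sp(text: str, language: str) -> str:
    # 1. Drop special tokens such as <|en|>.
    #    Assumption: non-greedy match, replaced by a single space.
    text = re.sub(r"<\|.*?\|>", " ", text)
    # 2. Collapse runs of whitespace into one space.
    text = re.sub(r"\s+", " ", text)
    # 3. Remove an optional space in front of a punctuation character.
    text = re.sub(f" ?([{PUNCS}])", r"\1", text)
    # 4. Strip leading spaces.
    text = text.lstrip(" ")
    # 5. Chinese text carries no word spacing, so remove all whitespace.
    if language == "zh":
        text = re.sub(r"\s+", "", text)
    return text


print(remove_sp("<|en|> hello , world !", "en"))  # hello, world!
```

Note that only left-stripping is performed, so trailing whitespace survives for non-Chinese input.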

EvaluationTokenizer Class

Language-aware tokenizer using sacreBLEU tokenizers.

Initialization

EvaluationTokenizer(
    tokenizer_type="13a",
    lowercase=False,
    punctuation_removal=False,
    character_tokenization=False
)

Parameters

  • tokenizer_type: One of "none", "13a", "intl", "zh", "ja-mecab", "char"
  • lowercase: Apply lowercasing
  • punctuation_removal: Remove punctuation tokens
  • character_tokenization: Tokenize to character level

Constants

  • SPACE = chr(32) (an ordinary space)
  • SPACE_ESCAPE = chr(9601) (the character ▁, U+2581)

Methods

remove_punctuation(sent) (classmethod)
Removes tokens that are purely punctuation
Parameters: sent - Space-separated tokens
Returns: String with punctuation-only tokens removed
tokenize(sent)
Applies tokenization pipeline
Process:
  1. Apply sacreBLEU tokenizer
  2. Optionally remove punctuation
  3. Optionally tokenize to characters
  4. Optionally lowercase
Returns: Tokenized string
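
The pipeline order can be illustrated without sacreBLEU by treating step 1 as a pass-through, which is assumed here to match the "none" tokenizer type. The punctuation test by Unicode category and the use of SPACE_ESCAPE to mark original spaces during character tokenization are assumptions about how the class works, not confirmed details.

```python
import unicodedata

SPACE = chr(32)
SPACE_ESCAPE = chr(9601)  # U+2581


def remove_punctuation(sent: str) -> str:
    """Drop tokens in which every character is Unicode punctuation."""
    return SPACE.join(
        tok
        for tok in sent.split(SPACE)
        # Assumption: "punctuation" means a Unicode category starting with P.
        if not all(unicodedata.category(ch).startswith("P") for ch in tok)
    )


def tokenize(
    sent: str,
    lowercase: bool = False,
    punctuation_removal: bool = False,
    character_tokenization: bool = False,
) -> str:
    # Step 1 (sacreBLEU tokenizer) is omitted: pass-through stand-in.
    tokenized = sent
    if punctuation_removal:
        tokenized = remove_punctuation(tokenized)
    if character_tokenization:
        # Assumption: original spaces are escaped, then every character
        # becomes its own space-separated token.
        tokenized = SPACE.join(tokenized.replace(SPACE, SPACE_ESCAPE))
    if lowercase:
        tokenized = tokenized.lower()
    return tokenized


print(tokenize("Hello , World !", lowercase=True, punctuation_removal=True))  # hello world
```

A token such as "don't" is kept, because only tokens made up entirely of punctuation are removed.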

WER Computation

compute_wer(refs, hyps, language)

Computes Word Error Rate using edit distance.

Parameters:

  • refs - List of reference transcriptions
  • hyps - List of hypothesis transcriptions
  • language - Language code

Process:

  1. For each ref-hyp pair:
    1. Apply language-specific normalization:
      • yue: Convert to simplified Chinese via zhconv
      • en: Apply english_normalizer
      • zh: Apply chinese_normalizer
      • Other: Apply basic_normalizer
    2. Tokenize with EvaluationTokenizer (tokenizer_type="none", lowercase=True, punctuation_removal=True)
    3. For Chinese/Yue: enable character-level tokenization (character_tokenization=True)
    4. Compute edit distance between token sequences
  2. Return total distance / total reference length

Returns: WER as a fraction (0 means a perfect match; values above 1 are possible when hypotheses contain many insertions)
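
The corpus-level calculation (total edit distance divided by total reference length) can be sketched in pure Python. This sketch swaps in a hand-written Levenshtein routine for the editdistance package and replaces the language-specific normalizers and EvaluationTokenizer with plain lowercasing and whitespace splitting, so it shows only the arithmetic; simple_wer and its char_level flag are hypothetical names.

```python
from typing import List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def to_tokens(text: str, char_level: bool) -> List[str]:
    # Stand-in for normalization + tokenization (an assumption of this sketch).
    text = text.lower()
    return list(text.replace(" ", "")) if char_level else text.split()


def simple_wer(refs: List[str], hyps: List[str], char_level: bool = False) -> float:
    total_distance = 0
    total_ref_len = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = to_tokens(ref, char_level)
        hyp_tokens = to_tokens(hyp, char_level)
        total_distance += edit_distance(ref_tokens, hyp_tokens)
        total_ref_len += len(ref_tokens)
    # Raises ZeroDivisionError if every reference is empty.
    return total_distance / total_ref_len


print(simple_wer(["the cat sat", "hello world"], ["the cat sat", "hello word"]))  # 0.2
```

Summing distances and lengths before dividing weights each utterance by its reference length, which differs from averaging per-utterance error rates.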

openasr_wer(results, args)

Aggregates WER across all results.

Parameters:

  • results - List of result dictionaries with gt and pred
  • args - Arguments (currently unused; language hardcoded to "en")

Process:

  1. Extract ground truth and predictions
  2. Apply remove_sp normalization
  3. Compute WER via compute_wer
  4. Return WER × 100

Returns: WER as a percentage (0 means a perfect match; values above 100 are possible when hypotheses contain many insertions)
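
The aggregation flow can be sketched with the cleaning and scoring functions passed in as parameters, so the example stays self-contained. The names aggregate_wer, wer_fn, and clean_fn are hypothetical; in the module these roles are played by compute_wer and remove_sp. The toy scorer below is NOT a word error rate and exists only to make the flow runnable.

```python
from typing import Callable, Dict, List


def aggregate_wer(
    results: List[Dict[str, str]],
    wer_fn: Callable[[List[str], List[str], str], float],
    clean_fn: Callable[[str, str], str] = lambda text, language: text,
    language: str = "en",  # the page notes the language is hardcoded to "en"
) -> float:
    # 1-2. Extract references and predictions, then clean both sides.
    refs = [clean_fn(item["gt"], language) for item in results]
    hyps = [clean_fn(item["pred"], language) for item in results]
    # 3-4. Score the whole corpus once and convert to a percentage.
    return wer_fn(refs, hyps, language) * 100


def toy_mismatch_rate(refs: List[str], hyps: List[str], language: str) -> float:
    """Toy stand-in scorer: fraction of pairs that differ exactly."""
    return sum(r != h for r, h in zip(refs, hyps)) / len(refs)


results = [{"gt": "a", "pred": "a"}, {"gt": "b", "pred": "c"}]
print(aggregate_wer(results, toy_mismatch_rate))  # 50.0
```

Scoring once over the full lists, rather than per item, keeps the result a corpus-level figure.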

Note: The function contains commented-out legacy code for multi-source dataset evaluation.

Global Normalizers

Initialized at module level:

  • english_normalizer: EnglishTextNormalizer()
  • chinese_normalizer: TextNorm(...) with custom config
  • basic_normalizer: BasicTextNormalizer()

Chinese normalizer configuration:

  • All flags set to False (no banjiao (half-width) conversion, no case changes, no filler or erhua removal)
  • Empty cc_mode

Dependencies

  • os, re, unicodedata
  • editdistance as ed
  • zhconv - Chinese variant conversion
  • lmms_eval.tasks.librispeech.cn_tn.TextNorm
  • lmms_eval.tasks.librispeech.whisper_normalizer.basic.BasicTextNormalizer
  • lmms_eval.tasks.librispeech.whisper_normalizer.english.EnglishTextNormalizer
  • sacrebleu.tokenizers - Various tokenizer implementations

Constants

  • PUNCS = "!,.?;:" - Punctuation characters for normalization
  • dir_name - Absolute path to module directory
