

Implementation:Speechbrain Speechbrain AMI Splits

From Leeroopedia
Revision as of 16:43, 16 February 2026 by Admin (auto-imported from implementations/Speechbrain_Speechbrain_AMI_Splits.md)


Knowledge Sources
Domains: Speaker_Diarization, Data_Preparation
Last Updated: 2026-02-09 00:00 GMT

Overview

A concrete tool, provided by the SpeechBrain library, for defining the standard train/dev/test splits of the AMI Meeting Corpus.

Description

This module defines the standard data splits for the AMI corpus, which contains about 100 hours of meeting recordings. Its get_AMI_split function returns predefined lists of meeting IDs for the train, dev, and test sets under one of three split options: "scenario_only" (scenario meetings only), "full_corpus" (the full AMI corpus), and "full_corpus_asr" (a full-corpus variant intended for ASR tasks). Each option assigns meeting identifiers (e.g., "ES2002", "IS1000", "TS3005") to a partition. Because the lists are fixed, experiments that use the same option train and evaluate on the same meetings, which keeps results reproducible and comparable.
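The option-to-partition mapping can be pictured as a lookup table keyed by split option. The following is a minimal, hypothetical sketch, not SpeechBrain's implementation: the table `_SPLITS`, the function `lookup_split`, and the placeholder IDs such as "MTG01" are invented for illustration and do not reproduce the real AMI lists. Raising ValueError for an unknown option is a choice made in this sketch only.

```python
# Hypothetical sketch of a table-driven split lookup in the spirit of
# get_AMI_split. The meeting IDs are placeholders, NOT the real AMI lists.
from typing import Dict, List, Tuple

Split = Tuple[List[str], List[str], List[str]]

# Placeholder table: each option maps to (train, dev, test) meeting-ID lists.
_SPLITS: Dict[str, Split] = {
    "scenario_only": (["MTG01", "MTG02", "MTG03"], ["MTG04"], ["MTG05"]),
    "full_corpus": (["MTG01", "MTG02", "MTG06"], ["MTG04"], ["MTG05"]),
    "full_corpus_asr": (["MTG01", "MTG06", "MTG07"], ["MTG04"], ["MTG05"]),
}

# Dict keys keep insertion order, so this lists the three options in order.
ALLOWED_OPTIONS = list(_SPLITS)


def lookup_split(split_option: str) -> Split:
    """Return copies of the (train, dev, test) lists for split_option."""
    if split_option not in _SPLITS:
        # This sketch rejects unknown options explicitly.
        raise ValueError(
            f"Unknown split_option {split_option!r}; expected one of {ALLOWED_OPTIONS}"
        )
    train, dev, test = _SPLITS[split_option]
    # Return copies so callers cannot mutate the shared table.
    return list(train), list(dev), list(test)
```

Returning copies means a caller that appends to or filters its train list cannot silently change the split seen by later calls.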

Usage

Import and call get_AMI_split with the desired split option string when preparing data for AMI-based experiments such as speaker diarization, speech recognition, or meeting understanding tasks.
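During data preparation, the returned meeting IDs decide which recordings belong to which partition. The sketch below assumes that each session name starts with its meeting ID (for example, a session "ES2002a" under meeting "ES2002"); the helper `partition_sessions` is hypothetical and not part of SpeechBrain.

```python
# Hypothetical helper: assigns session names to train/dev/test by
# meeting-ID prefix. Assumes session names begin with their meeting ID.
from typing import Dict, Iterable, List


def partition_sessions(
    sessions: Iterable[str],
    train_ids: List[str],
    dev_ids: List[str],
    test_ids: List[str],
) -> Dict[str, List[str]]:
    """Assign each session to the partition whose meeting ID prefixes it.

    Sessions that match no meeting ID are collected under "unused".
    """
    parts: Dict[str, List[str]] = {"train": [], "dev": [], "test": [], "unused": []}
    prefixes = [
        ("train", tuple(train_ids)),
        ("dev", tuple(dev_ids)),
        ("test", tuple(test_ids)),
    ]
    # Sorting makes the output order deterministic.
    for name in sorted(sessions):
        for part, ids in prefixes:
            # str.startswith accepts a tuple and matches any of its prefixes.
            if name.startswith(ids):
                parts[part].append(name)
                break
        else:
            parts["unused"].append(name)
    return parts
```

With the real lists, one would call `train_set, dev_set, test_set = get_AMI_split("scenario_only")` and pass them together with the available session names. Prefix matching is only safe when no meeting ID is a prefix of an ID in a different partition; fixed-length IDs such as "ES2002" satisfy this.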

Code Reference

Source Location

Signature

ALLOWED_OPTIONS = ["scenario_only", "full_corpus", "full_corpus_asr"]

def get_AMI_split(split_option):
    """
    Prepares train, dev, and test sets for given split_option.

    Arguments
    ---------
    split_option : str
        The standard split option.
        Allowed options: "scenario_only", "full_corpus", "full_corpus_asr"

    Returns
    -------
    Meeting IDs for train, dev, and test sets for given split_option.
    """
    ...

Import

from ami_splits import get_AMI_split

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| split_option | str | Yes | One of "scenario_only", "full_corpus", or "full_corpus_asr" |

Outputs

get_AMI_split returns a tuple of three lists, in this order:

| Name | Type | Description |
|---|---|---|
| train_set | list[str] | Meeting IDs for the training partition |
| dev_set | list[str] | Meeting IDs for the development partition |
| test_set | list[str] | Meeting IDs for the test partition |
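A caller can check the output against this contract before using it. The hypothetical helper `validate_split` below, which is not part of SpeechBrain, verifies that each partition is a list of strings without duplicates and, assuming the partitions are meant to be disjoint, that no meeting ID appears in two of them.

```python
# Hypothetical contract check for a (train, dev, test) triple of meeting IDs.
# Assumes the three partitions are intended to be disjoint.
from typing import List


def validate_split(train: List[str], dev: List[str], test: List[str]) -> None:
    """Raise ValueError if the triple violates the expected I/O contract."""
    named = {"train": train, "dev": dev, "test": test}
    for name, ids in named.items():
        # Each partition must be a list whose elements are all strings.
        if not isinstance(ids, list) or not all(isinstance(m, str) for m in ids):
            raise ValueError(f"{name} must be a list of str")
        if len(set(ids)) != len(ids):
            raise ValueError(f"{name} contains duplicate meeting IDs")
    # No meeting ID may appear in more than one partition.
    for a, b in [("train", "dev"), ("train", "test"), ("dev", "test")]:
        overlap = set(named[a]) & set(named[b])
        if overlap:
            raise ValueError(f"{a} and {b} share meeting IDs: {sorted(overlap)}")
```

For example, `validate_split(*get_AMI_split("scenario_only"))` would return None if the returned lists satisfy these checks.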

Usage Examples

from ami_splits import get_AMI_split

# Get the scenario-only split
train_set, dev_set, test_set = get_AMI_split("scenario_only")
print(f"Train meetings: {len(train_set)}")  # e.g., 24 meetings
print(f"Dev meetings: {len(dev_set)}")      # e.g., 5 meetings
print(f"Test meetings: {len(test_set)}")    # e.g., 5 meetings

# Get the full corpus split for ASR
train_set, dev_set, test_set = get_AMI_split("full_corpus_asr")
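To record which partitions an experiment actually used, the returned lists can be saved next to the experiment outputs. This is a minimal sketch with hypothetical helpers `save_split_manifest` and `load_split_manifest`, not part of SpeechBrain, that store the option and the three ID lists as JSON using only the standard library.

```python
# Hypothetical helpers: persist a split option and its meeting-ID lists as JSON.
import json
from pathlib import Path
from typing import Any, Dict, List


def save_split_manifest(
    path: str, split_option: str, train: List[str], dev: List[str], test: List[str]
) -> None:
    """Write the split option and its meeting IDs to a JSON file."""
    manifest = {"split_option": split_option, "train": train, "dev": dev, "test": test}
    # sort_keys gives a stable file layout, which makes diffs between runs easy to read.
    Path(path).write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")


def load_split_manifest(path: str) -> Dict[str, Any]:
    """Read a manifest previously written by save_split_manifest."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

For instance, `save_split_manifest("split.json", "full_corpus_asr", train_set, dev_set, test_set)` after the call above would record the ASR split alongside a run.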
