Implementation:Speechbrain Speechbrain AMI Splits

Knowledge Sources	SpeechBrain
Domains	Speaker_Diarization, Data_Preparation
Last Updated	2026-02-09 00:00 GMT

Overview

Concrete tool, provided by the SpeechBrain library, that defines the standard train/dev/test splits of the AMI Meeting Corpus.

Description

This module defines the standard data splits for the AMI corpus, which contains about 100 hours of meeting recordings. Its get_AMI_split function returns predefined lists of meeting IDs for the train, dev, and test sets under one of three split options: "scenario_only" (scenario meetings only), "full_corpus" (the full AMI corpus), and "full_corpus_asr" (the full-corpus variant for ASR tasks). Each split option assigns meeting identifiers (e.g., "ES2002", "IS1000", "TS3005") to exactly one partition. Because the lists are fixed in code, experiments on the AMI dataset stay reproducible and comparable.
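The lookup pattern described above can be sketched as follows. This is a minimal illustration, not the SpeechBrain source: get_split_sketch and _TOY_SPLITS are hypothetical names, the "XX…" meeting IDs are placeholders rather than real AMI partitions, and raising ValueError on an unknown option is an assumption made for the sketch.

```python
# Illustrative stand-in only: the IDs below are placeholders,
# NOT the real AMI partitions shipped with SpeechBrain.
ALLOWED_OPTIONS = ["scenario_only", "full_corpus", "full_corpus_asr"]

# Hypothetical table: option name -> (train IDs, dev IDs, test IDs).
_TOY_SPLITS = {
    "scenario_only": (["XX0001", "XX0002"], ["XX0003"], ["XX0004"]),
    "full_corpus": (["XX0001", "XX0002", "XX0005"], ["XX0003"], ["XX0004"]),
    "full_corpus_asr": (["XX0001a", "XX0002a"], ["XX0003a"], ["XX0004a"]),
}


def get_split_sketch(split_option):
    """Return (train, dev, test) ID lists for a split option (toy data)."""
    if split_option not in ALLOWED_OPTIONS:
        # Assumption for this sketch: reject unknown options loudly.
        raise ValueError(
            f"Unknown split_option {split_option!r}; "
            f"expected one of {ALLOWED_OPTIONS}"
        )
    train_set, dev_set, test_set = _TOY_SPLITS[split_option]
    # Return copies so callers cannot mutate the shared table.
    return list(train_set), list(dev_set), list(test_set)
```

Keeping the lists in a single table keyed by option name makes it easy to see that every option yields the same three-part result.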

Usage

Import and call get_AMI_split with the desired split option string when preparing data for AMI-based experiments such as speaker diarization, speech recognition, or meeting understanding tasks.

Code Reference

Source Location

Repository: SpeechBrain
File: recipes/AMI/ami_splits.py

Signature

ALLOWED_OPTIONS = ["scenario_only", "full_corpus", "full_corpus_asr"]

def get_AMI_split(split_option):
    """
    Prepares train, dev, and test sets for given split_option.

    Arguments
    ---------
    split_option : str
        The standard split option.
        Allowed options: "scenario_only", "full_corpus", "full_corpus_asr"

    Returns
    -------
    Meeting IDs for train, dev, and test sets for given split_option.
    """
    ...

Import

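# ami_splits.py lives in recipes/AMI, so that directory must be the
# working directory or otherwise on the Python import path.
from ami_splits import get_AMI_split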

I/O Contract

Inputs

Name	Type	Required	Description
split_option	str	Yes	One of "scenario_only", "full_corpus", or "full_corpus_asr"

Outputs

Name	Type	Description
train_set	list[str]	List of meeting IDs for the training partition
dev_set	list[str]	List of meeting IDs for the development partition
test_set	list[str]	List of meeting IDs for the test partition
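Since the three outputs are meant to be separate partitions, a quick sanity check before training can catch accidental leakage between them. The helper below is a hedged sketch: assert_disjoint is a hypothetical name, not part of SpeechBrain, and it only assumes that each output is an iterable of meeting ID strings.

```python
def assert_disjoint(train_set, dev_set, test_set):
    """Raise ValueError if a meeting ID appears in more than one partition."""
    parts = {
        "train": set(train_set),
        "dev": set(dev_set),
        "test": set(test_set),
    }
    names = list(parts)
    # Compare every unordered pair of partitions exactly once.
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            overlap = parts[first] & parts[second]
            if overlap:
                raise ValueError(
                    f"{first}/{second} share meeting IDs: {sorted(overlap)}"
                )
```

Calling it on the tuple returned by get_AMI_split, for example assert_disjoint(train_set, dev_set, test_set), would fail fast if a custom edit to the lists introduced a duplicate.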

Usage Examples

from ami_splits import get_AMI_split

# Get the scenario-only split
train_set, dev_set, test_set = get_AMI_split("scenario_only")
print(f"Train meetings: {len(train_set)}")  # e.g., 24 meetings
print(f"Dev meetings: {len(dev_set)}")      # e.g., 5 meetings
print(f"Test meetings: {len(test_set)}")    # e.g., 5 meetings

# Get the full corpus split for ASR
train_set, dev_set, test_set = get_AMI_split("full_corpus_asr")
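During data preparation, the returned ID lists are typically used to decide which recordings belong to which partition. The sketch below shows one way this could look. It assumes, purely for illustration, that each file name starts with its meeting ID; partition_recordings and the "XX…" file names are hypothetical and not taken from the SpeechBrain recipe.

```python
def partition_recordings(filenames, train_set, dev_set, test_set):
    """Group file names by partition, matching on a meeting ID prefix."""
    partitions = {"train": [], "dev": [], "test": []}
    id_lists = {"train": train_set, "dev": dev_set, "test": test_set}
    for name in filenames:
        for part, meeting_ids in id_lists.items():
            # Assumption: the file name begins with the meeting ID.
            if any(name.startswith(mid) for mid in meeting_ids):
                partitions[part].append(name)
                break  # a file belongs to at most one partition
    return partitions


# Toy usage with placeholder names (not real AMI files).
files = ["XX0001a.wav", "XX0003b.wav", "XX0009a.wav"]
grouped = partition_recordings(files, ["XX0001"], ["XX0003"], ["XX0004"])
print(grouped)
```

Files that match no meeting ID, such as "XX0009a.wav" here, are left out of every partition rather than assigned a default.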

Related Pages

Principle:Speechbrain_Speechbrain_Speaker_Diarization_Pipeline
