Implementation:Speechbrain Speechbrain AMI Splits
| Knowledge Sources | |
|---|---|
| Domains | Speaker_Diarization, Data_Preparation |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete SpeechBrain utility that defines the standard train/dev/test splits of the AMI Meeting Corpus.
Description
This module defines the standard data splits for the AMI Meeting Corpus, which contains 100 hours of meeting recordings. Its get_AMI_split function returns predefined lists of meeting IDs for the train, dev, and test sets under one of three split options: "scenario_only" (scenario meetings only), "full_corpus" (the full AMI corpus), and "full_corpus_asr" (a full-corpus variant intended for ASR tasks). For each option, meeting identifiers such as "ES2002", "IS1000", and "TS3005" are assigned to a partition. Because the lists are fixed in code, experiments that use the same option train and evaluate on the same meetings, which keeps results reproducible and comparable.
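Consistency also depends on no meeting appearing in more than one partition, which is easy to verify on any returned split. The sketch below is hypothetical: `check_split_disjoint` is not part of SpeechBrain, and the toy lists only reuse the ID format from the text rather than any real AMI split.

```python
from itertools import combinations


def check_split_disjoint(train_set, dev_set, test_set):
    """Hypothetical helper: return meeting IDs shared by two partitions.

    An empty dict means the three partitions are disjoint.
    """
    partitions = {"train": set(train_set), "dev": set(dev_set), "test": set(test_set)}
    overlaps = {}
    # Compare every pair of partitions: train/dev, train/test, dev/test
    for (name_a, ids_a), (name_b, ids_b) in combinations(partitions.items(), 2):
        shared = ids_a & ids_b
        if shared:
            overlaps[f"{name_a}/{name_b}"] = sorted(shared)
    return overlaps


# Toy lists for illustration only (deliberately overlapping, not an AMI split)
print(check_split_disjoint(["ES2002", "IS1000"], ["TS3005"], ["ES2002"]))
# {'train/test': ['ES2002']}
```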
Usage
Import and call get_AMI_split with the desired split option string when preparing data for AMI-based experiments such as speaker diarization, speech recognition, or meeting understanding tasks.
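During data preparation, the returned IDs are usually matched against the recordings that are actually on disk. A minimal sketch, assuming session names begin with the meeting ID (for example, a session "ES2002a" under the ID "ES2002"); `select_sessions` and the session list are hypothetical and do not reproduce SpeechBrain's own preparation code.

```python
def select_sessions(session_names, meeting_ids):
    """Hypothetical helper: keep sessions whose name starts with a meeting ID.

    Assumes session names are formed as meeting ID + suffix (e.g. "ES2002a").
    """
    prefixes = tuple(meeting_ids)  # str.startswith accepts a tuple of prefixes
    return sorted(name for name in session_names if name.startswith(prefixes))


sessions = ["ES2002a", "ES2002b", "ES2003a", "IS1000a", "TS3005c"]
print(select_sessions(sessions, ["ES2002", "TS3005"]))
# ['ES2002a', 'ES2002b', 'TS3005c']
```

An empty ID list selects nothing, since `startswith(())` is always False.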
Code Reference
Source Location
- Repository: SpeechBrain
- File: recipes/AMI/ami_splits.py
Signature
```python
ALLOWED_OPTIONS = ["scenario_only", "full_corpus", "full_corpus_asr"]


def get_AMI_split(split_option):
    """
    Prepares train, dev, and test sets for given split_option.

    Arguments
    ---------
    split_option : str
        The standard split option.
        Allowed options: "scenario_only", "full_corpus", "full_corpus_asr"

    Returns
    -------
    Meeting IDs for train, dev, and test sets for given split_option.
    """
    ...
```
Import
```python
from ami_splits import get_AMI_split
```
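Because the file sits under recipes/AMI rather than inside the installed speechbrain package, this import generally works only when that directory is the working directory or is on sys.path. As an alternative, the file can be loaded by path with the standard library's importlib. The helper name and the checkout path below are assumptions for illustration.

```python
import importlib.util
from pathlib import Path


def load_module_from_path(path, module_name="ami_splits"):
    """Import a single Python source file as a module, without editing sys.path."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Assumed checkout location; adjust to wherever SpeechBrain is cloned.
# ami_splits = load_module_from_path("speechbrain/recipes/AMI/ami_splits.py")
# train_set, dev_set, test_set = ami_splits.get_AMI_split("scenario_only")
```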
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| split_option | str | Yes | One of "scenario_only", "full_corpus", or "full_corpus_asr" |
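To fail fast on an unsupported value, a caller can check split_option against the allowed options before calling the function. The sketch below copies ALLOWED_OPTIONS from the signature; `require_split_option` is a hypothetical helper, and raising ValueError is this sketch's design choice, not necessarily how get_AMI_split itself reacts to invalid input.

```python
ALLOWED_OPTIONS = ["scenario_only", "full_corpus", "full_corpus_asr"]


def require_split_option(split_option):
    """Hypothetical guard: return split_option if allowed, else raise."""
    if not isinstance(split_option, str):
        raise TypeError(f"split_option must be str, got {type(split_option).__name__}")
    if split_option not in ALLOWED_OPTIONS:
        raise ValueError(
            f"Invalid split option {split_option!r}; expected one of {ALLOWED_OPTIONS}"
        )
    return split_option


require_split_option("full_corpus")  # returns "full_corpus" unchanged
```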
Outputs
| Name | Type | Description |
|---|---|---|
| train_set | list[str] | List of meeting IDs for the training partition |
| dev_set | list[str] | List of meeting IDs for the development partition |
| test_set | list[str] | List of meeting IDs for the test partition |
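The three outputs come back positionally, so downstream code has to remember their order. One option is to wrap them in a named tuple and derive a per-meeting lookup. `AMISplit` and `build_partition_lookup` are hypothetical names, and the placeholder IDs are not real AMI meetings.

```python
from typing import Dict, List, NamedTuple


class AMISplit(NamedTuple):
    """Hypothetical named view of the (train_set, dev_set, test_set) tuple."""

    train: List[str]
    dev: List[str]
    test: List[str]


def build_partition_lookup(split: AMISplit) -> Dict[str, str]:
    """Map each meeting ID to the partition containing it.

    If an ID appeared in several partitions, the last one (in train, dev,
    test order) would win.
    """
    lookup = {}
    for partition, meeting_ids in split._asdict().items():
        for meeting_id in meeting_ids:
            lookup[meeting_id] = partition
    return lookup


# Placeholder IDs; with the real function: AMISplit(*get_AMI_split("scenario_only"))
split = AMISplit(train=["meeting_a"], dev=["meeting_b"], test=["meeting_c"])
print(build_partition_lookup(split))
# {'meeting_a': 'train', 'meeting_b': 'dev', 'meeting_c': 'test'}
```

A NamedTuple is still a tuple, so existing `train_set, dev_set, test_set = ...` unpacking keeps working.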
Usage Examples
```python
from ami_splits import get_AMI_split

# Scenario-only split: three lists of meeting IDs
train_set, dev_set, test_set = get_AMI_split("scenario_only")
print(f"Train meetings: {len(train_set)}")
print(f"Dev meetings: {len(dev_set)}")
print(f"Test meetings: {len(test_set)}")

# Full-corpus split intended for ASR experiments
train_set, dev_set, test_set = get_AMI_split("full_corpus_asr")
print(f"ASR train meetings: {len(train_set)}")
```