Implementation:Speechbrain Speechbrain Train ESC50 Classification
| Knowledge Sources | |
|---|---|
| Domains | Sound_Classification, Training |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool, provided by the SpeechBrain library, for training a sound classifier on the ESC-50 dataset.
Description
This recipe defines the ESC50Brain class (a subclass of sb.core.Brain) for environmental sound classification on the ESC-50 dataset. The pipeline computes STFT features, optionally converts them to a mel spectrogram and applies log1p compression, then feeds the result into an embedding model (CNN14, FocalNet, ViT, or Conv2D) followed by a classifier head. The recipe supports data augmentation, including waveform augmentation and WHAM! noise addition. Evaluation includes confusion matrix generation.
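The feature step described above can be illustrated with a minimal, standard-library-only sketch. It is not the recipe's code: the function names, the toy frame size (`n_fft=8`, `hop=4`), and the naive DFT are assumptions chosen for readability, and the optional mel-spectrogram conversion is omitted.

```python
import cmath
import math


def stft_magnitude(signal, n_fft=8, hop=4):
    """Return magnitude STFT frames using a periodic Hann window and a naive DFT.

    Hypothetical helper for illustration; a real pipeline would use an FFT.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(n_fft)]
        # Keep only the non-redundant bins 0..n_fft/2 of the real-input DFT.
        bins = []
        for k in range(n_fft // 2 + 1):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft)
            )
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def log1p_features(frames):
    """Compress dynamic range with log(1 + x), which stays finite at x = 0."""
    return [[math.log1p(value) for value in frame] for frame in frames]


# Toy input: a sine wave centred on DFT bin 2 (period of 4 samples).
signal = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
feats = log1p_features(stft_magnitude(signal))
peak_bin = max(range(len(feats[0])), key=lambda k: feats[0][k])
```

The result is a list of frames, each holding `n_fft // 2 + 1` non-negative values. In the recipe, features of this kind are what the embedding model receives.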
Usage
Use this recipe to train and evaluate a sound classifier on the ESC-50 dataset. It requires the ESC-50-master data folder and supports multiple backbone architectures, each selected through its own hyperparameter configuration file (cnn14.yaml, focalnet.yaml, vit.yaml, conv2d.yaml).
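Evaluation in this recipe includes a confusion matrix. As a rough illustration of what such a matrix contains, the sketch below builds one from integer class indices in plain Python. The function names and the three-class toy data are hypothetical and are not taken from the recipe, which handles the 50 ESC-50 classes with its own code.

```python
def confusion_matrix(targets, predictions, n_classes):
    """Count how often each true class (row) is predicted as each class (column)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(targets, predictions):
        matrix[true_label][predicted_label] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Accuracy is the diagonal (correct predictions) divided by all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Toy example with three classes instead of ESC-50's fifty.
targets = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]
matrix = confusion_matrix(targets, predictions, n_classes=3)
acc = accuracy_from_matrix(matrix)
```

Off-diagonal entries show which classes are confused with each other, which a single accuracy number hides.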
Code Reference
Source Location
- Repository: SpeechBrain
- File: recipes/ESC50/classification/train.py
Signature
class ESC50Brain(sb.core.Brain):
    def compute_forward(self, batch, stage):
        ...

    def compute_objectives(self, predictions, batch, stage):
        ...
Import
The recipe is run as a script rather than imported as a module:
python recipes/ESC50/classification/train.py hparams/cnn14.yaml --data_folder /path/to/ESC-50-master
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| batch | PaddedBatch | Yes | Batch containing sig (waveforms) and class_string_encoded (class labels) |
| stage | sb.Stage | Yes | TRAIN, VALID, or TEST |
Outputs
| Name | Type | Description |
|---|---|---|
| predictions | tuple | Classifier output logits and the batch lengths (lens) |
| loss | torch.Tensor | Cross-entropy classification loss |
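To make the loss output concrete, here is a small worked sketch of cross-entropy computed from raw logits in plain Python. It shows only the formula; the recipe computes its loss on tensors through the function configured in its hyperparameters, and the helper names below are assumptions for illustration.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


def batch_cross_entropy(batch_logits, targets):
    """Mean cross-entropy over a batch of (logits, target) pairs."""
    losses = [cross_entropy(row, t) for row, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# An untrained 50-class model with uniform logits scores log(50) per sample.
uniform_loss = cross_entropy([0.0] * 50, target=7)

# A confident, correct prediction drives the loss towards zero.
confident_loss = cross_entropy([10.0, 0.0, 0.0], target=0)
```

The uniform case gives a useful sanity check when training starts: with 50 classes, an initial loss near log(50), roughly 3.9, indicates the model is not yet better than chance.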
Usage Examples
python train.py hparams/cnn14.yaml --data_folder /path/to/ESC-50-master