Implementation:Online ml River Datasets Music
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Datasets, Multi_Output_Classification, Multi_Label |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Concrete dataset for multi-output binary classification (multi-label) provided by the River library.
Description
Multi-label music mood prediction. The goal is to predict which moods a song evokes. Each song can belong to several mood categories at once.
This dataset contains 593 samples with 72 features and 6 binary output labels for multi-label classification tasks.
Usage
This dataset is useful for:
- Multi-label classification tasks
- Music information retrieval
- Emotion/mood prediction from audio features
- Testing algorithms that handle multiple simultaneous binary outputs
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/datasets/music.py
Signature
class Music(base.RemoteDataset):
    def __init__(self):
        super().__init__(
            task=base.MO_BINARY_CLF,
            n_samples=593,
            n_features=72,
            n_outputs=6,
            url="https://raw.githubusercontent.com/scikit-multiflow/streaming-datasets/master/music.csv",
            size=378_980,
            unpack=False,
        )

    def _iter(self):
        return stream.iter_csv(
            self.path,
            target=[
                "amazed-suprised",
                "happy-pleased",
                "relaxing-clam",
                "quiet-still",
                "sad-lonely",
                "angry-aggresive",
            ],
            converters={
                "amazed-suprised": lambda x: x == "1",
                "happy-pleased": lambda x: x == "1",
                "relaxing-clam": lambda x: x == "1",
                "quiet-still": lambda x: x == "1",
                "sad-lonely": lambda x: x == "1",
                "angry-aggresive": lambda x: x == "1",
                # ... MFCC and other audio features ...
            },
        )
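The `converters` mapping above turns the raw CSV strings into Python values: each label column becomes `True` when the cell is `"1"`. The following is a minimal standard-library sketch of that idea, not River's `stream.iter_csv` implementation. The helper `iter_rows`, the two-row sample CSV, and the column name `feature_a` are assumptions made up for illustration.

```python
import csv
import io

# Two of the six labels, to keep the sample short.
TARGETS = ["happy-pleased", "sad-lonely"]

# Hypothetical two-row sample; "feature_a" is a made-up column name.
SAMPLE_CSV = """feature_a,happy-pleased,sad-lonely
0.25,1,0
0.75,0,1
"""


def iter_rows(buffer, targets, converters):
    """Yield (x, y) pairs, applying a converter to each column that has one."""
    for row in csv.DictReader(buffer):
        # Columns without a converter are kept as strings.
        row = {key: converters.get(key, str)(value) for key, value in row.items()}
        # Move the target columns out of the feature dict.
        y = {target: row.pop(target) for target in targets}
        yield row, y


# Label cells equal to "1" become True, anything else becomes False.
converters = {target: (lambda value: value == "1") for target in TARGETS}
converters["feature_a"] = float

rows = list(iter_rows(io.StringIO(SAMPLE_CSV), TARGETS, converters))
```

Each element of `rows` is a `(features_dict, labels_dict)` pair, which is the same shape the dataset yields when iterated.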
Import
from river import datasets
dataset = datasets.Music()
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| (none) | — | — | No parameters needed |
Outputs
| Name | Type | Description |
|---|---|---|
| iter(dataset) | tuple(dict, dict) | Yields (features_dict, labels_dict) pairs, where labels_dict maps each of the 6 mood labels to a boolean |
Dataset Properties
| Property | Value |
|---|---|
| Number of samples | 593 |
| Number of features | 72 |
| Number of outputs | 6 |
| Task | Multi-output binary classification (multi-label) |
| Format | CSV |
| Size | 378,980 bytes |
Features
The dataset includes 72 audio features:
- Mean and standard deviation of the Mel-Frequency Cepstral Coefficients (MFCCs) 0-12
- Spectral features: Centroid, Rolloff, Flux
- Beat histogram features: BH_LowPeakAmp, BH_LowPeakBPM, BH_HighPeakAmp, BH_HighPeakBPM, BH_HighLowRatio
- Summary features: BHSUM1, BHSUM2, BHSUM3
Target Labels
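Six mood categories, each a binary label. The identifiers are reproduced exactly as they appear in the code reference, including their spelling (`suprised`, `clam`, `aggresive`), so they must be used verbatim as dictionary keys: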
- amazed-suprised: Excited, surprised emotional state
- happy-pleased: Positive, joyful emotional state
- relaxing-clam: Calm, peaceful emotional state
- quiet-still: Tranquil, silent emotional state
- sad-lonely: Melancholic emotional state
- angry-aggresive: Intense, aggressive emotional state
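Because a song can carry several of these labels at once, two quick summaries are often useful: the label cardinality (average number of active labels per sample) and the share of samples in which each label is active. The sketch below computes both from any iterable of `{label: bool}` dicts. The function name `label_stats` and the four sample dicts are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def label_stats(label_dicts):
    """Return (cardinality, per-label frequency) for {label: bool} dicts."""
    counts = Counter()
    n_samples = 0
    n_active = 0
    for y in label_dicts:
        n_samples += 1
        for label, active in y.items():
            if active:
                counts[label] += 1
                n_active += 1
    if n_samples == 0:
        return 0.0, {}
    # Labels that are never active do not appear in the frequency dict.
    frequency = {label: count / n_samples for label, count in counts.items()}
    return n_active / n_samples, frequency


# Hypothetical label dicts, shaped like the dataset's targets.
sample = [
    {"happy-pleased": True, "relaxing-clam": True, "sad-lonely": False},
    {"happy-pleased": False, "relaxing-clam": True, "sad-lonely": True},
    {"happy-pleased": False, "relaxing-clam": False, "sad-lonely": True},
    {"happy-pleased": True, "relaxing-clam": False, "sad-lonely": False},
]
cardinality, frequency = label_stats(sample)
```

Assuming River is installed and the CSV can be downloaded, the same function could be applied to the real data with `label_stats(y for _, y in datasets.Music())`.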
Usage Examples
from river import datasets

dataset = datasets.Music()

for x, y in dataset:
    print(f"Features: {list(x.keys())[:5]}...")  # Show first 5 feature names
    print(f"Labels: {y}")
    break
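A stream of `(x, y)` pairs like this one is typically consumed in a test-then-train loop, where each sample is predicted before the model learns from it. The sketch below illustrates that loop with a per-label majority baseline scored by Hamming loss (the fraction of individual label predictions that are wrong). `MajorityPerLabel`, `prequential_hamming_loss`, and the toy stream are hypothetical helpers written for this page. They are not part of River, and the baseline's `predict_one(x, labels)` signature differs from River's estimators.

```python
from collections import defaultdict


class MajorityPerLabel:
    """Baseline: predict each label's most frequent value seen so far."""

    def __init__(self):
        self.true_counts = defaultdict(int)
        self.n_seen = 0

    def predict_one(self, x, labels):
        # True only if the label was active in more than half of the seen samples.
        return {label: 2 * self.true_counts[label] > self.n_seen for label in labels}

    def learn_one(self, x, y):
        self.n_seen += 1
        for label, active in y.items():
            self.true_counts[label] += int(active)


def prequential_hamming_loss(stream, model):
    """Test-then-train: score the prediction for each sample, then learn from it."""
    errors = 0
    total = 0
    for x, y in stream:
        y_pred = model.predict_one(x, y.keys())
        errors += sum(y_pred[label] != y[label] for label in y)
        total += len(y)
        model.learn_one(x, y)
    return errors / total if total else 0.0


# Toy stream with made-up values, shaped like the dataset's (x, y) pairs.
toy_stream = [
    ({"f": 0.1}, {"happy-pleased": True, "sad-lonely": False}),
    ({"f": 0.2}, {"happy-pleased": True, "sad-lonely": False}),
    ({"f": 0.3}, {"happy-pleased": False, "sad-lonely": True}),
    ({"f": 0.4}, {"happy-pleased": True, "sad-lonely": False}),
]
loss = prequential_hamming_loss(toy_stream, MajorityPerLabel())
```

Passing `datasets.Music()` in place of `toy_stream` should work in the same way, assuming River is installed and the file downloads successfully. A baseline like this gives a floor against which a real multi-label learner can be compared.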
References
- Read, J., Reutemann, P., Pfahringer, B. and Holmes, G., 2016. MEKA: a multi-label/multi-target extension to WEKA. The Journal of Machine Learning Research, 17(1), pp.667-671.