Implementation:Online ml River Datasets Music

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Datasets, Multi_Output_Classification, Multi_Label
Last Updated	2026-02-08 16:00 GMT

Overview

Concrete dataset for multi-output binary classification (multi-label) provided by the River library.

Description

Multi-label music mood prediction. The goal is to predict which moods a song belongs to. Each song can belong to several mood categories simultaneously.

This dataset contains 593 samples with 72 features and 6 binary output labels for multi-label classification tasks.

Usage

This dataset is useful for:

Multi-label classification tasks
Music information retrieval
Emotion/mood prediction from audio features
Testing algorithms that handle multiple simultaneous binary outputs
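As a rough illustration of the last point, the sketch below applies binary relevance (one independent binary learner per label) in a predict-then-learn loop and reports the mean Hamming loss. It uses only the standard library: `MajorityLabel`, `prequential_hamming`, the feature name `f`, and the four toy samples are hypothetical stand-ins, not River classes or real rows from the dataset.

```python
class MajorityLabel:
    """Toy binary learner: predicts the majority value seen so far for one label."""

    def __init__(self):
        self.n_pos = 0
        self.n_seen = 0

    def predict_one(self, x):
        # Features are ignored; this is only a baseline, not a real model.
        return self.n_pos * 2 > self.n_seen

    def learn_one(self, x, y):
        self.n_seen += 1
        self.n_pos += int(y)


def prequential_hamming(stream):
    """Predict-then-learn over (x, y) pairs; return the mean Hamming loss."""
    models = {}  # one independent binary learner per label (binary relevance)
    errors = 0
    n_decisions = 0
    for x, y in stream:
        for label, truth in y.items():
            model = models.setdefault(label, MajorityLabel())
            # Score the prediction before the model sees the true label.
            errors += int(model.predict_one(x) != truth)
            n_decisions += 1
            model.learn_one(x, truth)
    return errors / n_decisions if n_decisions else 0.0


# Hypothetical samples with two of the six labels, for brevity.
toy_stream = [
    ({"f": 0.1}, {"happy-pleased": True, "sad-lonely": False}),
    ({"f": 0.2}, {"happy-pleased": True, "sad-lonely": False}),
    ({"f": 0.3}, {"happy-pleased": True, "sad-lonely": True}),
    ({"f": 0.4}, {"happy-pleased": False, "sad-lonely": False}),
]
loss = prequential_hamming(toy_stream)
print(loss)  # 3 wrong decisions out of 8 -> 0.375
```

Treating each label independently ignores correlations between moods, which is one reason a multi-label dataset like this is a useful test bed for more elaborate approaches.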

Code Reference

Source Location

Repository: Online_ml_River
File: river/datasets/music.py

Signature

from river import stream
from river.datasets import base


class Music(base.RemoteDataset):
    def __init__(self):
        super().__init__(
            task=base.MO_BINARY_CLF,
            n_samples=593,
            n_features=72,
            n_outputs=6,
            url="https://raw.githubusercontent.com/scikit-multiflow/streaming-datasets/master/music.csv",
            size=378_980,
            unpack=False,
        )

    def _iter(self):
        return stream.iter_csv(
            self.path,
            target=[
                "amazed-suprised",
                "happy-pleased",
                "relaxing-clam",
                "quiet-still",
                "sad-lonely",
                "angry-aggresive",
            ],
            converters={
                "amazed-suprised": lambda x: x == "1",
                "happy-pleased": lambda x: x == "1",
                "relaxing-clam": lambda x: x == "1",
                "quiet-still": lambda x: x == "1",
                "sad-lonely": lambda x: x == "1",
                "angry-aggresive": lambda x: x == "1",
                # ... MFCC and other audio features ...
            },
        )
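To illustrate what the `target` list and the `converters` mapping accomplish in `_iter`, the following standard-library sketch mimics the behaviour on two in-memory rows. It does not call River. `split_row`, the sample values, the reduced column set, and the choice to parse every feature as a float are assumptions made for illustration. The label column names are copied verbatim from the signature.

```python
import csv
import io

# Label columns, spelled exactly as in the dataset's signature.
TARGETS = [
    "amazed-suprised",
    "happy-pleased",
    "relaxing-clam",
    "quiet-still",
    "sad-lonely",
    "angry-aggresive",
]

# Hypothetical two-row excerpt with only two of the 72 feature columns.
SAMPLE_CSV = (
    "BH_LowPeakAmp,BH_LowPeakBPM," + ",".join(TARGETS) + "\n"
    "0.25,97,0,1,1,0,0,0\n"
    "0.5,120,1,0,0,0,0,1\n"
)


def split_row(row):
    """Split one CSV row into (features, labels), converting strings on the way."""
    # Labels: the string "1" becomes True, anything else becomes False.
    y = {name: row[name] == "1" for name in TARGETS}
    # Features: every remaining column is parsed as a float (an assumption here).
    x = {name: float(value) for name, value in row.items() if name not in TARGETS}
    return x, y


samples = [split_row(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
first_x, first_y = samples[0]
print(first_x)
print(first_y)
```

The comparison `x == "1"` is why the labels arrive as Python booleans rather than the strings or integers stored in the CSV.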

Import

from river import datasets
dataset = datasets.Music()

I/O Contract

Inputs

Name	Type	Required	Description
(none)	—	—	No parameters needed

Outputs

Name	Type	Description
iter(dataset)	tuple(dict, dict)	Yields (features_dict, labels_dict) pairs; labels_dict maps each of the 6 mood labels to a boolean
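
Because each labels dict holds booleans, per-label statistics can be accumulated in a single pass over the stream. The sketch below counts positives per label and computes the mean number of active labels per sample. `label_stats` is an illustrative helper, not part of River, and `toy_stream` is a hypothetical hand-written stand-in that uses three of the six labels for brevity.

```python
from collections import Counter


def label_stats(stream):
    """Return (per-label positive counts, mean number of active labels per sample)."""
    positives = Counter()
    n_samples = 0
    n_active = 0
    for _x, y in stream:
        n_samples += 1
        for label, is_on in y.items():
            if is_on:
                positives[label] += 1
                n_active += 1
    cardinality = n_active / n_samples if n_samples else 0.0
    return positives, cardinality


# Hypothetical stand-in for iterating over the dataset.
toy_stream = [
    ({"BHSUM1": 0.1}, {"happy-pleased": True, "sad-lonely": False, "quiet-still": True}),
    ({"BHSUM1": 0.4}, {"happy-pleased": False, "sad-lonely": True, "quiet-still": False}),
    ({"BHSUM1": 0.2}, {"happy-pleased": True, "sad-lonely": False, "quiet-still": False}),
    ({"BHSUM1": 0.3}, {"happy-pleased": False, "sad-lonely": False, "quiet-still": False}),
]

positives, cardinality = label_stats(toy_stream)
print(dict(positives))
print(cardinality)
```

With River installed, passing `datasets.Music()` in place of `toy_stream` should work the same way, since the dataset yields pairs of the same shape.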

Dataset Properties

Property	Value
Number of samples	593
Number of features	72
Number of outputs	6
Task	Multi-output binary classification (multi-label)
Format	CSV
Size	378,980 bytes

Features

The dataset includes 72 audio features:

Mean and Standard Deviation of MFCC coefficients (Mel-Frequency Cepstral Coefficients 0-12)
Spectral features: Centroid, Rolloff, Flux
Beat histogram features: BH_LowPeakAmp, BH_LowPeakBPM, BH_HighPeakAmp, BH_HighPeakBPM, BH_HighLowRatio
Summary features: BHSUM1, BHSUM2, BHSUM3

Target Labels

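Six mood categories, each a binary label. The keys keep the spellings used in the dataset's column names ("suprised", "clam", "aggresive") and must be written exactly that way when indexing the labels: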

amazed-suprised: Excited, surprised emotional state
happy-pleased: Positive, joyful emotional state
relaxing-clam: Calm, peaceful emotional state
quiet-still: Tranquil, silent emotional state
sad-lonely: Melancholic emotional state
angry-aggresive: Intense, aggressive emotional state

Usage Examples

from river import datasets

dataset = datasets.Music()
for x, y in dataset:
    print(f"Features: {list(x.keys())[:5]}...")  # Show first 5 feature names
    print(f"Labels: {y}")
    break

References

Read, J., Reutemann, P., Pfahringer, B. and Holmes, G., 2016. MEKA: a multi-label/multi-target extension to WEKA. The Journal of Machine Learning Research, 17(1), pp.667-671.

Related Pages

Environment:Online_ml_River_Python_Runtime_Environment
