Implementation:Online ml River Datasets Bananas

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Datasets, Binary_Classification
Last Updated	2026-02-08 16:00 GMT

Overview

Concrete dataset for binary classification provided by the River library.

Description

An artificial dataset in which instances belong to several banana-shaped clusters. The two attributes correspond to the x and y coordinates, respectively.

This dataset contains 5,300 samples with 2 features for binary classification tasks.

Usage

This dataset is useful for:

Testing binary classification algorithms
Evaluating clustering algorithms on non-linear cluster shapes
Benchmarking performance on synthetically generated data with known properties
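A common way to test a binary classifier on a stream such as this one is a test-then-train (progressive validation) loop: predict each sample first, score the prediction, then learn from it. The sketch below is a minimal illustration, not River's own evaluation code. `MajorityClass`, `progressive_accuracy`, and `toy_stream` are hypothetical names introduced here, and the toy values are made up. The stand-in stream only mimics the (features_dict, target) shape described on this page.

```python
from collections import Counter


class MajorityClass:
    """Hypothetical baseline: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        # No prediction is possible before the first label has been seen.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn_one(self, x, y):
        self.counts[y] += 1


def progressive_accuracy(stream, model):
    """Test-then-train: score each prediction before learning from the sample."""
    correct = total = 0
    for x, y in stream:
        y_pred = model.predict_one(x)  # test first
        correct += int(y_pred == y)
        total += 1
        model.learn_one(x, y)  # then train
    return correct / total if total else 0.0


# Made-up stand-in for the dataset: (features_dict, boolean target) pairs.
toy_stream = [
    ({"1": 0.2, "2": -1.1}, True),
    ({"1": 1.3, "2": 0.9}, True),
    ({"1": -0.7, "2": 0.4}, False),
    ({"1": 0.5, "2": -0.3}, True),
]

accuracy = progressive_accuracy(toy_stream, MajorityClass())
print(accuracy)
```

Because the loop relies only on `predict_one` and `learn_one`, passing `datasets.Bananas()` as the stream and a River classifier as the model should work the same way, assuming River is installed.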

Code Reference

Source Location

Repository: Online_ml_River
File: river/datasets/bananas.py

Signature

class Bananas(base.FileDataset):
    def __init__(self):
        super().__init__(filename="banana.zip", n_samples=5300, n_features=2, task=base.BINARY_CLF)

    def __iter__(self):
        return stream.iter_libsvm(self.path, target_type=lambda x: x == "1")

Import

from river import datasets
dataset = datasets.Bananas()

I/O Contract

Inputs

Name	Type	Required	Description
(none)	—	—	No parameters needed

Outputs

Name	Type	Description
__iter__()	tuple(dict, bool)	Yields (features_dict, target) pairs, where target is a boolean

Dataset Properties

Property	Value
Number of samples	5,300
Number of features	2
Task	Binary classification
Format	LibSVM
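To make the LibSVM format and the boolean target conversion concrete, here is a minimal parsing sketch. It is not River's `stream.iter_libsvm` implementation. `parse_libsvm_line` is a hypothetical helper, the sample lines are made up, and keeping feature indices as string keys and encoding the negative class as "-1" are assumptions. Only the `x == "1"` target conversion is taken from the signature on this page.

```python
def parse_libsvm_line(line, target_type=float):
    """Parse one 'label index:value ...' line into (features, target)."""
    label, *pairs = line.strip().split()
    features = {}
    for pair in pairs:
        index, value = pair.split(":")
        features[index] = float(value)  # assumption: indices kept as string keys
    return features, target_type(label)


# Made-up lines in LibSVM layout: a label followed by index:value pairs.
sample_lines = ["1 1:-0.5 2:1.25", "-1 1:0.75 2:-2.0"]

# Same target conversion as in the signature: the label "1" maps to True.
parsed = [parse_libsvm_line(line, target_type=lambda s: s == "1") for line in sample_lines]

for features, target in parsed:
    print(features, target)
```

Any label other than the string "1" maps to False under this conversion, so the target is always a boolean.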

Usage Examples

from river import datasets

dataset = datasets.Bananas()

# Peek at the first (features, target) pair, then stop.
for x, y in dataset:
    print(x, y)
    break

References

OpenML page

Related Pages

Environment:Online_ml_River_Python_Runtime_Environment
