Implementation:Online ml River Stats Entropy


Knowledge Sources
Domains: Online_Learning, Statistics
Last Updated: 2026-02-08 16:00 GMT

Overview

Entropy computes the running entropy of categorical values in a data stream.

Description

This statistic calculates Shannon entropy incrementally, measuring the unpredictability (information content) of a stream of categorical values. It maintains a counter of value frequencies and updates the entropy in place on each observation, so the stream never has to be stored or rescanned. A fading factor controls how much weight is given to recent versus historical observations: the default of 1 weights all observations equally, while smaller values discount older ones. The implementation is based on Sovdat's algorithm for updating entropy from time-changing data streams.
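
To illustrate why entropy can be maintained from counts alone, here is a minimal sketch for the non-fading case. It is not River's code and not Sovdat's update formula: `RunningEntropy` is a hypothetical helper that relies on the identity H = ln(n) - (1/n) * sum(c * ln(c)), where c ranges over the per-category counts, and it ignores both `fading_factor` and `eps`.

```python
import math
from collections import Counter


class RunningEntropy:
    """Hypothetical helper (not River's class): exact running entropy, no fading."""

    def __init__(self):
        self.n = 0
        self.counts = Counter()
        self.s = 0.0  # running sum of c * ln(c) over all categories

    def update(self, x):
        c = self.counts[x]  # Counter returns 0 for unseen values
        if c > 0:
            self.s -= c * math.log(c)  # remove the old term for this category
        self.s += (c + 1) * math.log(c + 1)  # add the term for the new count
        self.counts[x] = c + 1
        self.n += 1

    def get(self):
        if self.n == 0:
            return 0.0
        return math.log(self.n) - self.s / self.n


def batch_entropy(values):
    """Reference implementation: recompute entropy from scratch, in nats."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in counts.values())


data = ["cat", "dog", "bird", "dog", "bird", "bird"]
running = RunningEntropy()
for value in data:
    running.update(value)

print(running.get(), batch_entropy(data))
```

Each update costs constant time regardless of how many values have been seen, which is the property that makes a running entropy practical on unbounded streams.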

Usage

Use Entropy when you need to measure the diversity, randomness, or information content in streaming categorical data. Common applications include detecting concept drift, monitoring data distribution changes, feature selection, and anomaly detection in classification problems.
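
As a rough illustration of the drift-monitoring use case, the sketch below flags points where the entropy of a sliding window moves by more than a threshold. It deliberately recomputes entropy over a window with the standard library instead of calling `stats.Entropy`; the helper names, window size, and threshold are all assumptions chosen for the example.

```python
import math
from collections import Counter, deque


def window_entropy(window):
    """Shannon entropy (nats) of the values currently in the window."""
    counts = Counter(window)
    n = len(window)
    return -sum(c / n * math.log(c / n) for c in counts.values())


def entropy_jumps(stream, window_size=50, threshold=0.3):
    """Return (index, entropy) pairs where entropy moved more than `threshold`
    away from the last reference value. Hypothetical helper for illustration."""
    window = deque(maxlen=window_size)
    reference = None
    flagged = []
    for i, x in enumerate(stream):
        window.append(x)
        if len(window) < window_size:
            continue  # wait until the window is full
        h = window_entropy(window)
        if reference is None:
            reference = h
        elif abs(h - reference) > threshold:
            flagged.append((i, h))
            reference = h  # reset the reference after each flag
    return flagged


# A stream that is constant at first, then switches to four equally likely values
stream = ["a"] * 100 + ["a", "b", "c", "d"] * 25
flags = entropy_jumps(stream)
print(flags[0])
```

On this synthetic stream the first flag appears only after the distribution changes at index 100, once enough new values have entered the window to raise its entropy above the threshold.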

Code Reference

Source Location

Signature

class Entropy(stats.base.Univariate):
    def __init__(self, fading_factor=1, eps=1e-8):
        if 0 < fading_factor <= 1:
            self.fading_factor = fading_factor
        else:
            raise ValueError("fading_factor must be between 0 excluded and 1")
        self.eps = eps
        self.entropy = 0
        self.n = 0
        self.counter = collections.Counter()

Import

from river import stats

I/O Contract

Inputs

Name Type Required Description
x Any (hashable) Yes (update) Categorical value to update entropy with
fading_factor float No (init) Fading factor between 0 (exclusive) and 1 (inclusive), default: 1; other values raise ValueError
eps float No (init) Small value to avoid division by zero, default: 1e-8

Outputs

Name Type Description
get() float Current entropy of the values seen so far, in nats (natural logarithm)
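
If a value in bits is preferred, an entropy in nats can be converted by dividing by ln 2. The helper below is a hypothetical convenience function, not part of River, applied to the final value reported in the usage example on this page.

```python
import math


def nats_to_bits(h_nats):
    """Convert an entropy expressed in nats to bits (hypothetical helper)."""
    return h_nats / math.log(2)


bits = nats_to_bits(1.058093)
print(f"{bits:.4f} bits")
```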

Usage Examples

from river import stats
import random

# Create entropy statistic
entro = stats.Entropy(fading_factor=1)

# Create a list of categorical values: 301 cats, 401 dogs, 601 birds
list_animal = []
for animal, num_val in zip(['cat', 'dog', 'bird'], [301, 401, 601]):
    list_animal.extend([animal] * num_val)

# Shuffle the list
random.seed(42 * 1337)
random.shuffle(list_animal)

# Update entropy as we see values
for i, animal in enumerate(list_animal):
    entro.update(animal)
    if i % 200 == 0:
        print(f"After {i+1} values: entropy = {entro.get():.6f}")

# Final entropy
print(f"Final entropy: {entro.get():.6f}")
# Output: Final entropy: 1.058093
# This is close to the maximum of ln(3) ≈ 1.098612 for three categories,
# so the three classes are fairly evenly represented.
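
As a sanity check on the final value, the entropy can also be computed directly from the class counts used above, since with `fading_factor=1` the order of the values should not matter. The sketch below uses only the standard library; the normalized entropy (entropy divided by its maximum, ln K for K categories) is an extra quantity added here for interpretation and is not something the class returns.

```python
import math

# Class counts from the usage example
counts = {"cat": 301, "dog": 401, "bird": 601}
n = sum(counts.values())

# Shannon entropy in nats, computed directly from the proportions
entropy = -sum(c / n * math.log(c / n) for c in counts.values())

# Maximum possible entropy for this number of categories
max_entropy = math.log(len(counts))
normalized = entropy / max_entropy

print(f"entropy = {entropy:.4f}, normalized = {normalized:.2f}")
```

A normalized value near 1 indicates a nearly uniform distribution, while a value near 0 indicates that one category dominates.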
