Implementation:Online ml River Stats Entropy
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Statistics |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Entropy computes the running entropy of categorical values in a data stream.
Description
This statistic calculates Shannon entropy incrementally, measuring the unpredictability or information content in a stream of categorical values. It uses a fading factor to control how much weight is given to recent versus historical observations, and maintains a counter of value frequencies. The implementation is based on Sovdat's algorithm for updating entropy from time-changing data streams.
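To make the incremental idea concrete, here is a minimal sketch of the update when no fading is applied (equivalent to a fading factor of 1). It is derived directly from the definition of Shannon entropy and is not taken from River's source. The function names `batch_entropy` and `incremental_update` are illustrative, and explicit zero checks stand in for the `eps` smoothing.

```python
import math
from collections import Counter


def batch_entropy(counts):
    """Shannon entropy in nats, computed from scratch from category counts."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def incremental_update(entropy, n, cx):
    """Entropy after one more observation of a category seen `cx` times so far.

    `entropy` and `n` describe the stream before the new observation.
    No fading is applied (equivalent to a fading factor of 1).
    """
    if n == 0:
        return 0.0  # a single observation carries no uncertainty
    m = n + 1
    # Rescale the old entropy from a total of n observations to n + 1.
    rescaled = n / m * (entropy - math.log(n / m))
    # Swap the old contribution of this category for its new one.
    old_term = cx / m * math.log(cx / m) if cx > 0 else 0.0
    new_term = (cx + 1) / m * math.log((cx + 1) / m)
    return rescaled + old_term - new_term


stream = ["cat", "dog", "cat", "bird", "dog", "cat", "cat", "bird"]
counts = Counter()
entropy, n = 0.0, 0
for x in stream:
    entropy = incremental_update(entropy, n, counts[x])
    counts[x] += 1
    n += 1

print(f"incremental: {entropy:.6f}")
print(f"batch:       {batch_entropy(counts.values()):.6f}")
```

Each update needs only the previous entropy, the total count, and the count of the incoming category, so its cost does not grow with the length of the stream.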
Usage
Use Entropy when you need to measure the diversity, randomness, or information content in streaming categorical data. Common applications include detecting concept drift, monitoring data distribution changes, feature selection, and anomaly detection in classification problems.
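For the drift-monitoring use case, the sketch below illustrates why down-weighting history matters. It does not reproduce River's fading-factor update. It is a simpler stand-in that keeps exponentially decayed counts, and the class `DecayedEntropy` and its `decay` parameter are hypothetical names used only for illustration.

```python
import math
from collections import defaultdict


class DecayedEntropy:
    """Hypothetical helper: Shannon entropy over exponentially decayed counts."""

    def __init__(self, decay=0.99):
        if not 0 < decay <= 1:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.weights = defaultdict(float)

    def update(self, x):
        # Shrink every stored weight, then credit the new observation.
        for key in self.weights:
            self.weights[key] *= self.decay
        self.weights[x] += 1.0

    def get(self):
        total = sum(self.weights.values())
        if total == 0:
            return 0.0
        # log(total / w) keeps every term non-negative.
        return sum(
            w / total * math.log(total / w)
            for w in self.weights.values()
            if w > 0
        )


# A stream that is constant at first and then becomes a 50/50 mix.
stream = ["a"] * 500 + ["a", "b"] * 250

fading = DecayedEntropy(decay=0.99)  # forgets old observations
plain = DecayedEntropy(decay=1.0)    # weights all observations equally

for i, x in enumerate(stream, start=1):
    fading.update(x)
    plain.update(x)
    if i in (500, 1000):
        print(f"n={i}: decayed={fading.get():.3f}, undecayed={plain.get():.3f}")
```

Both estimates are zero while the stream is constant. After the change, the decayed estimate ends close to ln(2), the maximum for two categories, while the undecayed one stays lower because the early constant phase still dominates its counts.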
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/stats/entropy.py
Signature
class Entropy(stats.base.Univariate):
    def __init__(self, fading_factor=1, eps=1e-8):
        if 0 < fading_factor <= 1:
            self.fading_factor = fading_factor
        else:
            raise ValueError("fading_factor must be between 0 excluded and 1")
        self.eps = eps
        self.entropy = 0
        self.n = 0
        self.counter = collections.Counter()
Import
from river import stats
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| x | Any (hashable) | Yes (update) | Categorical value to update the entropy with |
| fading_factor | float | No (init) | Fading factor between 0 (exclusive) and 1 (inclusive), default: 1 |
| eps | float | No (init) | Small value to avoid division by zero, default: 1e-8 |
Outputs
| Name | Type | Description |
|---|---|---|
| get() | float | Current entropy estimate, in nats (natural logarithm) |
Usage Examples
from river import stats
import random
# Create entropy statistic
entro = stats.Entropy(fading_factor=1)
# Create a list of categorical values
list_animal = []
for animal, num_val in zip(['cat', 'dog', 'bird'], [301, 401, 601]):
    list_animal += [animal for i in range(num_val)]
# Shuffle the list
random.seed(42 * 1337)
random.shuffle(list_animal)
# Update entropy as we see values
for i, animal in enumerate(list_animal):
    entro.update(animal)
    if i % 200 == 0:
        print(f"After {i+1} values: entropy = {entro.get():.6f}")
# Final entropy
print(f"Final entropy: {entro.get():.6f}")
# Output: 1.058093 (close to the three-category maximum of ln(3) ≈ 1.098612, i.e. high diversity)
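As a sanity check on the value above, the entropy can also be computed directly from the class counts used in the example. This sketch uses only the standard library and assumes natural logarithms. It is independent of River.

```python
import math
from collections import Counter

# Class counts from the example above.
counts = Counter({"cat": 301, "dog": 401, "bird": 601})
total = sum(counts.values())

# Batch Shannon entropy in nats.
batch = -sum(c / total * math.log(c / total) for c in counts.values())

# Upper bound, reached when all categories are equally likely.
max_entropy = math.log(len(counts))

print(f"batch entropy: {batch:.6f}")                      # 1.058093
print(f"maximum for 3 categories: {max_entropy:.6f}")     # 1.098612
print(f"normalized: {batch / max_entropy:.3f}")           # 0.963
```

Dividing by the maximum gives a value between 0 and 1 that is easier to compare across streams with different numbers of categories.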