Implementation:Online ml River Stats Var
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Statistics |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Var computes the running variance of a data stream using Welford's algorithm.
Description
This statistic calculates variance incrementally as data arrives, using Welford's method to stay numerically stable. Variance measures the spread of data around its mean. The implementation supports weighted observations, provides a revert method so it can be used in rolling windows, and offers batch updates. The ddof parameter sets the delta degrees of freedom: the accumulated sum of squared deviations is divided by n - ddof, so ddof=1 (the default) gives the unbiased sample variance and ddof=0 gives the population variance.
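As a rough illustration of the incremental update described above, the following sketch implements a weighted Welford update in plain Python. The class name WelfordVar and its attributes n, mean, and S are hypothetical names chosen for this example. It is not River's source code, only a minimal sketch that assumes weights behave like repeat counts and that the result is divided by n - ddof.

```python
class WelfordVar:
    """Hypothetical stand-in that illustrates Welford's update (not River's source)."""

    def __init__(self, ddof=1):
        self.ddof = ddof
        self.n = 0.0     # sum of the weights seen so far
        self.mean = 0.0  # running weighted mean
        self.S = 0.0     # running sum of weighted squared deviations

    def update(self, x, w=1.0):
        mean_old = self.mean
        self.n += w
        self.mean += (w / self.n) * (x - mean_old)
        # Multiplying the deviation from the old mean by the deviation from
        # the new mean avoids the cancellation that affects the naive
        # "sum of squares minus squared sum" formula.
        self.S += w * (x - mean_old) * (x - self.mean)

    def get(self):
        denom = self.n - self.ddof
        # Assumption of this sketch: report 0.0 until the divisor is positive.
        return self.S / denom if denom > 0 else 0.0


var = WelfordVar()
for x in [3, 5, 4, 7, 10, 12]:
    var.update(x)
print(f"{var.get():.6f}")
```

Only the weight total, the mean, and one accumulator are stored, so memory use stays constant however long the stream runs.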
Usage
Use Var when you need to measure the variability or spread of streaming data. Common applications include monitoring data quality, detecting anomalies, statistical process control, and understanding feature distributions. It also supports normalization (standardization requires both the mean and the standard deviation) and serves as a building block for derived statistics such as the standard deviation and the standard error.
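To make the standardization and anomaly-detection use cases concrete, here is a minimal sketch in plain Python that scores each value against the mean and sample standard deviation of the values seen before it. The function name standardize_stream and the choice to emit 0.0 while there is too little history are assumptions made for this example, not part of River.

```python
import math


def standardize_stream(values):
    """Yield a z-score for each value, based only on earlier values."""
    n, mean, s = 0, 0.0, 0.0
    for x in values:
        if n >= 2 and s > 0:
            std = math.sqrt(s / (n - 1))  # sample standard deviation (ddof=1)
            yield (x - mean) / std
        else:
            yield 0.0  # assumption: not enough history to score yet
        # Welford update of the running mean and sum of squared deviations.
        n += 1
        delta = x - mean
        mean += delta / n
        s += delta * (x - mean)


readings = [10.1, 10.2, 9.9, 10.0, 10.1, 15.0]
scores = list(standardize_stream(readings))
for x, z in zip(readings, scores):
    print(f"Value: {x}, z-score: {z:.2f}")
```

The final reading of 15.0 lies far outside the spread of the earlier readings, so it receives a very large z-score, which is the kind of signal an anomaly check would threshold on.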
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/stats/var.py
Signature
class Var(stats.base.Univariate):
    def __init__(self, ddof=1) -> None:
        self.ddof = ddof
        self.mean = stats.Mean()
        self._S = 0
Import
from river import stats
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| x | numbers.Number | Yes | Value to update the statistic with |
| w | float | No | Weight for the observation (default: 1.0) |
| ddof | int | No (init) | Delta Degrees of Freedom (default: 1, which gives the sample variance) |
Outputs
| Name | Type | Description |
|---|---|---|
| get() | float | Current variance (0.0 if n <= ddof) |
Usage Examples
from river import stats
# Basic running variance
X = [3, 5, 4, 7, 10, 12]
var = stats.Var()
for x in X:
    var.update(x)
    print(f"Value: {x}, Variance: {var.get():.6f}")
# Output:
# Value: 3, Variance: 0.000000
# Value: 5, Variance: 2.000000
# Value: 4, Variance: 1.000000
# Value: 7, Variance: 2.916667
# Value: 10, Variance: 7.700000
# Value: 12, Variance: 12.566667
# Rolling variance
from river import utils
X = [1, 4, 2, -4, -8, 0]
rvar = utils.Rolling(stats.Var(ddof=1), window_size=3)
for x in X:
    rvar.update(x)
    print(f"Value: {x}, Rolling Var: {rvar.get():.6f}")
# Output:
# Value: 1, Rolling Var: 0.000000
# Value: 4, Rolling Var: 4.500000
# Value: 2, Rolling Var: 2.333333
# Value: -4, Rolling Var: 17.333333
# Value: -8, Rolling Var: 25.333333
# Value: 0, Rolling Var: 16.000000
# Computing standard deviation from variance
import math
variance = stats.Var()
data = [2, 4, 4, 4, 5, 5, 7, 9]
for x in data:
    variance.update(x)
print(f"Variance: {variance.get():.4f}")
print(f"Std Dev: {math.sqrt(variance.get()):.4f}")
# Monitoring data variability
quality_var = stats.Var()
# Normal operation
for x in [10.1, 10.2, 9.9, 10.0, 10.1]:
    quality_var.update(x)
print(f"Normal operation variance: {quality_var.get():.4f}")
# With anomaly
quality_var.update(15.0)
print(f"With anomaly variance: {quality_var.get():.4f}")
# Weighted variance
weighted_var = stats.Var()
weighted_var.update(10, w=1.0)
weighted_var.update(20, w=2.0)
weighted_var.update(30, w=3.0)
print(f"Weighted variance: {weighted_var.get():.4f}")
# Population vs sample variance
pop_var = stats.Var(ddof=0) # Population variance
sample_var = stats.Var(ddof=1) # Sample variance
data = [1, 2, 3, 4, 5]
for x in data:
    pop_var.update(x)
    sample_var.update(x)
print(f"Population variance: {pop_var.get():.4f}")
print(f"Sample variance: {sample_var.get():.4f}")
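The rolling variance example depends on the statistic being able to forget a value once it leaves the window. The sketch below shows one way such a revert step can work by running the Welford update in reverse. The names RevertibleVar and rolling_var are hypothetical, observations are unweighted for brevity, and the code is an illustration under those assumptions rather than River's implementation.

```python
from collections import deque


class RevertibleVar:
    """Hypothetical running variance that can also remove a past value."""

    def __init__(self, ddof=1):
        self.ddof = ddof
        self.n = 0
        self.mean = 0.0
        self.S = 0.0  # running sum of squared deviations

    def update(self, x):
        mean_old = self.mean
        self.n += 1
        self.mean += (x - mean_old) / self.n
        self.S += (x - mean_old) * (x - self.mean)

    def revert(self, x):
        # Undo the effect of a value that was added earlier.
        mean_old = self.mean
        self.n -= 1
        if self.n == 0:
            self.mean = 0.0
        else:
            self.mean -= (x - mean_old) / self.n
        self.S -= (x - mean_old) * (x - self.mean)

    def get(self):
        denom = self.n - self.ddof
        return self.S / denom if denom > 0 else 0.0


def rolling_var(values, window_size):
    """Return the variance of the last window_size values after each step."""
    window = deque()
    var = RevertibleVar()
    out = []
    for x in values:
        if len(window) == window_size:
            var.revert(window.popleft())  # drop the oldest value first
        window.append(x)
        var.update(x)
        out.append(var.get())
    return out


result = rolling_var([1, 4, 2, -4, -8, 0], window_size=3)
print([f"{v:.6f}" for v in result])
```

Each step costs one revert and one update, so the work per observation does not grow with the window size.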