Implementation:Online ml River Stats Skew
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Statistics |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Skew computes the running skewness of a data stream using Welford's algorithm.
Description
Skewness measures the asymmetry of a distribution around its mean. Skew maintains the third standardized moment incrementally, so each update takes constant time and no past values are stored. Positive skewness indicates a longer right tail (values concentrated on the left), while negative skewness indicates a longer left tail (values concentrated on the right). The computation is delegated to a Rust backend (`_rust_stats.RsSkew`) for performance, and both biased and bias-corrected estimators are supported.
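To make the incremental computation concrete, the following is a minimal pure-Python sketch of a one-pass skewness update. It is not River's Rust implementation: the class name `RunningSkew`, its attributes, and the choice of the adjusted Fisher-Pearson correction for the unbiased case are assumptions made for illustration.

```python
import math


class RunningSkew:
    """Hypothetical one-pass skewness tracker (illustration only)."""

    def __init__(self, bias=False):
        self.bias = bias
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.m3 = 0.0  # running sum of cubed deviations from the mean

    def update(self, x):
        n_prev = self.n
        self.n += 1
        delta = x - self.mean
        delta_n = delta / self.n
        term = delta * delta_n * n_prev
        self.mean += delta_n
        # m3 is updated first because its formula needs the previous m2.
        self.m3 += term * delta_n * (self.n - 2) - 3 * delta_n * self.m2
        self.m2 += term

    def get(self):
        if self.m2 == 0:
            return 0.0  # undefined without spread; report 0 by convention
        g1 = math.sqrt(self.n) * self.m3 / self.m2 ** 1.5
        if not self.bias and self.n > 2:
            # Assumed correction: adjusted Fisher-Pearson coefficient.
            return math.sqrt(self.n * (self.n - 1)) / (self.n - 2) * g1
        return g1


rs = RunningSkew()
for x in [1, 2, 2, 3, 3, 3, 4, 4, 5, 10, 15]:
    rs.update(x)
print(rs.get())  # positive for this right-tailed sample
```

Only the count, the mean, and two accumulated sums are stored, which is what makes the statistic usable on unbounded streams.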
Usage
Use Skew when you need to understand the asymmetry of streaming data distributions. Common applications include detecting data distribution shifts, identifying non-normal distributions, financial risk analysis (return distributions), quality control, and feature engineering where distribution shape is informative. The sign and magnitude of the skewness show whether values are spread symmetrically around the mean or pulled toward one tail.
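As an illustration of the distribution-shift use case, the sketch below compares the skewness of consecutive fixed-size chunks and reports a chunk when the value moves by more than a threshold. It uses only the standard library rather than River. The helper names `batch_skew` and `skew_shifts`, the chunk size, and the threshold are all hypothetical choices, not part of any library API.

```python
def batch_skew(values):
    """Biased sample skewness of a finite batch (two-pass, stdlib only)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def skew_shifts(stream, chunk_size, threshold):
    """Yield (chunk_index, skew) when skewness changes by more than
    `threshold` relative to the previous chunk (hypothetical helper)."""
    previous = None
    chunk = []
    index = 0
    for value in stream:
        chunk.append(value)
        if len(chunk) == chunk_size:
            current = batch_skew(chunk)
            if previous is not None and abs(current - previous) > threshold:
                yield index, current
            previous = current
            chunk = []
            index += 1


symmetric = list(range(-50, 50))               # evenly spaced, skewness 0
skewed = [(i / 100) ** 3 for i in range(100)]  # mass near 0, long right tail
stream = symmetric + symmetric + skewed
shifts = list(skew_shifts(stream, chunk_size=100, threshold=0.5))
print(shifts)  # only the third chunk (index 2) is reported
```

In a streaming setting, the per-chunk computation could be replaced by a running statistic that is reset at each chunk boundary; the thresholding logic stays the same.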
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/stats/skew.py
Signature
class Skew(stats.base.Univariate):
    def __init__(self, bias=False):
        super().__init__()
        self.bias = bias
        self._skew = _rust_stats.RsSkew(bias)
Import
from river import stats
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| x | numbers.Number | Yes | Value to update the statistic with |
| bias | bool | No (init) | If False, calculations are corrected for statistical bias (default: False) |
Outputs
| Name | Type | Description |
|---|---|---|
| get() | float | Current skewness estimate (close to 0 for symmetric data) |
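For reference, the two estimators that a `bias` flag of this kind usually selects between are the biased sample skewness and the adjusted Fisher-Pearson coefficient. The formulas below are the standard textbook definitions and are given as an assumption about what the flag does; this page does not state the exact correction applied by the Rust backend.

```latex
m_k = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^k, \qquad
g_1 = \frac{m_3}{m_2^{3/2}}, \qquad
G_1 = \frac{\sqrt{n(n-1)}}{n-2}\, g_1 \quad (n > 2)
```

Here $g_1$ corresponds to `bias=True` and $G_1$ to `bias=False`. The correction factor tends to 1 as $n$ grows, so the two values converge on long streams.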
Usage Examples
from river import stats
import numpy as np
# Unbiased skewness
np.random.seed(42)
X = np.random.normal(loc=0, scale=1, size=10)
skew = stats.Skew(bias=False)
for x in X:
    skew.update(x)
    print(f"Skew: {skew.get():.4f}")
# Output (running value after each update, "Skew: " prefix omitted):
# 0.0000
# 0.0000
# -1.4802
# 0.5127
# 0.7803
# 1.0561
# 0.5058
# 0.3478
# 0.4537
# 0.4123
# Biased skewness
skew_biased = stats.Skew(bias=True)
for x in X:
    skew_biased.update(x)
    print(f"Biased Skew: {skew_biased.get():.4f}")
# Detecting right-skewed distribution
right_skew = stats.Skew()
# Data concentrated on left, long right tail
for x in [1, 2, 2, 3, 3, 3, 4, 4, 5, 10, 15]:
    right_skew.update(x)
print(f"Right-skewed data: {right_skew.get():.4f}")
# Positive value indicates right skew
# Detecting left-skewed distribution
left_skew = stats.Skew()
# Data concentrated on right, long left tail
for x in [1, 5, 10, 10, 11, 11, 11, 12, 12, 13]:
    left_skew.update(x)
print(f"Left-skewed data: {left_skew.get():.4f}")
# Negative value indicates left skew
# Symmetric distribution (normal-like)
symmetric_skew = stats.Skew()
for x in np.random.normal(0, 1, 1000):
    symmetric_skew.update(x)
print(f"Symmetric distribution skew: {symmetric_skew.get():.4f}")
# Close to 0 for symmetric distribution
# Comparing skewness of different features
feature_a_skew = stats.Skew()
feature_b_skew = stats.Skew()
# Feature A: evenly spaced values (uniform, so skewness is 0)
for x in range(100):
    feature_a_skew.update(x)
# Feature B: exponentially distributed (right-skewed)
for x in np.random.exponential(2, 100):
    feature_b_skew.update(x)
print(f"Uniform distribution skew: {feature_a_skew.get():.4f}")
print(f"Exponential distribution skew: {feature_b_skew.get():.4f}")