Implementation:Online ml River Stats Skew

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Statistics
Last Updated 2026-02-08 16:00 GMT

Overview

Skew computes the running skewness of a data stream in a single pass, using a Welford-style incremental update of the central moments.

Description

This statistic measures the asymmetry of a distribution around its mean by incrementally computing the third standardized moment. Positive skewness indicates a longer right tail (most values sit on the left), while negative skewness indicates a longer left tail (most values sit on the right). The computation is delegated to a Rust backend (_rust_stats.RsSkew) for performance, and the bias parameter selects between the biased and the bias-corrected estimator.
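The incremental computation described above can be sketched in pure Python. This is a minimal illustration, not River's Rust code: the class name RunningSkew and its attributes are hypothetical, and the bias correction assumes the common adjusted Fisher-Pearson factor.

```python
import math


class RunningSkew:
    """Hypothetical pure-Python stand-in; not River's Rust-backed class."""

    def __init__(self, bias=False):
        self.bias = bias
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.m3 = 0.0  # running sum of cubed deviations from the mean

    def update(self, x):
        n_prev = self.n
        self.n += 1
        delta = x - self.mean
        delta_n = delta / self.n
        term = delta * delta_n * n_prev
        self.mean += delta_n
        # m3 is updated before m2 because its update needs the old m2.
        self.m3 += term * delta_n * (self.n - 2) - 3.0 * delta_n * self.m2
        self.m2 += term

    def get(self):
        if self.m2 == 0.0:
            return 0.0  # fewer than two distinct values seen so far
        g1 = math.sqrt(self.n) * self.m3 / self.m2 ** 1.5
        if not self.bias and self.n > 2:
            # Assumed adjusted Fisher-Pearson bias correction.
            return math.sqrt(self.n * (self.n - 1)) / (self.n - 2) * g1
        return g1


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 10, 15]
running = RunningSkew(bias=True)
for value in data:
    running.update(value)
streamed = running.get()
print(f"Streamed biased skewness: {streamed:.4f}")
```

Each update touches only four numbers (the count, the mean, and two moment sums), so memory use stays constant regardless of stream length.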

Usage

Use Skew when you need to understand the asymmetry of streaming data distributions. Common applications include detecting data distribution shifts, identifying non-normal distributions, financial risk analysis (return distributions), quality control, and feature engineering where distribution shape is informative. A value near 0 suggests a roughly symmetric distribution, while a large positive or negative value indicates that extreme values lie mostly on one side of the mean.

Code Reference

Source Location

Signature

class Skew(stats.base.Univariate):
    def __init__(self, bias=False):
        super().__init__()
        self.bias = bias
        self._skew = _rust_stats.RsSkew(bias)

Import

from river import stats

I/O Contract

Inputs

Name | Type | Required | Description
x | numbers.Number | Yes | Value passed to update() to update the statistic
bias | bool | No (init) | If False, the estimate is corrected for statistical bias (default: False)
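
For reference, the two estimators usually meant by "biased" and "bias-corrected" skewness can be written as follows. This assumes River follows the common adjusted Fisher-Pearson convention; the symbols n, \bar{x}, m_k, g_1 and G_1 are notation introduced here, not taken from the source.

```latex
\begin{aligned}
m_k &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^k, \\
g_1 &= \frac{m_3}{m_2^{3/2}} \quad \text{(biased, } \texttt{bias=True}\text{)}, \\
G_1 &= \frac{\sqrt{n(n-1)}}{n-2}\, g_1 \quad \text{(bias-corrected, } \texttt{bias=False}\text{, } n > 2\text{)}.
\end{aligned}
```

The correction factor approaches 1 as n grows, so the two settings differ noticeably only on short streams.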

Outputs

Name | Type | Description
get() | float | Current skewness value (0 for symmetric distributions)

Usage Examples

from river import stats
import numpy as np

# Unbiased skewness
np.random.seed(42)
X = np.random.normal(loc=0, scale=1, size=10)

skew = stats.Skew(bias=False)
for x in X:
    skew.update(x)
    print(f"Skew: {skew.get():.4f}")

# Printed values, one per update (the "Skew: " prefix is omitted below):
# 0.0000
# 0.0000
# -1.4802
# 0.5127
# 0.7803
# 1.0561
# 0.5058
# 0.3478
# 0.4537
# 0.4123

# Biased skewness
skew_biased = stats.Skew(bias=True)
for x in X:
    skew_biased.update(x)
    print(f"Biased Skew: {skew_biased.get():.4f}")

# Detecting right-skewed distribution
right_skew = stats.Skew()
# Data concentrated on left, long right tail
for x in [1, 2, 2, 3, 3, 3, 4, 4, 5, 10, 15]:
    right_skew.update(x)

print(f"Right-skewed data: {right_skew.get():.4f}")
# Positive value indicates right skew

# Detecting left-skewed distribution
left_skew = stats.Skew()
# Data concentrated on right, long left tail
for x in [1, 5, 10, 10, 11, 11, 11, 12, 12, 13]:
    left_skew.update(x)

print(f"Left-skewed data: {left_skew.get():.4f}")
# Negative value indicates left skew

# Symmetric distribution (normal-like)
symmetric_skew = stats.Skew()
for x in np.random.normal(0, 1, 1000):
    symmetric_skew.update(x)

print(f"Symmetric distribution skew: {symmetric_skew.get():.4f}")
# Close to 0 for symmetric distribution

# Comparing skewness of different features
feature_a_skew = stats.Skew()
feature_b_skew = stats.Skew()

# Feature A: evenly spaced values (a discrete uniform distribution)
for x in range(100):
    feature_a_skew.update(x)

# Feature B: exponentially distributed
for x in np.random.exponential(2, 100):
    feature_b_skew.update(x)

print(f"Uniform distribution skew: {feature_a_skew.get():.4f}")
print(f"Exponential distribution skew: {feature_b_skew.get():.4f}")
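
As a rough guide to what the comparison above should print: evenly spaced values are symmetric, so their skewness is 0, and the exponential distribution has a theoretical skewness of 2, so an estimate from only 100 samples will scatter around that value. The standard-library sketch below illustrates this with a larger sample; sample_skewness is a hypothetical two-pass helper using the biased estimator, not part of River.

```python
import math
import random


def sample_skewness(values):
    """Biased sample skewness (third standardized moment), two-pass."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


rng = random.Random(0)  # fixed seed so the run is repeatable

uniform_like = list(range(100))  # evenly spaced, symmetric about 49.5
# expovariate takes the rate (1 / mean); a mean of 2 mirrors the example above.
exponential = [rng.expovariate(1 / 2) for _ in range(50_000)]

uniform_skew = sample_skewness(uniform_like)
exponential_skew = sample_skewness(exponential)
print(f"Evenly spaced values: {uniform_skew:.4f}")
print(f"Exponential sample:   {exponential_skew:.4f}")
```

Skewness does not depend on the scale of the data, so the mean of 2 chosen for the exponential sample does not change the theoretical value.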
