Implementation:Online ml River Stats PearsonCorr

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Statistics
Last Updated	2026-02-08 16:00 GMT

Overview

PearsonCorr computes the online Pearson correlation coefficient between two variables.

Description

This statistic measures the linear correlation between two variables in streaming data. Its value ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation). Internally it maintains a running variance for each variable (stats.Var) and a running covariance between them (stats.Cov), and derives the coefficient from these three quantities when get() is called. The constructor accepts a delta degrees of freedom (ddof) correction, and the statistic can be wrapped with utils.Rolling to compute the correlation over a sliding window.
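To make the running-variance and running-covariance idea concrete, here is a minimal pure-Python sketch. It is not River's code: the class name OnlinePearson and its attribute names are hypothetical, and it assumes Welford-style incremental updates, which may differ from how stats.Var and stats.Cov are implemented internally.

```python
import math


class OnlinePearson:
    """Illustrative running Pearson correlation (hypothetical, not River's code)."""

    def __init__(self, ddof=1):
        self.ddof = ddof
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # running sum of squared deviations of x
        self.m2_y = 0.0  # running sum of squared deviations of y
        self.c_xy = 0.0  # running sum of co-deviations of x and y

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x  # deviation from the previous mean of x
        dy = y - self.mean_y  # deviation from the previous mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # Welford-style updates: old deviation times new deviation
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def get(self):
        dof = self.n - self.ddof
        if dof <= 0:
            return 0.0  # not enough observations yet
        var_x = self.m2_x / dof
        var_y = self.m2_y / dof
        if var_x <= 0 or var_y <= 0:
            return 0.0  # correlation is undefined; report 0 by convention
        return (self.c_xy / dof) / math.sqrt(var_x * var_y)


corr = OnlinePearson()
for xi, yi in zip([0, 0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 4, 5, 6]):
    corr.update(xi, yi)
print(f"{corr.get():.6f}")
```

Each update touches a fixed number of scalars, so memory use stays constant regardless of how many observations have been seen.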

Usage

Use PearsonCorr when you need to measure the strength and direction of linear relationships between two variables in streaming data. Common applications include feature correlation analysis, detecting collinearity, monitoring relationships between metrics, and feature selection where highly correlated features may be redundant.

Code Reference

Source Location

Repository: Online_ml_River
File: river/stats/pearson.py

Signature

class PearsonCorr(stats.base.Bivariate):
    def __init__(self, ddof=1):
        self.var_x = stats.Var(ddof=ddof)
        self.var_y = stats.Var(ddof=ddof)
        self.cov_xy = stats.Cov(ddof=ddof)

Import

from river import stats

I/O Contract

Inputs

Name	Type	Required	Description
x	numbers.Number	Yes	First variable value
y	numbers.Number	Yes	Second variable value
ddof	int	No (init)	Delta Degrees of Freedom (default: 1)

Outputs

Name	Type	Description
get()	float	Pearson correlation coefficient between -1 and 1 (0 if the variance of either variable is zero)
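Assuming the standard definition of the Pearson coefficient, the value returned by get() can be written as below. The symbols s_xy, s_x^2, and s_y^2 are notation introduced here for the running covariance and the two running variances; they are not names from the source code.

```latex
r =
\begin{cases}
\dfrac{s_{xy}}{\sqrt{s_x^2 \, s_y^2}} & \text{if } s_x^2 > 0 \text{ and } s_y^2 > 0, \\[1.5ex]
0 & \text{otherwise.}
\end{cases}
```

Under this definition, when all three quantities use the same ddof, the common factor 1/(n - ddof) cancels between numerator and denominator, so ddof is not expected to change the coefficient once it is defined.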

Usage Examples

from river import stats

# Basic Pearson correlation
x = [0, 0, 0, 1, 1, 1, 1]
y = [0, 1, 2, 3, 4, 5, 6]

pearson = stats.PearsonCorr()

for xi, yi in zip(x, y):
    pearson.update(xi, yi)
    print(f"x={xi}, y={yi}, Correlation={pearson.get():.6f}")

# Output:
# x=0, y=0, Correlation=0.000000
# x=0, y=1, Correlation=0.000000
# x=0, y=2, Correlation=0.000000
# x=1, y=3, Correlation=0.774597
# x=1, y=4, Correlation=0.866025
# x=1, y=5, Correlation=0.878310
# x=1, y=6, Correlation=0.866025

# Rolling Pearson correlation
from river import utils

x = [0, 0, 0, 1, 1, 1, 1]
y = [0, 1, 2, 3, 4, 5, 6]

pearson_rolling = utils.Rolling(stats.PearsonCorr(), window_size=4)

for xi, yi in zip(x, y):
    pearson_rolling.update(xi, yi)
    print(f"Rolling Correlation: {pearson_rolling.get():.6f}")

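# Output:
# Rolling Correlation: 0.000000
# Rolling Correlation: 0.000000
# Rolling Correlation: 0.000000
# Rolling Correlation: 0.774597
# Rolling Correlation: 0.894427
# Rolling Correlation: 0.774597
# Rolling Correlation: -0.000000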

# Perfect positive correlation
perfect_pos = stats.PearsonCorr()
for i in range(10):
    perfect_pos.update(i, i * 2)
print(f"Perfect positive: {perfect_pos.get():.6f}")
# Output: Perfect positive: 1.000000

# Perfect negative correlation
perfect_neg = stats.PearsonCorr()
for i in range(10):
    perfect_neg.update(i, -i)
print(f"Perfect negative: {perfect_neg.get():.6f}")
# Output: Perfect negative: -1.000000

# No correlation (independent uniform samples)
import random

random.seed(42)  # fixed seed for reproducibility
no_corr = stats.PearsonCorr()
for _ in range(100):
    no_corr.update(random.random(), random.random())
print(f"No correlation: {no_corr.get():.6f}")
# Output: close to 0

Related Pages

Environment:Online_ml_River_Python_Runtime_Environment
