Implementation:Online ml River Stats PearsonCorr
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Statistics |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
PearsonCorr computes the online Pearson correlation coefficient between two variables.
Description
This statistic measures the linear correlation between two variables in streaming data. Values range from -1 (perfect negative correlation) to +1 (perfect positive correlation). Internally it maintains a running variance for each variable and a running covariance between them, and derives the coefficient from these three quantities. The constructor accepts a delta degrees of freedom (ddof) argument, and the statistic can be wrapped with utils.Rolling to compute the correlation over a sliding window.
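The bookkeeping described above can be illustrated with a minimal pure-Python sketch. It is not River's source code: the class name `OnlinePearson` and its attribute names are hypothetical, and it assumes Welford-style incremental updates for the running sums, which is one common way to maintain variance and covariance in a single pass.

```python
import math


class OnlinePearson:
    """Illustrative streaming Pearson correlation (hypothetical, not River's code)."""

    def __init__(self, ddof=1):
        self.ddof = ddof
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # running sum of squared deviations of x
        self.m2_y = 0.0  # running sum of squared deviations of y
        self.c_xy = 0.0  # running sum of co-deviations of x and y

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x  # deviation from the previous mean of x
        dy = y - self.mean_y  # deviation from the previous mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # Welford-style updates: old deviation times new deviation.
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def get(self):
        denom = self.n - self.ddof
        if denom <= 0:
            return 0.0  # not enough observations yet
        var_x = self.m2_x / denom
        var_y = self.m2_y / denom
        if var_x <= 0 or var_y <= 0:
            return 0.0  # correlation is undefined for a constant variable
        return (self.c_xy / denom) / math.sqrt(var_x * var_y)


corr = OnlinePearson()
history = []
for xi, yi in zip([0, 0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 4, 5, 6]):
    corr.update(xi, yi)
    history.append(corr.get())
```

Each update costs constant time and memory, which is what makes the statistic suitable for unbounded streams.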
Usage
Use PearsonCorr when you need to measure the strength and direction of linear relationships between two variables in streaming data. Common applications include feature correlation analysis, detecting collinearity, monitoring relationships between metrics, and feature selection where highly correlated features may be redundant.
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/stats/pearson.py
Signature
class PearsonCorr(stats.base.Bivariate):
    def __init__(self, ddof=1):
        self.var_x = stats.Var(ddof=ddof)
        self.var_y = stats.Var(ddof=ddof)
        self.cov_xy = stats.Cov(ddof=ddof)
Import
from river import stats
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
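| x | numbers.Number | Yes | First variable value, passed to update() |
| y | numbers.Number | Yes | Second variable value, passed to update() |
| ddof | int | No (constructor) | Delta Degrees of Freedom, forwarded to the internal variance and covariance estimators (default: 1) |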
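A short derivation may help clarify what ddof does to the result. It assumes the coefficient is the plain ratio of the covariance to the square root of the product of the two variances, and that all three estimators share the same ddof value d, as the constructor signature suggests. The symbols S_xx, S_yy and C_xy are notation introduced here for the sums of squared deviations and co-deviations over n observations.

```latex
r_n
= \frac{\operatorname{cov}_n(x, y)}{\sqrt{\operatorname{var}_n(x)\,\operatorname{var}_n(y)}}
= \frac{\dfrac{C_{xy}}{n - d}}{\sqrt{\dfrac{S_{xx}}{n - d} \cdot \dfrac{S_{yy}}{n - d}}}
= \frac{C_{xy}}{\sqrt{S_{xx}\, S_{yy}}},
\qquad n > d
```

Under these assumptions the factor 1/(n - d) cancels, so ddof changes the intermediate variance and covariance values but not the correlation itself once more than d observations have been seen.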
Outputs
| Name | Type | Description |
|---|---|---|
| get() | float | Pearson correlation coefficient between -1 and 1 (0 if the variance of either variable is zero) |
Usage Examples
from river import stats
# Basic Pearson correlation
x = [0, 0, 0, 1, 1, 1, 1]
y = [0, 1, 2, 3, 4, 5, 6]
pearson = stats.PearsonCorr()
for xi, yi in zip(x, y):
    pearson.update(xi, yi)
    print(f"x={xi}, y={yi}, Correlation={pearson.get():.6f}")
# Output:
# x=0, y=0, Correlation=0.000000
# x=0, y=1, Correlation=0.000000
# x=0, y=2, Correlation=0.000000
# x=1, y=3, Correlation=0.774597
# x=1, y=4, Correlation=0.866025
# x=1, y=5, Correlation=0.878310
# x=1, y=6, Correlation=0.866025
# Rolling Pearson correlation
from river import utils
x = [0, 0, 0, 1, 1, 1, 1]
y = [0, 1, 2, 3, 4, 5, 6]
pearson_rolling = utils.Rolling(stats.PearsonCorr(), window_size=4)
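for xi, yi in zip(x, y):
    pearson_rolling.update(xi, yi)
    print(f"Rolling Correlation: {pearson_rolling.get():.6f}")
# Output:
# Rolling Correlation: 0.000000
# Rolling Correlation: 0.000000
# Rolling Correlation: 0.000000
# Rolling Correlation: 0.774597
# Rolling Correlation: 0.894427
# Rolling Correlation: 0.774597
# Rolling Correlation: -0.000000
# (x is constant in the final window, so the last value is zero up to floating-point rounding)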
# Perfect positive correlation
perfect_pos = stats.PearsonCorr()
for i in range(10):
    perfect_pos.update(i, i * 2)
print(f"Perfect positive: {perfect_pos.get():.6f}")
# Output: Perfect positive: 1.000000
# Perfect negative correlation
perfect_neg = stats.PearsonCorr()
for i in range(10):
    perfect_neg.update(i, -i)
print(f"Perfect negative: {perfect_neg.get():.6f}")
# Output: Perfect negative: -1.000000
# No correlation
import random
random.seed(42)
no_corr = stats.PearsonCorr()
for _ in range(100):
    no_corr.update(random.random(), random.random())
print(f"No correlation: {no_corr.get():.6f}")
# Output: close to 0
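The expected values in the basic example can be cross-checked without River by recomputing the textbook batch formula on each prefix of the data. The sketch below uses only the standard library. The helper `batch_pearson` is a hypothetical name introduced for illustration, and it returns 0 for a constant input to mirror the zero-variance behaviour listed in the Outputs table.

```python
import math


def batch_pearson(xs, ys):
    """Two-pass Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    s_yy = sum((b - mean_y) ** 2 for b in ys)
    if s_xx == 0 or s_yy == 0:
        return 0.0
    return s_xy / math.sqrt(s_xx * s_yy)


x = [0, 0, 0, 1, 1, 1, 1]
y = [0, 1, 2, 3, 4, 5, 6]

# Correlation of the first k observations, for k = 1..7.
prefix_corr = [batch_pearson(x[:k], y[:k]) for k in range(1, len(x) + 1)]
for value in prefix_corr:
    print(f"{value:.6f}")
# Output:
# 0.000000
# 0.000000
# 0.000000
# 0.774597
# 0.866025
# 0.878310
# 0.866025
```

A streaming estimator should agree with these prefix values up to floating-point rounding, since the correlation after k updates depends only on the first k observations.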