Implementation:Online ml River Metrics CohenKappa

Knowledge Sources	Online_ml_River
Domains	Online_Learning, Evaluation_Metrics
Last Updated	2026-02-08 16:00 GMT

Overview

Cohen's Kappa coefficient, which measures agreement between two sets of labels (for example, predictions and ground truth) corrected for the agreement expected by chance.

Description

CohenKappa measures the level of agreement between two annotators (or between predictions and ground truth) on a classification problem, correcting for agreement that would occur by chance. The formula is κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (equivalent to accuracy) and p_e is the agreement expected by chance, computed as the sum over classes of the product of each class's true-label frequency and predicted-label frequency. Values range from -1 (total disagreement) through 0 (agreement no better than chance) to 1 (perfect agreement).
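
To make the formula concrete, here is a minimal batch sketch in plain Python. The helper name `cohen_kappa` is hypothetical and is not part of River; returning 0.0 for empty input or when p_e equals 1 is an assumption made for this illustration.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Batch Cohen's kappa for two equal-length label sequences (illustrative helper)."""
    n = len(y_true)
    if n == 0:
        return 0.0  # assumption: no data is reported as zero agreement

    # Observed agreement p_o: fraction of positions where the labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Expected agreement p_e: for each class, multiply its true-label
    # frequency by its predicted-label frequency, then sum over classes.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if p_e == 1.0:
        return 0.0  # assumption: degenerate single-class case reported as zero
    return (p_o - p_e) / (1 - p_e)


y_true = ['cat', 'ant', 'cat', 'cat', 'ant', 'bird']
y_pred = ['ant', 'ant', 'cat', 'cat', 'ant', 'cat']

kappa = cohen_kappa(y_true, y_pred)
print(f"{kappa:.4f}")  # 0.4286
```

Here p_o = 4/6 and p_e = (3·3 + 2·3 + 1·0) / 36 = 15/36, so κ = (4/6 - 15/36) / (1 - 15/36) = 3/7.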

Usage

Use Cohen's Kappa when you want to evaluate classification performance while accounting for correct predictions that occur by chance. It is particularly valuable on imbalanced datasets, where a classifier that always predicts the majority class can reach high accuracy while its kappa stays near 0, and when comparing annotation strategies, where raw agreement rates depend on how the labels are distributed.
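
The imbalanced case can be illustrated with a small sketch in plain Python. The data is synthetic and chosen only for illustration: 95 negatives, 5 positives, and a hypothetical classifier that always predicts the majority class.

```python
from collections import Counter

# Synthetic, imbalanced labels (illustrative only).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # always predict the majority class

n = len(y_true)

# Observed agreement, which is the same quantity as accuracy.
p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

# Chance agreement from the marginal label counts.
true_counts = Counter(y_true)
pred_counts = Counter(y_pred)
p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

kappa = (p_o - p_e) / (1 - p_e)

print(f"accuracy = {p_o:.2f}")  # accuracy = 0.95
print(f"kappa    = {kappa:.2f}")  # kappa    = 0.00
```

Accuracy looks strong at 0.95, but because chance agreement is also 0.95, kappa reports that the classifier adds nothing beyond the label distribution.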

Code Reference

Source Location

Repository: Online_ml_River
File: river/metrics/kappa.py

Signature

class CohenKappa(metrics.base.MultiClassMetric):
    def __init__(self, cm=None):
        pass

Import

from river import metrics

I/O Contract

Method	Parameters	Returns	Description
update	y_true, y_pred, [w]	None	Updates metric with true and predicted labels
get	-	float	Returns Cohen's Kappa coefficient (-1.0 to 1.0)
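
As a rough illustration of how this update/get contract can be met in a streaming setting, the sketch below keeps running weighted totals instead of storing the samples. `StreamingKappa` is a hypothetical stand-in written for this page, not River's implementation; the default weight of 1.0 and the 0.0 returned in degenerate cases are assumptions.

```python
from collections import defaultdict


class StreamingKappa:
    """Hypothetical incremental Cohen's kappa (not River's class)."""

    def __init__(self):
        self.n = 0.0                           # total weight seen so far
        self.agree = 0.0                       # weight where y_true == y_pred
        self.true_totals = defaultdict(float)  # weight per true label
        self.pred_totals = defaultdict(float)  # weight per predicted label

    def update(self, y_true, y_pred, w=1.0):
        # Constant-time update: only running totals are stored.
        self.n += w
        if y_true == y_pred:
            self.agree += w
        self.true_totals[y_true] += w
        self.pred_totals[y_pred] += w

    def get(self):
        if self.n == 0:
            return 0.0  # assumption: no data is reported as 0.0
        p_o = self.agree / self.n
        p_e = sum(
            (self.true_totals[c] / self.n) * (self.pred_totals.get(c, 0.0) / self.n)
            for c in self.true_totals
        )
        if p_e == 1.0:
            return 0.0  # assumption: degenerate case is reported as 0.0
        return (p_o - p_e) / (1 - p_e)


metric = StreamingKappa()
stream = zip(['cat', 'ant', 'cat', 'cat', 'ant', 'bird'],
             ['ant', 'ant', 'cat', 'cat', 'ant', 'cat'])
for yt, yp in stream:
    metric.update(yt, yp)

print(f"{metric.get():.4f}")  # 0.4286
```

Keeping only per-class totals means memory grows with the number of distinct labels rather than with the length of the stream.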

Usage Examples

from river import metrics

y_true = ['cat', 'ant', 'cat', 'cat', 'ant', 'bird']
y_pred = ['ant', 'ant', 'cat', 'cat', 'ant', 'cat']

metric = metrics.CohenKappa()

for yt, yp in zip(y_true, y_pred):
    metric.update(yt, yp)

print(metric)
# CohenKappa: 42.86%

# Interpretation (rule-of-thumb bands from Landis and Koch):
# κ < 0:           Less than chance agreement (poor)
# κ = 0:           Chance agreement only
# 0 < κ ≤ 0.2:     Slight agreement
# 0.2 < κ ≤ 0.4:   Fair agreement
# 0.4 < κ ≤ 0.6:   Moderate agreement (our result)
# 0.6 < κ ≤ 0.8:   Substantial agreement
# 0.8 < κ < 1:     Almost perfect agreement
# κ = 1:           Perfect agreement

# Compare with raw accuracy on the same labels:
accuracy = metrics.Accuracy()
for yt, yp in zip(y_true, y_pred):
    accuracy.update(yt, yp)

print(f"Accuracy: {accuracy.get():.2%}")
# Accuracy: 66.67%
# Cohen's Kappa (42.86%) is lower than accuracy (66.67%) because it
# subtracts the agreement expected by chance (41.67% here):
# (0.6667 - 0.4167) / (1 - 0.4167) ≈ 0.4286
