Implementation:Online ml River Metrics CohenKappa
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Evaluation_Metrics |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Cohen's Kappa coefficient, which measures agreement between two sets of labels adjusted for chance agreement.
Description
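CohenKappa measures the level of agreement between two annotators (or between predictions and ground truth) on a classification problem, correcting for agreement that would occur by chance. The formula is κ = (po - pe) / (1 - pe), where po is the observed agreement (equivalent to accuracy) and pe is the agreement expected by chance, computed from how often each label appears in the true and predicted labels. A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance; the lower bound of -1 is reached only in special cases, such as complete disagreement on two equally frequent classes.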
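To make the formula concrete, here is a minimal pure-Python sketch that computes the observed agreement, the chance agreement, and κ for two label sequences. It is an illustration only, not River's implementation: the function name `cohen_kappa` is hypothetical, chance agreement is estimated from the marginal label frequencies of each sequence, and returning 0.0 when chance agreement equals 1 is an assumed convention.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Batch Cohen's Kappa for two equal-length label sequences (illustrative)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)

    # Observed agreement: fraction of positions where the labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: for each label, the product of its marginal
    # frequencies in the two sequences, summed over all labels.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    p_e = sum((true_counts[c] / n) * (pred_counts[c] / n) for c in labels)

    # Assumed convention: kappa is undefined when p_e == 1, so return 0.0.
    if p_e == 1:
        return 0.0
    return (p_o - p_e) / (1 - p_e)


y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
kappa = cohen_kappa(y_true, y_pred)
print(f"{kappa:.2%}")  # 42.86%
```

Unlike this batch sketch, which needs both full sequences up front, the River metric is updated one observation at a time.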
Usage
Use Cohen's Kappa when you want to evaluate classification performance while accounting for the possibility of correct predictions occurring by random chance. It's particularly valuable when evaluating classifiers on imbalanced datasets or when comparing different annotation strategies, as it provides a more robust measure than raw accuracy by adjusting for chance agreement.
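The effect on imbalanced data can be seen in a small synthetic case. The sketch below uses made-up labels (95 negatives, 5 positives) and a hypothetical classifier that always predicts the majority class; it computes accuracy and κ with plain Python rather than with River, so the variable names are illustrative only.

```python
# Synthetic, imbalanced ground truth: 95 negatives and 5 positives.
y_true = ["neg"] * 95 + ["pos"] * 5
# Hypothetical classifier that always predicts the majority class.
y_pred = ["neg"] * len(y_true)
n = len(y_true)

# Observed agreement, which is the same quantity as accuracy.
p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

# Chance agreement from the marginal label frequencies.
labels = set(y_true) | set(y_pred)
p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)

kappa = (p_o - p_e) / (1 - p_e)

print(f"Accuracy: {p_o:.2%}")  # Accuracy: 95.00%
print(f"Kappa: {kappa:.2%}")   # Kappa: 0.00%
```

Accuracy looks strong because the majority class dominates, while κ is 0 because the observed agreement equals the agreement expected by chance.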
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/metrics/kappa.py
Signature
class CohenKappa(metrics.base.MultiClassMetric):
    def __init__(self, cm=None):
        pass
Import
from river import metrics
I/O Contract
| Method | Parameters | Returns | Description |
|---|---|---|---|
| update | y_true, y_pred, [w] | None | Updates metric with true and predicted labels |
| get | - | float | Returns Cohen's Kappa coefficient (-1.0 to 1.0) |
Usage Examples
from river import metrics
y_true = ['cat', 'ant', 'cat', 'cat', 'ant', 'bird']
y_pred = ['ant', 'ant', 'cat', 'cat', 'ant', 'cat']
metric = metrics.CohenKappa()
for yt, yp in zip(y_true, y_pred):
    metric.update(yt, yp)
print(metric)
# CohenKappa: 42.86%
# Interpretation (a common rule of thumb, after Landis and Koch):
# κ < 0: Less than chance agreement (poor)
# κ = 0: Chance agreement only
# 0 < κ ≤ 0.2: Slight agreement
# 0.2 < κ ≤ 0.4: Fair agreement
# 0.4 < κ ≤ 0.6: Moderate agreement (our result)
# 0.6 < κ ≤ 0.8: Substantial agreement
# 0.8 < κ < 1: Almost perfect agreement
# κ = 1: Perfect agreement
# Compare with raw accuracy:
accuracy = metrics.Accuracy()
for yt, yp in zip(y_true, y_pred):
    accuracy.update(yt, yp)
print(f"Accuracy: {accuracy.get():.2%}")
# Accuracy: 66.67%
# Cohen's Kappa (42.86%) is lower than accuracy (66.67%) because it
# subtracts the agreement expected by chance, which is 41.67% here:
# (0.6667 - 0.4167) / (1 - 0.4167) ≈ 0.4286
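The interpretation bands listed in the comments above can be turned into a small lookup. This is a sketch under stated assumptions: `interpret_kappa` is a hypothetical helper, not part of River, and boundary values (0.2, 0.4, 0.6, 0.8) are assigned to the lower band. The bands are a rule of thumb rather than a statistical test.

```python
def interpret_kappa(kappa: float) -> str:
    """Map a kappa value to a qualitative label (hypothetical helper)."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0:
        return "poor (less than chance)"
    if kappa == 0:
        return "chance agreement only"
    if kappa == 1:
        return "perfect"

    # Assumption: each upper bound is inclusive, so 0.2 counts as "slight".
    bands = [
        (0.2, "slight"),
        (0.4, "fair"),
        (0.6, "moderate"),
        (0.8, "substantial"),
        (1.0, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise AssertionError("unreachable for kappa in [-1, 1]")


print(interpret_kappa(0.4286))  # moderate
```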