Implementation:Online ml River NaiveBayes ComplementNB
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Naive_Bayes, Imbalanced_Learning |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Complement Naive Bayes is a Naive Bayes variant designed for imbalanced datasets: it estimates the parameters for each class from the data of all other classes (its complement) rather than from the class itself.
Description
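Unlike standard multinomial NB, Complement NB estimates the feature probabilities for a class from all classes except that class (its "complement"). For each class c, the unsmoothed estimate for feature f is P(f|~c) = (total_f - count_f_in_c) / sum(complement), where total_f is the count of f across all classes, count_f_in_c is its count within c, and sum(complement) is the total feature count outside c; alpha adds smoothing to these counts. Because the complement of a small class is large, its estimates rest on more data, which reduces bias toward majority classes in imbalanced data. The model maintains feature counts per class, class totals, and overall feature totals. During prediction, it sums the negative log of the complement probabilities, weighted by feature frequency. A higher score means the complement explains the input poorly, so the class with the highest score is the best fit.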
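The scoring described above can be sketched in plain Python. This is a minimal illustration, not the library's code: the helper names `complement_scores` and `to_proba`, the feature-to-class layout of `feature_counts`, and the exact smoothing (alpha added to each complement count, alpha times the vocabulary size added to the complement total) are assumptions. They were chosen because, with alpha=1, they reproduce the probabilities listed under Usage Examples.

```python
import collections
import math

# Toy corpus taken from the Usage Examples section.
docs = [
    ("Chinese Beijing Chinese", "yes"),
    ("Chinese Chinese Shanghai", "yes"),
    ("Chinese Macao", "maybe"),
    ("Tokyo Japan Chinese", "no"),
]

# Hypothetical stand-ins for the model's counters (feature -> class layout assumed).
feature_counts = collections.defaultdict(collections.Counter)
feature_totals = collections.Counter()
class_totals = collections.Counter()
for sentence, label in docs:
    for token in sentence.split():
        feature_counts[token][label] += 1
        feature_totals[token] += 1
        class_totals[label] += 1


def complement_scores(x, alpha=1.0):
    """Sum of x_f * -log P(f | ~c) for each class c (assumed smoothing)."""
    n_tokens = sum(feature_totals.values())
    n_vocab = len(feature_totals)
    scores = {}
    for c in class_totals:
        # Smoothed total feature count outside class c.
        denom = n_tokens - class_totals[c] + alpha * n_vocab
        score = 0.0
        for f, freq in x.items():
            in_class = feature_counts[f][c] if f in feature_counts else 0
            # Smoothed count of feature f outside class c.
            numer = feature_totals[f] + alpha - in_class
            score += freq * -math.log(numer / denom)
        scores[c] = score
    return scores


def to_proba(scores):
    """Softmax over the scores; the highest score gets the highest probability."""
    top = max(scores.values())
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


proba = to_proba(complement_scores(collections.Counter(["test"])))
print({c: round(p, 3) for c, p in proba.items()})
# {'yes': 0.275, 'maybe': 0.375, 'no': 0.35}
```

For the unseen token "test", every complement count is just the smoothing term, so the scores differ only through the complement totals. The smallest class ("maybe") has the largest complement and therefore receives the highest probability.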
Usage
Use Complement NB for imbalanced text classification problems, where standard Multinomial NB may be biased toward frequent classes. It works with count or TF-IDF features and requires non-negative input values. It is particularly effective when class sizes vary significantly, and it supports both online (learn_one) and mini-batch (learn_many) learning.
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/naive_bayes/complement.py
Signature
class ComplementNB(base.BaseNB):
    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = collections.Counter()
        self.feature_counts = collections.defaultdict(collections.Counter)
        self.feature_totals = collections.Counter()
        self.class_totals = collections.Counter()
Import
from river import naive_bayes
I/O Contract
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| alpha | float | 1.0 | Additive smoothing parameter |
Attributes
| Attribute | Type | Description |
|---|---|---|
| class_counts | Counter | Number of instances per class |
| feature_counts | defaultdict | Feature counts per class |
| feature_totals | Counter | Total counts per feature across all classes |
| class_totals | Counter | Total feature counts per class |
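To make the four attributes concrete, the sketch below shows one plausible way they could be updated from a bag-of-words dict. It is an illustration under assumptions, not the library's implementation: the module-level counters, the standalone `learn_one` function, and the feature-to-class nesting of `feature_counts` are hypothetical. The toy corpus is the one from Usage Examples.

```python
import collections

# Hypothetical module-level stand-ins for the model attributes.
class_counts = collections.Counter()
feature_counts = collections.defaultdict(collections.Counter)
feature_totals = collections.Counter()
class_totals = collections.Counter()


def learn_one(x, y):
    """Update the four counters from one bag-of-words dict x with label y."""
    class_counts[y] += 1                   # one more instance of class y
    for f, frequency in x.items():
        feature_counts[f][y] += frequency  # count of f within class y
        feature_totals[f] += frequency     # count of f across all classes
        class_totals[y] += frequency       # total feature count of class y


docs = [
    ("Chinese Beijing Chinese", "yes"),
    ("Chinese Chinese Shanghai", "yes"),
    ("Chinese Macao", "maybe"),
    ("Tokyo Japan Chinese", "no"),
]
for sentence, label in docs:
    learn_one(collections.Counter(sentence.split()), label)

print(dict(class_counts))               # {'yes': 2, 'maybe': 1, 'no': 1}
print(dict(class_totals))               # {'yes': 6, 'maybe': 2, 'no': 3}
print(feature_totals["Chinese"])        # 6
print(dict(feature_counts["Chinese"]))  # {'yes': 4, 'maybe': 1, 'no': 1}

# Class prior as a simple ratio of instance counts (assumed definition).
print(class_counts["yes"] / sum(class_counts.values()))  # 0.5
```

The last ratio agrees with the p_class("yes") value of 0.5 shown under Usage Examples, which suggests the class prior is derived from class_counts alone.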
Input/Output
| Method | Input | Output |
|---|---|---|
| learn_one | x: dict, y: Any | None |
| learn_many | X: DataFrame, y: Series | None |
| predict_proba_one | x: dict | dict |
| predict_proba_many | X: DataFrame | DataFrame |
Usage Examples
import pandas as pd
from river import compose
from river import feature_extraction
from river import naive_bayes
docs = [
    ("Chinese Beijing Chinese", "yes"),
    ("Chinese Chinese Shanghai", "yes"),
    ("Chinese Macao", "maybe"),
    ("Tokyo Japan Chinese", "no")
]
# Single instance learning
model = compose.Pipeline(
    ("tokenize", feature_extraction.BagOfWords(lowercase=False)),
    ("nb", naive_bayes.ComplementNB(alpha=1))
)
for sentence, label in docs:
    model.learn_one(sentence, label)
model["nb"].p_class("yes")
# 0.5
model["nb"].p_class("no")
# 0.25
model["nb"].p_class("maybe")
# 0.25
model.predict_proba_one("test")
# {'yes': 0.275, 'maybe': 0.375, 'no': 0.35}
model.predict_one("test")
# 'maybe'
# Mini-batch learning
X = pd.Series([
    "Chinese Beijing Chinese",
    "Chinese Chinese Shanghai",
    "Chinese Macao",
    "Tokyo Japan Chinese"
])
y = pd.Series(["yes", "yes", "maybe", "no"])
model = compose.Pipeline(
    ("tokenize", feature_extraction.BagOfWords(lowercase=False)),
    ("nb", naive_bayes.ComplementNB(alpha=1))
)
model.learn_many(X, y)
unseen = pd.Series(["Taiwanese Taipei", "Chinese Shanghai"])
model.predict_proba_many(unseen)
#       maybe        no       yes
# 0  0.415129  0.361624  0.223247
# 1  0.248619  0.216575  0.534807
model.predict_many(unseen)
# 0    maybe
# 1      yes
# dtype: object