Implementation:Online ml River NaiveBayes ComplementNB

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Naive_Bayes, Imbalanced_Learning
Last Updated 2026-02-08 16:00 GMT

Overview

Complement Naive Bayes is a naive Bayes classifier designed for imbalanced datasets: it estimates each class's parameters from the data of all other classes (the complement) rather than from the class itself.

Description

Unlike standard multinomial NB, Complement NB estimates each class's parameters from all classes except the target class (the "complement"). For each class c, it calculates smoothed feature probabilities from the complement set: P(f|~c) = (total_f - count_f_in_c + alpha) / sum over all features f' of (total_f' - count_f'_in_c + alpha). This approach reduces bias toward majority classes in imbalanced data. The model maintains feature counts per class, class totals, and overall feature totals. During prediction, it scores each class by the negative log-likelihood of the input under that class's complement, so higher values indicate a better fit: the input is poorly explained by the other classes.
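The complement scoring described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not River's actual code: the helper names `complement_scores` and `softmax` are hypothetical, features never seen in training are assumed to contribute only the smoothing mass alpha, and scores are assumed to be normalized with a softmax. Under these assumptions the sketch reproduces the probabilities listed for "Chinese Shanghai" in the usage examples on this page.

```python
import math
from collections import Counter, defaultdict

def complement_scores(x, feature_counts, feature_totals, classes, alpha=1.0):
    """Return, per class, the negative log-likelihood of x under the class's complement."""
    scores = {}
    for c in classes:
        # Smoothed count of every known feature outside class c.
        comp = {f: feature_totals[f] - feature_counts[f][c] + alpha for f in feature_totals}
        norm = sum(comp.values())
        # Assumption: an unseen feature contributes only the smoothing mass alpha.
        scores[c] = sum(n * -math.log(comp.get(f, alpha) / norm) for f, n in x.items())
    return scores

def softmax(scores):
    """Normalize scores into probabilities; a higher score means a better fit."""
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exp.values())
    return {c: e / total for c, e in exp.items()}

docs = [
    ("Chinese Beijing Chinese", "yes"),
    ("Chinese Chinese Shanghai", "yes"),
    ("Chinese Macao", "maybe"),
    ("Tokyo Japan Chinese", "no"),
]

# Build the per-class feature counts and the overall feature totals.
feature_counts = defaultdict(Counter)
feature_totals = Counter()
for text, label in docs:
    for token in text.split():
        feature_counts[token][label] += 1
        feature_totals[token] += 1

classes = ["yes", "maybe", "no"]
x = {"Chinese": 1, "Shanghai": 1}
proba = softmax(complement_scores(x, feature_counts, feature_totals, classes))
# proba is approximately {'yes': 0.535, 'maybe': 0.249, 'no': 0.217}
```

The class "yes" wins because its complement ("maybe" and "no" documents) assigns the lowest probability to the tokens in x.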

Usage

Use Complement NB for imbalanced text classification problems, where standard Multinomial NB may be biased toward frequent classes. It works with count or TF-IDF features and requires non-negative input values. It is particularly effective when class sizes vary significantly, and it supports both online (learn_one) and mini-batch (learn_many) learning.

Code Reference

Source Location

Signature

class ComplementNB(base.BaseNB):
    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = collections.Counter()
        self.feature_counts = collections.defaultdict(collections.Counter)
        self.feature_totals = collections.Counter()
        self.class_totals = collections.Counter()

Import

from river import naive_bayes

I/O Contract

Parameters

Parameter | Type | Default | Description
alpha | float | 1.0 | Additive smoothing parameter

Attributes

Attribute | Type | Description
class_counts | Counter | Number of instances per class
feature_counts | defaultdict | Feature counts per class
feature_totals | Counter | Total counts per feature across all classes
class_totals | Counter | Total feature counts per class
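To make the role of each attribute concrete, here is a minimal sketch of how the four counters could be updated from one labeled instance. The class `CountState` is a hypothetical stand-in, and the `feature_counts[feature][label]` key order is an assumption; the real ComplementNB may organize its updates differently.

```python
from collections import Counter, defaultdict

class CountState:
    """Hypothetical stand-in holding the four counters listed above."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.feature_totals = Counter()
        self.class_totals = Counter()

    def learn_one(self, x, y):
        # One more instance of class y.
        self.class_counts[y] += 1
        for f, n in x.items():
            self.feature_counts[f][y] += n  # count of feature f within class y
            self.feature_totals[f] += n     # count of feature f across all classes
            self.class_totals[y] += n       # total feature mass seen for class y

    def p_class(self, c):
        # Class prior: share of training instances labeled c.
        return self.class_counts[c] / sum(self.class_counts.values())

state = CountState()
training = [
    ({"Chinese": 2, "Beijing": 1}, "yes"),
    ({"Chinese": 2, "Shanghai": 1}, "yes"),
    ({"Chinese": 1, "Macao": 1}, "maybe"),
    ({"Tokyo": 1, "Japan": 1, "Chinese": 1}, "no"),
]
for x, y in training:
    state.learn_one(x, y)

print(state.p_class("yes"))  # 0.5
```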

Input/Output

Method | Input | Output
learn_one | x: dict, y: Any | None
learn_many | X: DataFrame, y: Series | None
predict_proba_one | x: dict | dict
predict_proba_many | X: DataFrame | DataFrame
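As an illustration of the `x: dict` input in the contract above, the following sketch turns a sentence into a token-count dictionary. The helper `bag_of_words` is hypothetical and assumes plain whitespace tokenization with case preserved; River's own BagOfWords transformer applies its own tokenization rules.

```python
from collections import Counter

def bag_of_words(text):
    """Hypothetical helper: whitespace tokenization, case preserved."""
    return dict(Counter(text.split()))

x = bag_of_words("Chinese Beijing Chinese")
print(x)  # {'Chinese': 2, 'Beijing': 1}
```

Every value is a non-negative count, which is the kind of input a count-based model such as Complement NB expects.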

Usage Examples

import pandas as pd
from river import compose
from river import feature_extraction
from river import naive_bayes

docs = [
    ("Chinese Beijing Chinese", "yes"),
    ("Chinese Chinese Shanghai", "yes"),
    ("Chinese Macao", "maybe"),
    ("Tokyo Japan Chinese", "no")
]

# Single instance learning
model = compose.Pipeline(
    ("tokenize", feature_extraction.BagOfWords(lowercase=False)),
    ("nb", naive_bayes.ComplementNB(alpha=1))
)

for sentence, label in docs:
    model.learn_one(sentence, label)

model["nb"].p_class("yes")
# 0.5

model["nb"].p_class("no")
# 0.25

model["nb"].p_class("maybe")
# 0.25

model.predict_proba_one("test")
# {'yes': 0.275, 'maybe': 0.375, 'no': 0.35}

model.predict_one("test")
# 'maybe'

# Mini-batch learning
X = pd.Series([
    "Chinese Beijing Chinese",
    "Chinese Chinese Shanghai",
    "Chinese Macao",
    "Tokyo Japan Chinese"
])

y = pd.Series(["yes", "yes", "maybe", "no"])

model = compose.Pipeline(
    ("tokenize", feature_extraction.BagOfWords(lowercase=False)),
    ("nb", naive_bayes.ComplementNB(alpha=1))
)

model.learn_many(X, y)

unseen = pd.Series(["Taiwanese Taipei", "Chinese Shanghai"])

model.predict_proba_many(unseen)
#       maybe        no       yes
# 0  0.415129  0.361624  0.223247
# 1  0.248619  0.216575  0.534807

model.predict_many(unseen)
# 0    maybe
# 1      yes
# dtype: object
