Implementation:Online ml River NaiveBayes BernoulliNB

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Naive_Bayes, Text_Classification
Last Updated 2026-02-08 16:00 GMT

Overview

Bernoulli Naive Bayes classifies instances from binary feature occurrences. It is particularly effective for text classification, where each feature records the presence or absence of a word.

Description

This implementation treats each feature as a binary variable obtained by thresholding. During training it counts, for each class, how many instances had a given feature value above true_threshold. Prediction uses the Bernoulli model: a feature that is present in the query contributes P(feature|class), and a feature that is absent contributes 1 - P(feature|class). Laplace smoothing (the alpha parameter) prevents zero probabilities. The model maintains class counts and per-class feature occurrence counts, and computes the joint log-likelihood by summing log probabilities over all known features, not just those present in the query, so absent words also count as evidence.
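The description above can be sketched in plain Python. This is a minimal illustration, not River's actual source: the class name BernoulliNBSketch is hypothetical, and the smoothing denominator (class count plus 2 * alpha, one alpha per outcome) is an assumption chosen because it matches the probabilities shown in the usage examples on this page.

```python
import collections
import math


class BernoulliNBSketch:
    """Illustrative re-implementation of the description; not River's code."""

    def __init__(self, alpha=1.0, true_threshold=0.0):
        self.alpha = alpha
        self.true_threshold = true_threshold
        self.class_counts = collections.Counter()
        # feature -> class -> number of instances where the feature was "on"
        self.feature_counts = collections.defaultdict(collections.Counter)

    def learn_one(self, x, y):
        self.class_counts[y] += 1
        for f, value in x.items():
            self.feature_counts[f][y] += int(value > self.true_threshold)

    def p_feature_given_class(self, f, c):
        # Assumed smoothing: two outcomes (present/absent), hence 2 * alpha.
        num = self.feature_counts[f][c] + self.alpha
        den = self.class_counts[c] + 2 * self.alpha
        return num / den

    def joint_log_likelihood(self, x):
        total = sum(self.class_counts.values())
        jll = {}
        for c, n in self.class_counts.items():
            score = math.log(n / total)
            # Sum over every known feature, not only those present in x.
            for f in self.feature_counts:
                p = self.p_feature_given_class(f, c)
                present = x.get(f, 0) > self.true_threshold
                score += math.log(p if present else 1.0 - p)
            jll[c] = score
        return jll

    def predict_proba_one(self, x):
        jll = self.joint_log_likelihood(x)
        # Normalise in log space for numerical stability.
        top = max(jll.values())
        exp = {c: math.exp(v - top) for c, v in jll.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}


docs = [
    ("Chinese Beijing Chinese", "yes"),
    ("Chinese Chinese Shanghai", "yes"),
    ("Chinese Macao", "yes"),
    ("Tokyo Japan Chinese", "no"),
]

model = BernoulliNBSketch(alpha=1)
for sentence, label in docs:
    # Whitespace tokenisation stands in for a bag-of-words extractor.
    model.learn_one(collections.Counter(sentence.split()), label)

proba = model.predict_proba_one(collections.Counter("test".split()))
print(round(proba["yes"], 4))  # 0.8832
```

Because every known feature contributes a term, a query containing no known words still produces an informative prediction: the absence of "Tokyo" and "Japan" counts against the "no" class. The sketch omits guards for an untrained model and for alpha=0, where a zero probability would make the logarithm fail.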

Usage

Use Bernoulli NB for text classification with bag-of-words features when word presence matters more than word frequency. It is particularly suited to short documents or large vocabularies. It works with binary features directly, and with count features once they are thresholded. Both single-instance and mini-batch learning are supported. For text data, pair it with a feature extractor such as BagOfWords.
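To make "count features after thresholding" concrete, the following sketch shows how token counts collapse into presence indicators. The helper name binarize is hypothetical and only mirrors the comparison against true_threshold described above; it is not part of River's API.

```python
import collections


def binarize(x, true_threshold=0.0):
    """Map each feature value to True/False by comparing it to the threshold."""
    return {f: v > true_threshold for f, v in x.items()}


# Whitespace tokenisation stands in for a bag-of-words extractor.
counts = collections.Counter("Chinese Beijing Chinese".split())

print(dict(counts))         # {'Chinese': 2, 'Beijing': 1}
print(binarize(counts))     # {'Chinese': True, 'Beijing': True}
print(binarize(counts, 1))  # {'Chinese': True, 'Beijing': False}
```

With the default threshold of 0.0, any word that occurs at least once counts as present, so repeated words carry no extra weight. Raising the threshold to 1 would require a word to occur more than once before it counts.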

Code Reference

Source Location

Signature

class BernoulliNB(base.BaseNB):
    def __init__(self, alpha=1.0, true_threshold=0.0):
        self.alpha = alpha
        self.true_threshold = true_threshold
        self.class_counts = collections.Counter()
        self.feature_counts = collections.defaultdict(collections.Counter)

Import

from river import naive_bayes

I/O Contract

Parameters

Parameter | Type | Default | Description
alpha | float | 1.0 | Laplace/Lidstone smoothing parameter
true_threshold | float | 0.0 | Threshold for binarizing features
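The effect of alpha can be illustrated with a small calculation. The function below is a hypothetical helper, and the formula (count + alpha) / (class count + 2 * alpha) is an assumption about how the smoothing is applied to a two-outcome feature; it is consistent with the probabilities shown in the usage examples, but it is not quoted from River's source.

```python
def smoothed_presence_prob(on_count, class_count, alpha=1.0):
    """Estimate P(feature present | class) with additive smoothing.

    on_count:    instances of the class in which the feature was present
    class_count: total instances of the class
    """
    return (on_count + alpha) / (class_count + 2 * alpha)


# A word never seen in a class with 3 training documents.
unsmoothed = smoothed_presence_prob(0, 3, alpha=0.0)  # 0.0
lidstone = smoothed_presence_prob(0, 3, alpha=0.5)    # 0.125
laplace = smoothed_presence_prob(0, 3, alpha=1.0)     # 0.2

# A word seen in all 3 documents of the class.
always_on = smoothed_presence_prob(3, 3, alpha=1.0)   # 0.8

print(unsmoothed, lidstone, laplace, always_on)
```

With alpha=0 an unseen word gets probability zero, which would veto the class outright whenever that word appears in a query. Larger alpha values pull every estimate toward 0.5.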

Attributes

Attribute | Type | Description
class_counts | Counter | Number of instances per class
feature_counts | defaultdict | Feature occurrence counts per class

Input/Output

Method | Input | Output
learn_one | x: dict, y: Any | None
learn_many | X: DataFrame, y: Series | None
predict_proba_one | x: dict | dict
predict_proba_many | X: DataFrame | DataFrame

Usage Examples

import pandas as pd
from river import compose
from river import feature_extraction
from river import naive_bayes

docs = [
    ("Chinese Beijing Chinese", "yes"),
    ("Chinese Chinese Shanghai", "yes"),
    ("Chinese Macao", "yes"),
    ("Tokyo Japan Chinese", "no")
]

# Single instance learning
model = compose.Pipeline(
    ("tokenize", feature_extraction.BagOfWords(lowercase=False)),
    ("nb", naive_bayes.BernoulliNB(alpha=1))
)

for sentence, label in docs:
    model.learn_one(sentence, label)

model["nb"].p_class("yes")
# 0.75
model["nb"].p_class("no")
# 0.25

model.predict_proba_one("test")
# {'yes': 0.883..., 'no': 0.116...}

model.predict_one("test")
# 'yes'

# Mini-batch learning
X = pd.Series([
    "Chinese Beijing Chinese",
    "Chinese Chinese Shanghai",
    "Chinese Macao",
    "Tokyo Japan Chinese"
])

y = pd.Series(["yes", "yes", "yes", "no"])

model = compose.Pipeline(
    ("tokenize", feature_extraction.BagOfWords(lowercase=False)),
    ("nb", naive_bayes.BernoulliNB(alpha=1))
)

model.learn_many(X, y)

unseen = pd.Series(["Taiwanese Taipei", "Chinese Shanghai"])

model.predict_proba_many(unseen)
#          no       yes
# 0  0.116846  0.883154
# 1  0.047269  0.952731

model.predict_many(unseen)
# 0    yes
# 1    yes
# dtype: object
