Principle:Online ml River Online Naive Bayes

Knowledge Sources	Machine Learning Introduction to Information Retrieval
Domains	Online_Learning, Probabilistic_Classification
Last Updated	2026-02-08 18:00 GMT

Overview

Naive Bayes classifiers are a family of probabilistic models that apply Bayes' theorem with the naive assumption that features are conditionally independent given the class label. Despite this strong assumption, Naive Bayes classifiers perform remarkably well in practice, especially for text classification, spam filtering, and other high-dimensional problems.

Naive Bayes is inherently well-suited to online learning because updating the model requires only updating sufficient statistics (counts, sums, sums of squares) for each class-feature pair, making it naturally incremental with constant memory per update.

Theoretical Basis

The posterior probability of class c given features x_1, ..., x_d is:

P(c | x) = P(c) * product_{i=1}^{d} P(x_i | c)  /  P(x)

Since P(x) is constant across classes, classification selects the class maximizing the numerator.

Variant: Multinomial Naive Bayes

Assumes features are counts (e.g., word frequencies). The likelihood is modeled as a multinomial distribution. Sufficient statistics are per-class feature counts, updated additively with each new document.

Variant: Gaussian Naive Bayes

Assumes each feature follows a Gaussian distribution per class. Sufficient statistics are the running mean and variance for each feature-class pair, updated incrementally using Welford's algorithm or similar.

Variant: Bernoulli Naive Bayes

Assumes binary features (presence/absence). The likelihood is a product of Bernoulli distributions. This variant explicitly penalizes the absence of features, unlike the multinomial model.

Variant: Complement Naive Bayes

Estimates parameters from the complement of each class (all classes except c). This corrects for the systematic bias of standard multinomial NB on imbalanced datasets and often yields improved accuracy.

Online Update Properties

Constant time per update: Each observation updates only the counters for the observed class.
No retraining: Sufficient statistics are additive, so the model never needs a full re-computation.
Laplace smoothing: A small constant is added to counts to avoid zero probabilities for unseen feature-class pairs.

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment