Principle:Online ml River Online Naive Bayes
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Probabilistic_Classification |
| Last Updated | 2026-02-08 18:00 GMT |
Overview
Naive Bayes classifiers are a family of probabilistic models that apply Bayes' theorem with the naive assumption that features are conditionally independent given the class label. Despite this strong assumption, Naive Bayes classifiers perform remarkably well in practice, especially for text classification, spam filtering, and other high-dimensional problems.
Naive Bayes is inherently well-suited to online learning because updating the model requires only updating sufficient statistics (counts, sums, sums of squares) for each class-feature pair, making it naturally incremental with constant memory per update.
Theoretical Basis
The posterior probability of class c given features x_1, ..., x_d is:
P(c | x) = P(c) * product_{i=1}^{d} P(x_i | c) / P(x)
Since P(x) is constant across classes, classification selects the class maximizing the numerator.
Variant: Multinomial Naive Bayes
Assumes features are counts (e.g., word frequencies). The likelihood is modeled as a multinomial distribution. Sufficient statistics are per-class feature counts, updated additively with each new document.
Variant: Gaussian Naive Bayes
Assumes each feature follows a Gaussian distribution per class. Sufficient statistics are the running mean and variance for each feature-class pair, updated incrementally using Welford's algorithm or similar.
Variant: Bernoulli Naive Bayes
Assumes binary features (presence/absence). The likelihood is a product of Bernoulli distributions. This variant explicitly penalizes the absence of features, unlike the multinomial model.
Variant: Complement Naive Bayes
Estimates parameters from the complement of each class (all classes except c). This corrects for the systematic bias of standard multinomial NB on imbalanced datasets and often yields improved accuracy.
Online Update Properties
- Constant time per update: Each observation updates only the counters for the observed class.
- No retraining: Sufficient statistics are additive, so the model never needs a full re-computation.
- Laplace smoothing: A small constant is added to counts to avoid zero probabilities for unseen feature-class pairs.