Principle:Scikit learn Scikit learn Naive Bayes Classification

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Supervised Learning, Probabilistic Classification
Last Updated	2026-02-08 15:00 GMT

Overview

Naive Bayes classifiers apply Bayes' theorem with the "naive" assumption that features are conditionally independent given the class label, yielding simple, fast, and surprisingly effective probabilistic classifiers.

Description

Naive Bayes classifiers are generative models that estimate the posterior probability of each class by combining a prior class probability with a likelihood term derived under the conditional independence assumption. Although this assumption rarely holds in practice, naive Bayes classifiers perform remarkably well in many real-world applications, particularly text classification. They are computationally efficient, need relatively little training data to estimate their parameters, and handle multi-class problems naturally. Naive Bayes sits at the intersection of Bayesian statistics and classification, serving as a strong baseline and a practical choice for high-dimensional data.

Usage

Use GaussianNB when features are continuous and approximately Gaussian within each class. Use MultinomialNB for count-based features such as word counts in text classification. Use BernoulliNB when features are binary (presence/absence indicators). Use ComplementNB, an adaptation of MultinomialNB, when class frequencies are imbalanced. Naive Bayes classifiers are especially effective for text classification, spam filtering, and sentiment analysis because they fit bag-of-words representations naturally and handle high-dimensional sparse data efficiently.
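A minimal sketch of the count-based case, assuming scikit-learn is installed. The four-document corpus and its labels are invented purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus and labels, made up for this example only.
texts = [
    "win cash prize now",
    "cheap prize claim now",
    "meeting agenda for monday",
    "project meeting notes attached",
]
labels = ["spam", "spam", "ham", "ham"]

# CountVectorizer yields sparse word counts, the input MultinomialNB is designed for.
model = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
model.fit(texts, labels)

# Words not seen during training (e.g. "your", "the") are ignored by the vectorizer.
predictions = model.predict(["claim your cash prize", "notes for the meeting"])
probabilities = model.predict_proba(["claim your cash prize"])
```

Swapping `MultinomialNB` for `BernoulliNB` or `ComplementNB` in the pipeline requires no other change, which makes it cheap to compare the variants on the same features.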

Theoretical Basis

Bayes' theorem gives the posterior probability of class $c$ given features $x_{1}, \dots, x_{d}$:

$p (c | x_{1}, \dots, x_{d}) = \frac{p (c) \, p (x_{1}, \dots, x_{d} | c)}{p (x_{1}, \dots, x_{d})}$

The naive conditional independence assumption simplifies the joint likelihood:

$p (x_{1}, \dots, x_{d} | c) = \prod_{j = 1}^{d} p (x_{j} | c)$

Because the denominator $p (x_{1}, \dots, x_{d})$ does not depend on $c$, it can be dropped, and the predicted class is:

$\hat{c} = \arg \max_{c} p (c) \prod_{j = 1}^{d} p (x_{j} | c)$

In practice, log probabilities are used to avoid numerical underflow:

$\hat{c} = \arg \max_{c} \left[ \log p (c) + \sum_{j = 1}^{d} \log p (x_{j} | c) \right]$
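The underflow problem is easy to reproduce. The sketch below uses only the standard library; the per-feature likelihoods, the class names, and the helper `predict_log_space` are illustrative assumptions, not part of any library.

```python
import math

def predict_log_space(log_priors, log_likelihoods):
    """Return the class maximizing log p(c) + sum_j log p(x_j | c), plus all scores.

    log_priors: dict mapping class -> log p(c)
    log_likelihoods: dict mapping class -> list of log p(x_j | c), one per feature
    """
    scores = {
        c: log_priors[c] + sum(log_likelihoods[c])
        for c in log_priors
    }
    return max(scores, key=scores.get), scores

# 2000 features, each with a small per-feature likelihood (illustrative values).
d = 2000
per_feature = {"a": 0.01, "b": 0.02}
priors = {"a": 0.5, "b": 0.5}

# The raw products underflow to 0.0 for both classes, so they cannot be compared.
raw_products = {c: priors[c] * math.prod([p] * d) for c, p in per_feature.items()}

# The log-space scores stay finite and remain comparable.
best, scores = predict_log_space(
    {c: math.log(p) for c, p in priors.items()},
    {c: [math.log(p)] * d for c, p in per_feature.items()},
)
```

Here both raw products collapse to zero while the log-space scores still rank class `"b"` above class `"a"`.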

Gaussian Naive Bayes assumes continuous features follow a Gaussian distribution within each class:

$p (x_{j} | c) = \frac{1}{\sqrt{2 \pi \sigma_{j c}^{2}}} \exp \left( - \frac{(x_{j} - \mu_{j c})^{2}}{2 \sigma_{j c}^{2}} \right)$

where $\mu_{j c}$ and $\sigma_{j c}^{2}$ are the mean and variance of feature $j$ in class $c$.
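To make the Gaussian model concrete, here is a from-scratch sketch in plain Python that estimates the per-class mean and variance of each feature and scores a sample in log space. The function names and the toy data are assumptions made for illustration. The sketch also omits the small variance floor that scikit-learn's `GaussianNB` adds through its `var_smoothing` parameter, so it would fail on a feature that is constant within a class.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean / variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    params = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Maximum-likelihood (biased) variance, dividing by n.
        variances = [
            sum((v - m) ** 2 for v in col) / n
            for col, m in zip(columns, means)
        ]
        params[label] = (n / len(X), means, variances)
    return params

def log_gaussian(x, mean, var):
    """Log of the univariate Gaussian density."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(params, x):
    """Return the class with the highest log prior plus summed log likelihoods."""
    scores = {}
    for label, (prior, means, variances) in params.items():
        log_lik = sum(log_gaussian(xj, m, v) for xj, m, v in zip(x, means, variances))
        scores[label] = math.log(prior) + log_lik
    return max(scores, key=scores.get)

# Toy data: two well-separated classes in two dimensions.
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.8, 4.0]]
y = [0, 0, 0, 1, 1, 1]
params = fit_gaussian_nb(X, y)
```

Calling `predict(params, [1.1, 2.0])` assigns the sample to class `0`, whose feature means lie closest to it.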

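Multinomial Naive Bayes models feature counts with a multinomial distribution. Each class $c$ has a parameter vector $(\theta_{1 c}, \dots, \theta_{d c})$, where $\theta_{j c}$ is the probability of feature $j$ appearing in a sample of class $c$. It is estimated by smoothed relative frequency:

$\hat{\theta}_{j c} = \frac{N_{j c} + \alpha}{N_{c} + \alpha d}$

where $N_{j c}$ is the count of feature $j$ in class $c$, $N_{c} = \sum_{j = 1}^{d} N_{j c}$ is the total count for class $c$, and $\alpha \geq 0$ is a smoothing parameter ($\alpha = 1$ is Laplace smoothing, $\alpha < 1$ is Lidstone smoothing). A feature observed $x_{j}$ times contributes $x_{j} \log \hat{\theta}_{j c}$ to the log-score of class $c$.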
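The effect of smoothing is easiest to see on a small count vector. The helper name `smoothed_feature_probs` and the counts below are illustrative assumptions.

```python
def smoothed_feature_probs(class_counts, alpha=1.0):
    """Return (N_jc + alpha) / (N_c + alpha * d) for every feature j of one class."""
    d = len(class_counts)
    total = sum(class_counts)
    return [(n_jc + alpha) / (total + alpha * d) for n_jc in class_counts]

# Counts of three features within one class; feature 1 never occurs.
counts = [3, 0, 1]

# With alpha = 1: (3+1)/(4+3), (0+1)/(4+3), (1+1)/(4+3) = 4/7, 1/7, 2/7
probs = smoothed_feature_probs(counts, alpha=1.0)

# With alpha = 0 the unseen feature gets probability zero, so its log is undefined
# and a single occurrence at prediction time would rule the class out entirely.
unsmoothed = smoothed_feature_probs(counts, alpha=0.0)
```

The smoothed probabilities still sum to one, because the numerators add `alpha` once per feature and the denominator adds `alpha * d`.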

Bernoulli Naive Bayes models binary features:

$p (x_{j} | c) = p_{j c}^{x_{j}} (1 - p_{j c})^{1 - x_{j}}$

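where $p_{j c}$ is the probability of feature $j$ being present in class $c$. Unlike the multinomial model, which simply ignores features with a zero count, the Bernoulli model explicitly penalizes the absence of a feature that is typical of class $c$ through the $(1 - p_{j c})$ factor.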
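The absence penalty can be checked numerically. In this sketch the presence probabilities and the helper `bernoulli_log_likelihood` are hypothetical values and names chosen for illustration.

```python
import math

def bernoulli_log_likelihood(x, p):
    """Sum over features of x_j * log(p_j) + (1 - x_j) * log(1 - p_j)."""
    return sum(
        xj * math.log(pj) + (1 - xj) * math.log(1 - pj)
        for xj, pj in zip(x, p)
    )

# Presence probabilities for one class: features 0 and 1 are typical, feature 2 is rare.
p_class = [0.9, 0.8, 0.1]

# A sample matching the class profile versus one where typical feature 1 is missing.
typical_sample = bernoulli_log_likelihood([1, 1, 0], p_class)
missing_feature = bernoulli_log_likelihood([1, 0, 0], p_class)

# The missing feature swaps log(0.8) for log(0.2), lowering the score by log(4).
penalty = typical_sample - missing_feature
```

A multinomial model given the same two samples would contribute nothing for the absent feature, so it would not register this difference.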

Complement Naive Bayes computes the likelihood using all classes except $c$ (the complement), addressing the bias of standard multinomial NB toward majority classes.
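A simplified sketch of the complement idea, in plain Python. It estimates smoothed feature probabilities from every class except the candidate one, then picks the class whose complement explains the sample worst. The function names and counts are illustrative assumptions, and the sketch leaves out the weight normalization and class-prior handling that a full implementation may apply.

```python
import math

def complement_feature_probs(counts_by_class, target, alpha=1.0):
    """Smoothed feature probabilities estimated from all classes except `target`."""
    d = len(next(iter(counts_by_class.values())))
    complement = [0] * d
    for label, counts in counts_by_class.items():
        if label != target:
            complement = [a + b for a, b in zip(complement, counts)]
    total = sum(complement)
    return [(n + alpha) / (total + alpha * d) for n in complement]

def predict_complement(counts_by_class, x, alpha=1.0):
    """Pick the class whose complement gives the sample the lowest log-score."""
    scores = {}
    for label in counts_by_class:
        probs = complement_feature_probs(counts_by_class, label, alpha)
        scores[label] = sum(xj * math.log(p) for xj, p in zip(x, probs))
    return min(scores, key=scores.get)

# Imbalanced toy counts: class "a" has ten times more data than "b" or "c".
counts_by_class = {"a": [90, 10, 0], "b": [2, 1, 7], "c": [1, 8, 1]}

complement_of_b = complement_feature_probs(counts_by_class, "b")
predicted = predict_complement(counts_by_class, [0, 0, 5])
```

Because each class's parameters are estimated from the pooled counts of all other classes, a minority class draws on far more data than its own few samples would provide.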

Related Pages

Implementation:Scikit_learn_Scikit_learn_GaussianNB
