Principle:Scikit learn Scikit learn Naive Bayes Classification

From Leeroopedia


Knowledge Sources
Domains: Supervised Learning, Probabilistic Classification
Last Updated: 2026-02-08 15:00 GMT

Overview

Naive Bayes classifiers apply Bayes' theorem with the "naive" assumption that features are conditionally independent given the class label, yielding simple, fast, and surprisingly effective probabilistic classifiers.

Description

Naive Bayes classifiers are generative models that estimate the posterior probability of each class by combining a prior class probability with a likelihood term derived under the conditional independence assumption. Although this assumption rarely holds in practice, naive Bayes classifiers perform remarkably well in many real-world applications, particularly text classification. They are computationally efficient to train and apply, can be fit on relatively little training data, and handle multi-class problems naturally. Naive Bayes sits at the intersection of Bayesian statistics and classification, serving as a strong baseline and a practical choice for high-dimensional data.

Usage

Use GaussianNB when features are continuous and approximately Gaussian distributed within each class. Use MultinomialNB for count-based features such as word counts in text classification. Use BernoulliNB when features are binary (presence/absence indicators). Use ComplementNB, a variant of MultinomialNB, when the class distribution is imbalanced. Naive Bayes classifiers are especially effective for text classification, spam filtering, and sentiment analysis because they fit bag-of-words representations naturally and handle high-dimensional sparse data efficiently.
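The choice between the four variants can be sketched with a small scikit-learn example. The toy count matrix and labels below are invented for illustration, and binarizing the counts for BernoulliNB is only one possible preprocessing choice; this is a minimal sketch rather than a recommended pipeline.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, ComplementNB, GaussianNB, MultinomialNB

# Toy data (made up for this sketch): 6 samples, 3 count features, 2 classes.
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [4, 1, 0],
    [0, 2, 3],
    [0, 3, 2],
    [1, 4, 4],
])
y = np.array([0, 0, 0, 1, 1, 1])

# Pair each estimator with the feature representation it expects.
models = {
    "GaussianNB": (GaussianNB(), X_counts.astype(float)),               # continuous values
    "MultinomialNB": (MultinomialNB(alpha=1.0), X_counts),              # raw counts
    "ComplementNB": (ComplementNB(alpha=1.0), X_counts),                # raw counts
    "BernoulliNB": (BernoulliNB(alpha=1.0), (X_counts > 0).astype(int)),  # presence/absence
}

predictions = {}
probabilities = {}
for name, (model, X) in models.items():
    model.fit(X, y)
    predictions[name] = model.predict(X)           # hard class labels
    probabilities[name] = model.predict_proba(X)   # per-class posterior estimates
```

All four estimators share the same `fit` / `predict` / `predict_proba` interface, so switching variants mostly comes down to matching the feature representation to the model's distributional assumption.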

Theoretical Basis

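Bayes' theorem, with the likelihood factorized under the naive assumption, gives the posterior probability of class $c$ given features $x = (x_1, \dots, x_d)$:

$$p(c \mid x_1, \dots, x_d) = \frac{p(c) \prod_{j=1}^{d} p(x_j \mid c)}{p(x_1, \dots, x_d)}$$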

The naive conditional independence assumption simplifies the joint likelihood:

$$p(x_1, \dots, x_d \mid c) = \prod_{j=1}^{d} p(x_j \mid c)$$

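Because the denominator $p(x_1, \dots, x_d)$ does not depend on $c$, it can be dropped when comparing classes. The predicted class is:

$$\hat{c} = \arg\max_{c} \; p(c) \prod_{j=1}^{d} p(x_j \mid c)$$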

In practice, log probabilities are used to avoid numerical underflow:

$$\hat{c} = \arg\max_{c} \left[ \log p(c) + \sum_{j=1}^{d} \log p(x_j \mid c) \right]$$
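The underflow problem is easy to reproduce. The sketch below uses made-up priors and a made-up per-feature likelihood repeated over 1000 features for two hypothetical classes; the direct product collapses to zero for both classes, while the log-space score still separates them.

```python
import math

# Hypothetical numbers chosen only to trigger underflow.
priors = {"spam": 0.4, "ham": 0.6}
d = 1000
feature_likelihoods = {
    "spam": [0.02] * d,  # p(x_j | spam) for j = 1..d
    "ham": [0.01] * d,   # p(x_j | ham)  for j = 1..d
}

# Direct product p(c) * prod_j p(x_j | c): underflows to 0.0 in double precision.
product_scores = {
    c: priors[c] * math.prod(feature_likelihoods[c]) for c in priors
}

# Log-space score log p(c) + sum_j log p(x_j | c): stays finite.
log_scores = {
    c: math.log(priors[c]) + sum(math.log(p) for p in feature_likelihoods[c])
    for c in priors
}

predicted = max(log_scores, key=log_scores.get)
```

Here both entries of `product_scores` are `0.0`, so an argmax over them is meaningless, whereas `log_scores` still ranks the classes.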

Gaussian Naive Bayes assumes continuous features follow a Gaussian distribution within each class:

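$$p(x_j \mid c) = \frac{1}{\sqrt{2\pi\sigma_{jc}^2}} \exp\left( -\frac{(x_j - \mu_{jc})^2}{2\sigma_{jc}^2} \right)$$

where $\mu_{jc}$ and $\sigma_{jc}^2$ are the mean and variance of feature $j$ in class $c$.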
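A from-scratch sketch of the Gaussian model can make the roles of the per-class mean and variance concrete. The function names, the toy data, and the small `1e-9` variance floor are assumptions of this example, not scikit-learn's implementation.

```python
import math

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate log prior, per-feature means and variances for each class."""
    params = {}
    n = len(y)
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        d = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
        # Maximum-likelihood variance plus a tiny floor (assumed here) to avoid division by zero.
        variances = [
            sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + var_floor
            for j in range(d)
        ]
        params[c] = (math.log(len(rows) / n), means, variances)
    return params

def gaussian_log_likelihood(x, mean, var):
    """Log of the Gaussian density for a single feature value."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(params, x):
    """Return the class maximizing log prior + sum of per-feature log likelihoods."""
    scores = {}
    for c, (log_prior, means, variances) in params.items():
        scores[c] = log_prior + sum(
            gaussian_log_likelihood(xj, m, v)
            for xj, m, v in zip(x, means, variances)
        )
    return max(scores, key=scores.get)

# Toy data invented for illustration: two well-separated classes.
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 0.5], [3.2, 0.4], [2.9, 0.7]]
y = [0, 0, 0, 1, 1, 1]
params = fit_gaussian_nb(X, y)
label_a = predict(params, [1.1, 2.0])
label_b = predict(params, [3.1, 0.5])
```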

Multinomial Naive Bayes models feature counts with a multinomial distribution:

$$p(x_j \mid c) = \frac{N_{jc} + \alpha}{N_c + \alpha d}$$

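where $N_{jc}$ is the count of feature $j$ in class $c$, $N_c$ is the total count of all features in class $c$, $d$ is the number of features, and $\alpha$ is an additive smoothing parameter ($\alpha = 1$ gives Laplace smoothing). Smoothing prevents a feature that never appears in a class during training from receiving zero probability.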
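The smoothed estimate is simple arithmetic, and a worked example shows why the smoothing term matters. The function name and the counts below are hypothetical, chosen only for illustration.

```python
def smoothed_feature_probs(class_counts, alpha=1.0):
    """Return (N_jc + alpha) / (N_c + alpha * d) for every feature j of one class."""
    d = len(class_counts)    # number of features
    n_c = sum(class_counts)  # total count for the class
    return [(n_jc + alpha) / (n_c + alpha * d) for n_jc in class_counts]

# Hypothetical word counts for one class over a 3-word vocabulary.
counts = [3, 0, 1]

smoothed = smoothed_feature_probs(counts, alpha=1.0)    # (3+1)/7, (0+1)/7, (1+1)/7
unsmoothed = smoothed_feature_probs(counts, alpha=0.0)  # 3/4, 0/4, 1/4
```

Without smoothing, the second feature gets probability zero, which would make the log likelihood undefined for any sample containing it; with `alpha=1.0` it receives a small nonzero probability and the estimates still sum to one.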

Bernoulli Naive Bayes models binary features:

$$p(x_j \mid c) = p_{jc}^{x_j} (1 - p_{jc})^{1 - x_j}$$

where $p_{jc}$ is the probability of feature $j$ being present in class $c$. Unlike the multinomial model, the Bernoulli model explicitly penalizes the absence of features.
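The absence penalty can be seen by comparing the Bernoulli log likelihood with a multinomial-style score that only sums over features that are present. The probabilities and feature vectors below are made up for illustration, and the helper names are hypothetical.

```python
import math

def bernoulli_log_likelihood(x, p):
    """Sum of x_j*log(p_j) + (1 - x_j)*log(1 - p_j): absent features contribute too."""
    return sum(
        xj * math.log(pj) + (1 - xj) * math.log(1 - pj)
        for xj, pj in zip(x, p)
    )

def presence_only_score(x, p):
    """Multinomial-style term: features with x_j = 0 contribute nothing."""
    return sum(xj * math.log(pj) for xj, pj in zip(x, p))

# Hypothetical per-feature presence probabilities for one class.
p_c = [0.9, 0.8, 0.1]

typical = [1, 1, 0]      # both common features present
missing_one = [1, 0, 0]  # second common feature absent

ll_typical = bernoulli_log_likelihood(typical, p_c)
ll_missing = bernoulli_log_likelihood(missing_one, p_c)
score_typical = presence_only_score(typical, p_c)
score_missing = presence_only_score(missing_one, p_c)
```

Under the Bernoulli model, dropping a feature that is usually present lowers the score (`ll_missing < ll_typical`), while the presence-only score actually rises because one negative log term disappears.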

Complement Naive Bayes computes the likelihood using all classes except c (the complement), addressing the bias of standard multinomial NB toward majority classes.
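The complement idea can be sketched in a few lines. The version below follows the formulation described in the scikit-learn user guide as I understand it (smoothed feature estimates computed from every class except c, with the prediction taken as the class whose complement matches the sample worst); it omits the optional weight normalization, and the counts and names are invented for illustration.

```python
import math

def complement_weights(counts_by_class, alpha=1.0):
    """For each class c, log of smoothed feature probabilities estimated from all other classes."""
    classes = list(counts_by_class)
    d = len(counts_by_class[classes[0]])
    weights = {}
    for c in classes:
        # Feature counts pooled over the complement of c.
        comp = [
            sum(counts_by_class[k][j] for k in classes if k != c)
            for j in range(d)
        ]
        total = sum(comp)
        weights[c] = [
            math.log((comp[j] + alpha) / (total + alpha * d)) for j in range(d)
        ]
    return weights

def predict(weights, x):
    """Pick the class whose complement explains x worst (smallest weighted sum)."""
    scores = {c: sum(xj * wj for xj, wj in zip(x, w)) for c, w in weights.items()}
    return min(scores, key=scores.get)

# Hypothetical, deliberately imbalanced per-class feature counts.
counts_by_class = {"majority": [90, 10, 5], "minority": [1, 4, 6]}
weights = complement_weights(counts_by_class)

label_a = predict(weights, [0, 2, 3])  # resembles the minority class counts
label_b = predict(weights, [5, 0, 0])  # resembles the majority class counts
```

Because each class's parameters are estimated from the pooled data of the other classes, a rare class is not handicapped by having few counts of its own.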
