Principle:Scikit learn Scikit learn Classification Prediction

Field	Value
sources	Bishop, C.M. (2006). Pattern Recognition and Machine Learning, Springer; Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning, 2nd ed., Springer
domains	Machine_Learning, Statistics
last_updated	2026-02-08 15:00 GMT

Overview

Classification prediction maps input feature vectors to discrete class labels using a trained model.

Description

Classification prediction is the process of applying a trained model to new, unseen feature vectors to produce discrete class label assignments. Once a classifier has been fitted (i.e., its parameters have been estimated from training data), prediction is a purely deterministic computation that maps each input sample to one of the known classes.

For linear classifiers, prediction involves two stages:

Computing a decision function -- A linear combination of the input features and the learned weights produces a raw score (or scores, in the multiclass case) for each sample.
Applying a decision rule -- The raw scores are converted into class labels. In binary classification, a threshold (typically zero) determines the class. In multiclass classification, the class with the highest score (argmax) is selected.

In scikit-learn, the predict(X) method encapsulates both stages and returns an array of class labels. The separate decision_function(X) method provides access to the raw scores, and predict_proba(X) (where available) returns per-class probability estimates. These estimates are not guaranteed to be well calibrated; how closely they track true class frequencies depends on the model.
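
The three methods can be compared side by side. The following is a minimal sketch, assuming LogisticRegression as the linear classifier and the bundled iris dataset as stand-in data; neither choice comes from this page, and max_iter=1000 is only there to let the solver converge.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Assumption: iris (3 classes, 4 features) stands in for real data.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

X_new = X[:5]  # pretend these are new samples

labels = clf.predict(X_new)            # class labels, shape (5,)
scores = clf.decision_function(X_new)  # raw scores, shape (5, 3)
probs = clf.predict_proba(X_new)       # probability estimates, shape (5, 3)

# Each row of predict_proba sums to 1 across the classes.
row_sums = probs.sum(axis=1)

print(labels.shape, scores.shape, probs.shape)
```

Only predict returns values in the label space; the other two return one column per entry of classes_.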

Usage

Use classification prediction when:

Generating predictions on test data -- After training, call predict(X_test) to obtain class labels for evaluation.
Deploying a model in production -- The predict method is the primary inference interface for serving predictions to downstream systems.
Analyzing decision boundaries -- Use decision_function to understand the model's confidence and visualize how the feature space is partitioned.
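
The first and third use cases above can be sketched as follows. This is an illustrative example rather than a recommended pipeline: the dataset, the 25% test split, and the use of accuracy_score as the metric are all assumptions made for the sketch.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: iris and a stratified 75/25 split stand in for a real workflow.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Class labels for held-out data, compared against the true labels.
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# Raw scores for the same samples; larger gaps between the top two
# scores of a row indicate a sample further from a decision boundary.
margins = clf.decision_function(X_test)

print(f"test accuracy: {accuracy:.3f}")
```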

Theoretical Basis

Decision Function

For a linear classifier with weight matrix $𝐖$ (shape: n_classes x n_features) and intercept vector $𝐛$ (shape: n_classes), the decision function for a sample $𝐱$ is:

$𝐬 (𝐱) = 𝐖 𝐱 + 𝐛$

This produces a score vector $𝐬$ of length n_classes (or a single scalar in the binary case, where only one set of weights is stored).
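
To make the formula concrete, the scores can be recomputed by hand from the fitted attributes coef_ and intercept_ and checked against decision_function. The sketch below assumes LogisticRegression on the iris dataset purely for illustration. Because samples are stored as rows of X, the batched form of the formula is X @ W.T + b.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

W = clf.coef_       # weight matrix, shape (n_classes, n_features) = (3, 4)
b = clf.intercept_  # intercept vector, shape (n_classes,) = (3,)

# s(x) = W x + b, applied to every row of X at once.
manual_scores = X @ W.T + b
library_scores = clf.decision_function(X)

print(np.allclose(manual_scores, library_scores))
```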

Threshold-Based Classification

Binary case: A single score $s (𝐱) = 𝐰^{T} 𝐱 + b$ is computed. The predicted class is:

$\hat{y} = \begin{cases} \text{class}_{1} & \text{if } s (𝐱) > 0 \\ \text{class}_{0} & \text{otherwise} \end{cases}$
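
The binary rule can be reproduced by thresholding the scalar score at zero. The following sketch assumes a synthetic two-class problem from make_classification and LogisticRegression as the classifier; both are illustrative choices, and class_0 and class_1 are taken to be classes_[0] and classes_[1].

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Assumption: a synthetic binary dataset stands in for real data.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# In the binary case decision_function returns one score per sample.
s = clf.decision_function(X)  # shape (200,)

# Positive score -> classes_[1], otherwise classes_[0].
manual = np.where(s > 0, clf.classes_[1], clf.classes_[0])

print(np.array_equal(manual, clf.predict(X)))
```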

Multiclass case: A score vector is computed and the predicted class is the one with the highest score:

$\hat{y} = \text{classes} [\arg \max_{k} s_{k} (𝐱)]$
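
A small worked example of the multiclass rule, using NumPy only. The class names and score values below are made up for illustration and do not come from a fitted model.

```python
import numpy as np

# Hypothetical sorted class labels and hand-written scores (3 samples x 3 classes).
classes = np.array(["bird", "cat", "dog"])
scores = np.array([
    [ 0.2,  1.5, -0.3],
    [-1.0, -0.2,  0.4],
    [ 2.1,  0.0,  1.9],
])

# Index of the highest score in each row, then look up the label.
winning_index = np.argmax(scores, axis=1)
y_hat = classes[winning_index]

print(winning_index)  # [1 2 0]
print(y_hat)          # ['cat' 'dog' 'bird']
```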

Class Label Mapping

The predicted index is mapped back to the original class label through the classes_ attribute, which stores the unique sorted class labels encountered during training. This ensures that predictions are returned in the same label space as the original training targets, regardless of whether those labels are integers, strings, or other types.
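
The mapping can be observed with string targets. In this sketch the toy data and the label names "low", "mid", and "high" are invented for illustration, and LogisticRegression is again an assumed choice of classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature dataset with string labels.
X = np.array([[0.0], [0.2], [1.0], [1.2], [2.0], [2.2]])
y = np.array(["low", "low", "mid", "mid", "high", "high"])

clf = LogisticRegression(max_iter=1000).fit(X, y)

# classes_ holds the unique labels in sorted order, not in order of appearance.
print(clf.classes_)  # ['high' 'low' 'mid']

# predict returns strings from the original label space.
pred = clf.predict(X)

# The same labels, recovered by indexing classes_ with the argmax of the scores.
by_index = clf.classes_[np.argmax(clf.decision_function(X), axis=1)]
```

Note that the column order of decision_function and predict_proba follows classes_, so column 0 here corresponds to "high".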

Related Pages

Implementation:Scikit_learn_Scikit_learn_LinearClassifierMixin_Predict
