Implementation:Scikit learn Scikit learn SelectKBest

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Feature Selection, Statistical Testing
Last Updated	2026-02-08 15:00 GMT

Overview

Concrete tool, provided by scikit-learn, for selecting the k features with the highest univariate statistical scores.

Description

SelectKBest scores each feature independently against the target with a univariate statistical test and keeps the k highest-scoring features. It belongs to sklearn.feature_selection, which provides a suite of univariate selection classes: SelectKBest (top k features), SelectPercentile (top percentage of features), SelectFpr (false positive rate), SelectFdr (false discovery rate), SelectFwe (family-wise error rate), and GenericUnivariateSelect (configurable mode). The module also provides scoring functions for different feature-target relationships, including f_classif (ANOVA F-value), chi2 (chi-squared statistic), f_regression (univariate linear regression F-statistic), and r_regression (Pearson correlation coefficient).

Usage

Use SelectKBest when you want a simple, fast feature selection method based on univariate statistical tests. Choose the scoring function to match the task: f_classif for classification with continuous features, chi2 for classification with non-negative features such as counts or frequencies, or f_regression for regression. It is typically used as a preprocessing step to reduce dimensionality before training a model. Because each feature is scored on its own, the method does not account for interactions or redundancy between features.
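
The choice of scoring function described above can be sketched as follows. This is a minimal illustration on synthetic data, not a recommendation for any particular dataset. The MinMaxScaler step is an assumption added here only to satisfy the non-negativity requirement of chi2, which is normally applied to counts or frequencies.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.feature_selection import SelectKBest, chi2, f_classif, f_regression
from sklearn.preprocessing import MinMaxScaler

# Classification with continuous features: ANOVA F-test.
X_clf, y_clf = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)
X_f = SelectKBest(f_classif, k=5).fit_transform(X_clf, y_clf)

# chi2 rejects negative inputs, so rescale every feature to [0, 1] first
# (an assumption made for this sketch only).
X_nonneg = MinMaxScaler().fit_transform(X_clf)
X_chi2 = SelectKBest(chi2, k=5).fit_transform(X_nonneg, y_clf)

# Regression target: univariate linear regression F-test.
X_reg, y_reg = make_regression(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)
X_r = SelectKBest(f_regression, k=5).fit_transform(X_reg, y_reg)

print(X_f.shape, X_chi2.shape, X_r.shape)
```

Each selector keeps five columns, but different scoring functions may rank the features differently, so the selected columns need not match.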

Code Reference

Source Location

Repository: scikit-learn
File: sklearn/feature_selection/_univariate_selection.py

Signature

class SelectKBest(_BaseFilter):
    def __init__(self, score_func=f_classif, *, k=10):

class SelectPercentile(_BaseFilter):
    def __init__(self, score_func=f_classif, *, percentile=10):

class SelectFpr(_BaseFilter):
    def __init__(self, score_func=f_classif, *, alpha=5e-2):

class SelectFdr(_BaseFilter):
    def __init__(self, score_func=f_classif, *, alpha=5e-2):

class SelectFwe(_BaseFilter):
    def __init__(self, score_func=f_classif, *, alpha=5e-2):

class GenericUnivariateSelect(_BaseFilter):
    def __init__(self, score_func=f_classif, *, mode="percentile", param=1e-5):
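
As a rough illustration of how these classes relate, GenericUnivariateSelect with mode="k_best" is expected to pick the same features as SelectKBest when param is set to the same k. The sketch below checks this on synthetic data; the dataset and the value 5 are arbitrary choices for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import GenericUnivariateSelect, SelectKBest, f_classif

X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

# Dedicated class: keep the 5 highest-scoring features.
kbest = SelectKBest(f_classif, k=5).fit(X, y)

# Generic class: the strategy is chosen by `mode`, its setting by `param`.
generic = GenericUnivariateSelect(f_classif, mode="k_best", param=5).fit(X, y)

# Compare the boolean masks of selected features.
same_mask = np.array_equal(kbest.get_support(), generic.get_support())
print(same_mask)
```

The generic class is mainly convenient when the selection strategy itself is treated as a hyperparameter to search over.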

Import

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif, chi2, f_regression

I/O Contract

Inputs

Name	Type	Required	Description
score_func	callable	No	Constructor parameter. Function taking (X, y) and returning either a pair of arrays (scores, pvalues) or a single array of scores. Default is f_classif.
k	int or "all"	No	Constructor parameter. Number of top features to select; "all" keeps every feature. Default is 10.
X	array-like of shape (n_samples, n_features)	Yes	Training input samples, passed to fit or fit_transform.
y	array-like of shape (n_samples,)	Yes	Target values used to compute feature scores, passed to fit or fit_transform.
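
The inputs above can be exercised with a custom score_func and with k="all". In this sketch, variance_score is a hypothetical helper written for illustration, not part of scikit-learn; it returns only scores, so no p-values are available after fitting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest


def variance_score(X, y):
    """Hypothetical score function: per-feature variance (ignores y)."""
    return np.var(X, axis=0)


X, y = make_classification(n_samples=100, n_features=10, random_state=0)

# Keep the 4 features with the largest variance.
sel = SelectKBest(variance_score, k=4).fit(X, y)
top = sel.get_support(indices=True)

# k="all" bypasses selection and keeps every feature.
X_all = SelectKBest(variance_score, k="all").fit_transform(X, y)

print(top, sel.pvalues_, X_all.shape)
```

Setting k="all" can be handy in a parameter search, where it acts as a "no selection" baseline.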

Outputs

Name	Type	Description
X_transformed	ndarray of shape (n_samples, k)	Returned by transform or fit_transform: the input data with only the k best features retained.
scores_	ndarray of shape (n_features,)	Fitted attribute: score of each feature.
pvalues_	ndarray of shape (n_features,)	Fitted attribute: p-value of each feature score, or None if score_func returned only scores.
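
A short sketch of how these outputs relate to each other, using synthetic regression data chosen only for illustration. It shows that the transformed array corresponds to the columns flagged by get_support, and that scores_ can be sorted to rank features.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

X, y = make_regression(
    n_samples=100, n_features=8, n_informative=3, noise=1.0, random_state=0
)

selector = SelectKBest(f_regression, k=3).fit(X, y)

# Transformed data: only the selected columns, in their original order.
X_transformed = selector.transform(X)

# Boolean mask over the original features; True marks a kept feature.
mask = selector.get_support()

# Feature indices ordered from highest to lowest score.
ranking = np.argsort(selector.scores_)[::-1]

print(X_transformed.shape)
print(ranking[:3], np.flatnonzero(mask))
```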

Usage Examples

Basic Usage

from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=42)

selector = SelectKBest(f_classif, k=5)
X_selected = selector.fit_transform(X, y)
print(f"Original features: {X.shape[1]}, Selected features: {X_selected.shape[1]}")
print(f"Selected feature indices: {selector.get_support(indices=True)}")
print(f"Feature scores: {selector.scores_}")
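
Pipeline Usage

When SelectKBest is used as a preprocessing step, one common pattern is to place it in a pipeline so that feature scores are recomputed on each training fold rather than on the full dataset. The sketch below assumes a LogisticRegression classifier and 5-fold cross-validation; both are illustrative choices, not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=42
)

# Selection is fitted inside each fold, so held-out samples never
# influence which features are kept.
pipe = make_pipeline(
    SelectKBest(f_classif, k=5),
    LogisticRegression(max_iter=1000),
)

cv_scores = cross_val_score(pipe, X, y, cv=5)
print(f"Mean CV accuracy: {cv_scores.mean():.3f}")
```

Fitting the selector on all of X before cross-validating would let the test folds leak into feature selection and tend to inflate the reported score.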

Related Pages

Principle:Scikit_learn_Scikit_learn_Feature_Selection
