Implementation:Scikit learn Scikit learn SelectKBest

From Leeroopedia


Knowledge Sources
Domains: Feature Selection, Statistical Testing
Last Updated: 2026-02-08 15:00 GMT

Overview

Concrete tool, provided by scikit-learn, for selecting the k features with the highest univariate statistical scores.

Description

SelectKBest keeps the k features with the highest scores from a univariate statistical test, in which each feature is scored against the target independently of the others. It belongs to scikit-learn's suite of univariate feature selection classes, which also includes SelectPercentile (top percentage of features), SelectFpr (false positive rate), SelectFdr (false discovery rate), SelectFwe (family-wise error rate), and GenericUnivariateSelect (configurable mode). The same module provides scoring functions such as f_classif (ANOVA F-value), chi2 (chi-squared statistic), f_regression (F-statistic from univariate linear regression), and r_regression (Pearson correlation coefficient) for different feature-target relationships.
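To make the relationship between these selectors concrete, the following is a minimal sketch that fits four of them on the same synthetic data. The dataset sizes, the random seed, and the values chosen for k, percentile, and alpha are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import (
    GenericUnivariateSelect,
    SelectFpr,
    SelectKBest,
    SelectPercentile,
    f_classif,
)

# Synthetic data; sizes and seed are arbitrary choices for illustration.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

# Fixed count: keep the 5 highest-scoring features.
k_best = SelectKBest(f_classif, k=5).fit(X, y)

# Fixed fraction: keep the top 25% of features.
top_quarter = SelectPercentile(f_classif, percentile=25).fit(X, y)

# Significance threshold: keep features whose p-value is below alpha.
fpr = SelectFpr(f_classif, alpha=0.05).fit(X, y)

# Configurable mode: "k_best" with param=5 mirrors SelectKBest(k=5).
generic = GenericUnivariateSelect(f_classif, mode="k_best", param=5).fit(X, y)

for name, selector in [
    ("SelectKBest", k_best),
    ("SelectPercentile", top_quarter),
    ("SelectFpr", fpr),
    ("GenericUnivariateSelect", generic),
]:
    print(name, selector.get_support(indices=True))
```

All four expose the same fit / transform / get_support interface, so switching the selection rule mostly means switching the class and its one tuning parameter.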

Usage

Use SelectKBest when you want a simple, fast feature selection method based on univariate statistical tests. Choose the scoring function to match the task: f_classif for classification with continuous features, chi2 for classification with non-negative features such as counts or frequencies, or f_regression for regression. It is typically applied as a preprocessing step that reduces dimensionality before a model is trained.
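As a rough illustration of matching the scoring function to the task, the sketch below applies chi2 to the non-negative pixel intensities of the digits dataset and f_regression to a synthetic regression problem. The datasets and the values of k are assumptions chosen for the example.

```python
from sklearn.datasets import load_digits, make_regression
from sklearn.feature_selection import SelectKBest, chi2, f_regression

# Classification with non-negative features: pixel intensities are >= 0.
X_digits, y_digits = load_digits(return_X_y=True)
X_chi2 = SelectKBest(chi2, k=20).fit_transform(X_digits, y_digits)
print(X_digits.shape, "->", X_chi2.shape)

# chi2 rejects negative values, so mean-centered data is not accepted.
try:
    SelectKBest(chi2, k=20).fit(X_digits - X_digits.mean(axis=0), y_digits)
    chi2_rejected_negatives = False
except ValueError:
    chi2_rejected_negatives = True
print("chi2 rejected negative input:", chi2_rejected_negatives)

# Regression: f_regression scores each feature against a continuous target.
X_reg, y_reg = make_regression(
    n_samples=200, n_features=15, n_informative=3, random_state=0
)
X_freg = SelectKBest(f_regression, k=3).fit_transform(X_reg, y_reg)
print(X_reg.shape, "->", X_freg.shape)
```

If features have been standardized or centered, f_classif is usually the safer choice for classification, since chi2 requires the raw non-negative values.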

Code Reference

Source Location

Signature

class SelectKBest(_BaseFilter):
    def __init__(self, score_func=f_classif, *, k=10):

class SelectPercentile(_BaseFilter):
    def __init__(self, score_func=f_classif, *, percentile=10):

class SelectFpr(_BaseFilter):
    def __init__(self, score_func=f_classif, *, alpha=5e-2):

class SelectFdr(_BaseFilter):
    def __init__(self, score_func=f_classif, *, alpha=5e-2):

class SelectFwe(_BaseFilter):
    def __init__(self, score_func=f_classif, *, alpha=5e-2):

class GenericUnivariateSelect(_BaseFilter):
    def __init__(self, score_func=f_classif, *, mode="percentile", param=1e-5):

Import

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif, chi2, f_regression

I/O Contract

Inputs

Name | Type | Required | Description
score_func | callable | No | Function taking (X, y) and returning either (scores, pvalues) or scores alone. Default is f_classif.
k | int or "all" | No | Number of top features to select; "all" keeps every feature. Default is 10.
X | array-like of shape (n_samples, n_features) | Yes | Training input samples passed to fit.
y | array-like of shape (n_samples,) | Yes | Target values used to compute feature scores.
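
The score_func contract in the table can be sketched with a user-defined function. Here `abs_correlation` is a hypothetical helper written for this example (it is not part of scikit-learn); it returns scores only, which is one of the two accepted return forms. The example also shows k="all", which keeps every feature.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest


def abs_correlation(X, y):
    """Hypothetical score function: |Pearson r| of each column of X with y.

    Returns scores only (no p-values).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_centered = X - X.mean(axis=0)
    y_centered = y - y.mean()
    numerator = X_centered.T @ y_centered
    denominator = np.sqrt((X_centered**2).sum(axis=0) * (y_centered**2).sum())
    return np.abs(numerator / denominator)


X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

# Custom scoring with an integer k.
selector = SelectKBest(score_func=abs_correlation, k=4)
X_top4 = selector.fit_transform(X, y)
print(X_top4.shape)
print(selector.pvalues_)  # None, because abs_correlation returned scores only

# k="all" keeps every feature but still computes scores_.
keep_all = SelectKBest(score_func=abs_correlation, k="all")
X_all = keep_all.fit_transform(X, y)
print(X_all.shape)
```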

Outputs

Name | Type | Description
X_transformed | ndarray of shape (n_samples, k) | The input data with only the k best features retained.
scores_ | ndarray of shape (n_features,) | Score of each feature.
pvalues_ | ndarray of shape (n_features,) | p-value of each feature score; None if score_func returned only scores.
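
As a sketch of how these outputs relate to one another, the example below fits a selector and checks that the retained columns are the ones with the highest values in scores_. The dataset parameters are illustrative assumptions, and the comparison assumes no tied scores.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=42
)

selector = SelectKBest(f_classif, k=5).fit(X, y)
X_transformed = selector.transform(X)

# Indices of the retained columns, in their original order.
selected = selector.get_support(indices=True)

# Rank all features by score (highest first) and take the top 5.
ranking = np.argsort(selector.scores_)[::-1]
top5 = np.sort(ranking[:5])

print("Selected indices:", selected)
print("Top-5 by score:  ", top5)

# Report the score and p-value of each retained feature.
for idx in selected:
    print(idx, selector.scores_[idx], selector.pvalues_[idx])
```

The transformed array is simply the original array restricted to the selected columns, so `X[:, selected]` reproduces it.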

Usage Examples

Basic Usage

from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.datasets import make_classification

# Synthetic classification data: 20 features, 5 of them informative.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=42)

# Score every feature with the ANOVA F-test and keep the 5 best.
selector = SelectKBest(f_classif, k=5)
X_selected = selector.fit_transform(X, y)
print(f"Original features: {X.shape[1]}, Selected features: {X_selected.shape[1]}")
print(f"Selected feature indices: {selector.get_support(indices=True)}")
print(f"Feature scores: {selector.scores_}")
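
Since the Usage section describes SelectKBest as a preprocessing step, a pipeline is a natural way to sketch that. The example below is one possible arrangement, assuming a logistic regression classifier and 5-fold cross-validation; both are choices made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=42
)

# Selection happens inside the pipeline, so each cross-validation fold
# scores features on its own training split only.
pipe = make_pipeline(
    SelectKBest(f_classif, k=5),
    LogisticRegression(max_iter=1000),
)

cv_scores = cross_val_score(pipe, X, y, cv=5)
print(f"Mean CV accuracy: {cv_scores.mean():.3f}")

# After fitting, the selector step can be inspected by its generated name.
pipe.fit(X, y)
fitted_selector = pipe.named_steps["selectkbest"]
print(fitted_selector.get_support(indices=True))
```

Keeping the selector inside the pipeline avoids scoring features on data that is later used for evaluation.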
