Implementation:Scikit learn Scikit learn SelectKBest
| Field | Value |
|---|---|
| Knowledge Sources | |
| Domains | Feature Selection, Statistical Testing |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete scikit-learn tool for selecting the k features with the highest univariate statistical scores.
Description
SelectKBest scores each feature independently with a univariate statistical test and keeps the k highest-scoring features. The same module (_univariate_selection) provides a family of univariate selectors: SelectKBest (top k features), SelectPercentile (top percentage of features), SelectFpr (false positive rate), SelectFdr (false discovery rate), SelectFwe (family-wise error rate), and GenericUnivariateSelect (strategy chosen through a configurable mode parameter). It also provides the scoring functions f_classif (ANOVA F-value), chi2 (chi-squared statistic), f_regression (F-statistic from univariate linear regression), and r_regression (Pearson correlation coefficient), each suited to a different kind of feature-target relationship.
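The selectors differ only in how they turn per-feature scores into a keep/drop mask. As a sketch, assuming a synthetic dataset from make_classification with distinct feature scores, the snippet below checks that SelectKBest(k=5), SelectPercentile(percentile=25) on 20 features, and GenericUnivariateSelect in "k_best" mode all pick the same columns.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import (
    GenericUnivariateSelect,
    SelectKBest,
    SelectPercentile,
    f_classif,
)

# Synthetic data (assumption): 20 continuous features, 5 of them informative.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

# Three ways of asking for the same 5 features (25% of 20).
k_best = SelectKBest(f_classif, k=5).fit(X, y)
percentile = SelectPercentile(f_classif, percentile=25).fit(X, y)
generic = GenericUnivariateSelect(f_classif, mode="k_best", param=5).fit(X, y)

mask_k = k_best.get_support()
mask_p = percentile.get_support()
mask_g = generic.get_support()

print("SelectKBest:            ", np.flatnonzero(mask_k))
print("SelectPercentile:       ", np.flatnonzero(mask_p))
print("GenericUnivariateSelect:", np.flatnonzero(mask_g))
```

GenericUnivariateSelect is mainly useful when the selection strategy itself should be a tunable hyperparameter.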
Usage
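Use SelectKBest when you want a simple, fast feature selection method based on univariate statistical tests. Because each feature is scored in isolation, the method ignores interactions and redundancy between features. Choose the scoring function to match the data: f_classif for classification with continuous features, chi2 for classification with non-negative features such as counts or frequencies (it raises an error on negative values), or f_regression for regression tasks. SelectKBest is typically used as a preprocessing step to reduce dimensionality before training a model; fit it on training data only so that the selection does not see held-out data.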
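A sketch of matching the score function to the data, assuming the bundled digits dataset (pixel intensities from 0 to 16, so non-negative) for chi2 and a synthetic regression problem for f_regression. It also shows chi2 rejecting data that has been centered and therefore contains negative values.

```python
import numpy as np
from sklearn.datasets import load_digits, make_regression
from sklearn.feature_selection import SelectKBest, chi2, f_regression

# Classification with non-negative features: chi2 is applicable.
X_digits, y_digits = load_digits(return_X_y=True)
chi2_selector = SelectKBest(chi2, k=20).fit(X_digits, y_digits)
X_digits_small = chi2_selector.transform(X_digits)
print("digits:", X_digits.shape, "->", X_digits_small.shape)

# Centering introduces negative values, which chi2 refuses.
X_centered = X_digits - X_digits.mean(axis=0)
try:
    SelectKBest(chi2, k=20).fit(X_centered, y_digits)
    chi2_rejected_negative = False
except ValueError as exc:
    chi2_rejected_negative = True
    print("chi2 refused centered data:", exc)

# Continuous target (synthetic, assumption): use f_regression instead.
X_reg, y_reg = make_regression(n_samples=200, n_features=15, n_informative=3, random_state=0)
reg_selector = SelectKBest(f_regression, k=3).fit(X_reg, y_reg)
print("regression features kept:", reg_selector.get_support(indices=True))
```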
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/feature_selection/_univariate_selection.py
Signature
```python
class SelectKBest(_BaseFilter):
    def __init__(self, score_func=f_classif, *, k=10): ...

class SelectPercentile(_BaseFilter):
    def __init__(self, score_func=f_classif, *, percentile=10): ...

class SelectFpr(_BaseFilter):
    def __init__(self, score_func=f_classif, *, alpha=5e-2): ...

class SelectFdr(_BaseFilter):
    def __init__(self, score_func=f_classif, *, alpha=5e-2): ...

class SelectFwe(_BaseFilter):
    def __init__(self, score_func=f_classif, *, alpha=5e-2): ...

class GenericUnivariateSelect(_BaseFilter):
    def __init__(self, score_func=f_classif, *, mode="percentile", param=1e-5): ...
```
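Unlike SelectKBest and SelectPercentile, the alpha-based selectors threshold p-values instead of ranking scores. The sketch below, assuming synthetic data, reproduces two of their masks by hand: SelectFpr keeps features with p < alpha, and SelectFwe applies the Bonferroni-style threshold p < alpha / n_features. SelectFdr (Benjamini-Hochberg procedure) is only printed here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFdr, SelectFpr, SelectFwe, f_classif

# Synthetic data (assumption).
X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)
alpha = 0.05

# One ANOVA F-test p-value per feature.
_, pvalues = f_classif(X, y)

fpr = SelectFpr(f_classif, alpha=alpha).fit(X, y)
fwe = SelectFwe(f_classif, alpha=alpha).fit(X, y)
fdr = SelectFdr(f_classif, alpha=alpha).fit(X, y)

# Hand-computed masks for comparison.
manual_fpr = pvalues < alpha                 # uncorrected threshold
manual_fwe = pvalues < alpha / len(pvalues)  # Bonferroni correction

print("FPR kept:", fpr.get_support(indices=True))
print("FWE kept:", fwe.get_support(indices=True))
print("FDR kept:", fdr.get_support(indices=True))
```

Since the Bonferroni threshold is stricter, SelectFwe never keeps more features than SelectFpr for the same alpha.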
Import
```python
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif, chi2, f_regression
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| score_func | callable | No | Function taking (X, y) and returning either a (scores, pvalues) tuple or a single array of scores. Default is f_classif. |
| k | int or "all" | No | Number of top features to select. Default is 10. "all" keeps every feature and bypasses selection. |
| X | array-like of shape (n_samples, n_features) | Yes | Training input samples for fitting. |
| y | array-like of shape (n_samples,) | Yes | Target values for computing feature scores. |
Outputs
| Name | Type | Description |
|---|---|---|
| X_transformed | ndarray or sparse matrix of shape (n_samples, k) | The input data with only the k best features retained (returned by transform and fit_transform). |
| scores_ | ndarray of shape (n_features,) | Score of each feature, as computed by score_func. |
| pvalues_ | ndarray of shape (n_features,) or None | p-value of each feature score; None if score_func returns scores only. |
Usage Examples
Basic Usage
```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# 200 samples, 20 features, of which 5 are informative
X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=42)

# Keep the 5 features with the highest ANOVA F-values
selector = SelectKBest(f_classif, k=5)
X_selected = selector.fit_transform(X, y)

print(f"Original features: {X.shape[1]}, Selected features: {X_selected.shape[1]}")
print(f"Selected feature indices: {selector.get_support(indices=True)}")
print(f"Feature scores: {selector.scores_}")
```
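When SelectKBest serves as a preprocessing step, placing it inside a Pipeline means feature scores are recomputed on the training folds only during cross-validation, so the selection never sees held-out data. A sketch assuming LogisticRegression as the downstream model; the grid over k is illustrative, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=42)

# Downstream classifier is an assumption; any estimator can follow the selector.
pipe = Pipeline([
    ("select", SelectKBest(f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Tune k together with the model; selection is refit inside each CV split.
search = GridSearchCV(pipe, param_grid={"select__k": [3, 5, 10, "all"]}, cv=5)
search.fit(X, y)

print("best k:", search.best_params_["select__k"])
print("CV accuracy:", round(search.best_score_, 3))
```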