Implementation:Scikit learn Scikit learn SelectFromModel

From Leeroopedia


Knowledge Sources
Domains Feature Selection, Model-Based Selection
Last Updated 2026-02-08 15:00 GMT

Overview

A concrete tool, provided by scikit-learn, for selecting features based on importance weights taken from a fitted estimator.

Description

SelectFromModel is a meta-transformer that selects features based on importance weights. It works with any estimator that exposes a coef_ or feature_importances_ attribute after fitting, or whose importances can be retrieved through the importance_getter parameter (an attribute name or a callable). A feature is kept when its absolute importance is greater than or equal to a configurable threshold. When threshold is None, the threshold is 1e-5 for L1-penalized estimators (for example Lasso, or an estimator with penalty="l1") and the mean importance otherwise.
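The default threshold described above can be checked directly. The following is a minimal sketch, assuming a synthetic dataset and a small random forest (both chosen here purely for illustration); it compares threshold_ with the mean of the fitted estimator's feature_importances_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data, used only for illustration.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, random_state=0
)

# threshold=None and no L1 penalty, so the "mean" rule applies.
selector = SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0))
X_sel = selector.fit_transform(X, y)

importances = selector.estimator_.feature_importances_
n_kept = int((importances >= selector.threshold_).sum())

print("threshold_:", selector.threshold_)
print("mean importance:", importances.mean())
print("features kept:", n_kept)
```

The number of columns in X_sel should match the count of importances at or above threshold_.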

Usage

Use SelectFromModel to reduce the number of features using the importances learned by a trained model. It is particularly effective with tree-based models, which provide feature_importances_, and with L1-regularized linear models, which drive the coefficients of unimportant features to exactly zero.
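As a sketch of the L1 case, the example below wraps a Lasso regressor. The dataset and the value alpha=1.0 are illustrative assumptions, not recommendations; in practice the regularization strength would be tuned.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Synthetic regression problem with 4 informative features out of 20.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=4, noise=1.0, random_state=0
)

# Lasso is L1-penalized, so the default threshold is the small epsilon 1e-5:
# features whose coefficient was shrunk to zero are dropped.
selector = SelectFromModel(Lasso(alpha=1.0))
X_sel = selector.fit_transform(X, y)

print("threshold_:", selector.threshold_)
print("kept", X_sel.shape[1], "of", X.shape[1], "features")
```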

Code Reference

Source Location

Signature

class SelectFromModel(MetaEstimatorMixin, SelectorMixin, BaseEstimator):
    def __init__(
        self,
        estimator,
        *,
        threshold=None,
        prefit=False,
        norm_order=1,
        max_features=None,
        importance_getter="auto",
    ):

Import

from sklearn.feature_selection import SelectFromModel

I/O Contract

Inputs

Name | Type | Required | Description
estimator | estimator instance | Yes | The base estimator from which the transformer is built. Must expose coef_ or feature_importances_ after fitting, unless importance_getter specifies another source.
threshold | str or float | No | The threshold for feature selection. Features whose absolute importance is >= threshold are kept. Accepts a number or a string such as "mean", "median", or a scaled form like "1.25*mean". Default is None (1e-5 for L1-penalized estimators, otherwise "mean").
prefit | bool | No | Whether the estimator passed in is already fitted. Default is False.
norm_order | non-zero int, inf, -inf | No | Order of the norm used to reduce a 2-D coef_ to one importance per feature. Default is 1.
max_features | int or callable | No | Maximum number of features to select. A callable receives X and returns the limit. To select on max_features alone, set threshold=-np.inf. Default is None.
importance_getter | str or callable | No | How to get feature importances. Default is "auto" (uses coef_ or feature_importances_).
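The max_features and prefit parameters can be sketched as follows. The dataset, the forest size, and the variable names top3_selector and prefit_selector are assumptions made for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, random_state=0
)

# Keep exactly the 3 most important features: disable the threshold
# with -np.inf so that only max_features limits the selection.
top3_selector = SelectFromModel(
    RandomForestClassifier(n_estimators=50, random_state=0),
    threshold=-np.inf,
    max_features=3,
)
X_top3 = top3_selector.fit_transform(X, y)

# Reuse an estimator that was fitted elsewhere: with prefit=True,
# transform can be called without fitting the selector.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
prefit_selector = SelectFromModel(forest, prefit=True, threshold="median")
X_median = prefit_selector.transform(X)

print(X_top3.shape, X_median.shape)
```

With prefit=True the selector does not clone or refit the estimator on transform, so the importances come from whatever data the estimator was originally trained on.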

Outputs

Name | Type | Description
X_transformed | ndarray or sparse matrix | The input data with only the selected features.
estimator_ | estimator instance | The fitted estimator used to determine feature importances.
threshold_ | float | The threshold value used for feature selection.
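To relate these outputs to the original columns, one option is get_support, which SelectFromModel inherits from SelectorMixin. The sketch below assumes the same kind of synthetic data used elsewhere on this page.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(
    n_samples=100, n_features=20, n_informative=5, random_state=42
)
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=42),
    threshold="median",
)
X_transformed = selector.fit_transform(X, y)

# Integer indices of the columns of X that were kept.
selected_idx = selector.get_support(indices=True)

print("Selected column indices:", selected_idx)
print("Threshold used:", selector.threshold_)
print("Fitted estimator:", type(selector.estimator_).__name__)
```

Indexing X with selected_idx reproduces X_transformed, which is a convenient way to map selected columns back to feature names.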

Usage Examples

Basic Usage

from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Synthetic dataset: 20 features, 5 of which are informative.
X, y = make_classification(n_samples=100, n_features=20, n_informative=5, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
# Keep features whose importance is at or above the median importance.
selector = SelectFromModel(clf, threshold="median")
# fit_transform fits clf on (X, y), then drops the unselected columns.
X_selected = selector.fit_transform(X, y)
print(f"Original features: {X.shape[1]}, Selected features: {X_selected.shape[1]}")
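Because SelectFromModel is a transformer, it can also be placed in front of a downstream model. The following is a sketch under assumptions: the step names "select" and "clf", the choice of LogisticRegression as the final estimator, and the train/test split are all illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The selector is fitted on the training split only, so the test split
# does not influence which features are chosen.
pipe = Pipeline([
    ("select", SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=42),
        threshold="median",
    )),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)

n_selected = int(pipe.named_steps["select"].get_support().sum())
accuracy = pipe.score(X_test, y_test)
print(f"Selected features: {n_selected}, test accuracy: {accuracy:.3f}")
```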
