Implementation:Scikit learn Scikit learn SelectFromModel
| Knowledge Sources | |
|---|---|
| Domains | Feature Selection, Model-Based Selection |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool, provided by scikit-learn, for selecting features based on the importance weights of a fitted estimator.
Description
SelectFromModel is a meta-transformer for selecting features based on importance weights. It works with any estimator that exposes a coef_ or feature_importances_ attribute after fitting, or with any estimator for which a custom importance_getter (an attribute name or a callable) is supplied. A feature is kept if its importance is greater than or equal to a configurable threshold. When threshold is None, the threshold defaults to the mean importance, except for L1-penalized estimators (penalty="l1" or Lasso-type models), for which it defaults to 1e-5.
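The threshold rule described above can be checked directly. The following is a minimal sketch, assuming a random forest on synthetic data; the helper `fitted_selector` and the dataset sizes are illustrative choices, not part of the library.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, random_state=0
)


def fitted_selector(threshold):
    """Hypothetical helper: fit a selector around an identically seeded forest."""
    forest = RandomForestClassifier(n_estimators=50, random_state=0)
    return SelectFromModel(forest, threshold=threshold).fit(X, y)


default_sel = fitted_selector(None)        # no L1 penalty, so "mean" is used
scaled_sel = fitted_selector("1.5*mean")   # scaled string form
median_sel = fitted_selector("median")

importances = default_sel.estimator_.feature_importances_
print("default threshold:", default_sel.threshold_)
print("mean importance:  ", importances.mean())
print("kept by default:  ", default_sel.get_support().sum(), "of", X.shape[1])
```

Because the three forests share the same seed and data, they produce the same importances, so only the threshold differs between the selectors.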
Usage
Use SelectFromModel when you want to reduce the number of features based on the feature importances learned by a trained model. It is particularly effective with tree-based models (which provide feature_importances_) and linear models with L1 regularization (which zero out unimportant features).
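For the L1 case, a minimal sketch using Lasso on synthetic regression data is shown here. The value alpha=1.0 and the dataset parameters are assumptions chosen for illustration; in practice alpha would be tuned.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = make_regression(
    n_samples=200, n_features=20, n_informative=3, noise=1.0, random_state=0
)

# threshold is left at None: for a Lasso-type estimator the default is 1e-5,
# so features whose coefficient was shrunk to (near) zero are dropped.
selector = SelectFromModel(Lasso(alpha=1.0))
X_selected = selector.fit_transform(X, y)

coef = selector.estimator_.coef_
print("threshold used:", selector.threshold_)
print("features kept: ", X_selected.shape[1], "of", X.shape[1])
```

A larger alpha zeroes out more coefficients and therefore selects fewer features.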
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/feature_selection/_from_model.py
Signature
class SelectFromModel(MetaEstimatorMixin, SelectorMixin, BaseEstimator):
    def __init__(
        self,
        estimator,
        *,
        threshold=None,
        prefit=False,
        norm_order=1,
        max_features=None,
        importance_getter="auto",
    ):
Import
from sklearn.feature_selection import SelectFromModel
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| estimator | estimator instance | Yes | The base estimator from which the transformer is built. After fitting it must expose coef_ or feature_importances_, unless importance_getter points to another source of importances. |
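| threshold | str or float | No | The threshold for feature selection. Features with importance >= threshold are kept. A string may be "mean", "median", or a scaled form such as "1.25*mean". Default is None, which uses "mean", or 1e-5 for L1-penalized estimators. |
| prefit | bool | No | Whether an already fitted estimator is passed to the constructor. If True, transform can be called directly without calling fit. Default is False. |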
| norm_order | non-zero int, inf, -inf | No | Order of the norm used to reduce a 2-D coef_ (for example, from a multiclass or multi-output linear model) to a single importance per feature. Default is 1. |
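| max_features | int or callable | No | Maximum number of features to select. An int sets the cap directly; a callable receives X and returns the cap. To select on max_features alone, set threshold=-np.inf. Default is None (no cap). |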
| importance_getter | str or callable | No | How to get feature importances. Default is "auto" (uses coef_ or feature_importances_). |
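The interaction of prefit, threshold, and max_features can be sketched as follows. This example assumes a forest fitted ahead of time on synthetic data and keeps exactly the three highest-ranked features by disabling the threshold with -np.inf; the cap of 3 is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, random_state=0
)

# Fit the estimator ahead of time, then hand it over with prefit=True.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# threshold=-np.inf disables the threshold test, so max_features alone decides.
selector = SelectFromModel(forest, prefit=True, threshold=-np.inf, max_features=3)
X_top3 = selector.transform(X)  # with prefit=True, no call to fit is needed

selected = selector.get_support(indices=True)
print("selected column indices:", selected)
print("shape after selection:  ", X_top3.shape)
```

With a finite threshold, max_features acts only as an upper bound: fewer features are returned when fewer pass the threshold.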
Outputs
| Name | Type | Description |
|---|---|---|
| X_transformed | ndarray or sparse matrix | Return value of transform and fit_transform: the input data restricted to the selected features. |
| estimator_ | estimator instance | Fitted attribute: the fitted estimator used to determine feature importances. |
| threshold_ | float | The threshold value used for feature selection. |
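The outputs listed above can be inspected after fitting. The sketch below assumes a random forest on synthetic data and also uses get_support, which SelectFromModel inherits from SelectorMixin, to recover which columns were kept.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(
    n_samples=200, n_features=12, n_informative=4, random_state=1
)

forest = RandomForestClassifier(n_estimators=50, random_state=1)
selector = SelectFromModel(forest, threshold="median")
X_transformed = selector.fit_transform(X, y)

mask = selector.get_support()              # boolean mask over input columns
kept = selector.get_support(indices=True)  # integer indices of kept columns
fitted = selector.estimator_               # fitted clone; `forest` stays unfitted

print("kept columns:", kept)
print("threshold_:  ", selector.threshold_)
print("output shape:", X_transformed.shape)
```

With prefit=False, fit clones the estimator, so the object passed to the constructor is left untouched and the fitted model is reached through estimator_.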
Usage Examples
Basic Usage
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data: 20 features, of which 5 are informative.
X, y = make_classification(n_samples=100, n_features=20, n_informative=5, random_state=42)

# The selector clones and fits the forest, then keeps every feature whose
# importance is at least the median importance (roughly half of the features).
clf = RandomForestClassifier(n_estimators=100, random_state=42)
selector = SelectFromModel(clf, threshold="median")
X_selected = selector.fit_transform(X, y)

print(f"Original features: {X.shape[1]}, Selected features: {X_selected.shape[1]}")