Implementation:Scikit learn Scikit learn Get Feature Names
Overview
A concrete scikit-learn utility for extracting feature names from array containers.
Code Reference
Function: _get_feature_names(X)
Module: sklearn/utils/validation.py (lines 2320-2378)
Signature:
def _get_feature_names(X):
This is a private utility function used internally by scikit-learn estimators to extract feature names from input data containers. It supports pandas DataFrames directly and any array container implementing the __dataframe__ interchange protocol.
I/O Contract
Input:
X: {ndarray, dataframe} of shape (n_samples, n_features) -- The array container from which to extract feature names.
Output:
names: ndarray or None -- A NumPy array of feature names (dtype object) if all column names are strings. Returns None if the container has no feature names, has empty feature names, or has only non-string feature names.
Error conditions:
- Raises TypeError if the input has a mix of string and non-string feature names. The error message instructs the user to convert all column names to strings via X.columns = X.columns.astype(str).
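The mixed-type error and its remedy can be illustrated without pandas. The following is a minimal plain-Python sketch, not scikit-learn's code: check_names is a hypothetical helper, the error message is made up for illustration, and the list comprehension stands in for X.columns.astype(str).

```python
def check_names(names):
    """Hypothetical helper: reject a mix of string and non-string names."""
    # Collect the distinct type names present among the feature names.
    kinds = sorted({type(n).__qualname__ for n in names})
    if len(kinds) > 1 and "str" in kinds:
        # Illustrative message only; the real message differs.
        raise TypeError(f"Feature names have mixed types: {kinds}")
    return kinds


columns = ["age", 0, "income"]  # one integer name among strings

try:
    check_names(columns)
    mixed_rejected = False
except TypeError:
    mixed_rejected = True

# Plain-Python analogue of X.columns = X.columns.astype(str)
fixed = [str(c) for c in columns]
kinds_after = check_names(fixed)
```

After the conversion every name is a string, so the check passes and the names can be used as feature names.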
Implementation Details
The function follows a two-stage extraction strategy:
- Extract raw feature names from the container:
  - For pandas DataFrames: accesses X.columns directly and converts to a NumPy array via np.asarray(X.columns, dtype=object). This avoids relying on the __dataframe__ protocol for pandas, which may introduce unnecessary copies in older versions.
  - For other containers implementing __dataframe__: calls X.__dataframe__().column_names() to retrieve column names through the DataFrame interchange protocol.
  - For all other containers (plain NumPy arrays, sparse matrices): returns None.
- Validate feature name types:
  - If no feature names are found or the array is empty, returns None.
  - Collects the unique types of all feature name values and sorts them.
  - If there is a mix of string and non-string types, raises a TypeError.
  - If all feature names are strings, returns the array of names.
  - If all feature names are non-strings (e.g., integer indices), returns None.
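The two stages above can be sketched in plain Python. This is an illustrative approximation under stated assumptions, not the scikit-learn source: FakePandasFrame, FakeInterchange, FakeProtocolFrame, and sketch_get_feature_names are hypothetical names, the isinstance check stands in for the pandas DataFrame detection, and plain lists stand in for NumPy object arrays.

```python
class FakePandasFrame:
    """Hypothetical stand-in for a pandas DataFrame (exposes .columns)."""
    def __init__(self, columns):
        self.columns = list(columns)


class FakeInterchange:
    """Hypothetical stand-in for the object returned by __dataframe__()."""
    def __init__(self, names):
        self._names = names

    def column_names(self):
        return iter(self._names)


class FakeProtocolFrame:
    """Hypothetical non-pandas container implementing __dataframe__."""
    def __init__(self, names):
        self._names = list(names)

    def __dataframe__(self):
        return FakeInterchange(self._names)


def sketch_get_feature_names(X):
    # Stage 1: extract raw names from the container.
    names = None
    if isinstance(X, FakePandasFrame):
        names = list(X.columns)  # direct column access
    elif hasattr(X, "__dataframe__"):
        names = list(X.__dataframe__().column_names())
    if names is None or len(names) == 0:
        return None

    # Stage 2: validate the types of the names.
    kinds = sorted({type(n).__qualname__ for n in names})
    if len(kinds) > 1 and "str" in kinds:
        raise TypeError("mixed string and non-string feature names")
    if kinds == ["str"]:
        return names
    return None  # all names are non-strings


from_pandas = sketch_get_feature_names(FakePandasFrame(["age", "income"]))
from_protocol = sketch_get_feature_names(FakeProtocolFrame(["a", "b"]))
from_plain = sketch_get_feature_names([[1, 2], [3, 4]])
from_ints = sketch_get_feature_names(FakePandasFrame([0, 1]))
```

In this sketch the plain nested list plays the role of a NumPy array: it has neither a recognized columns attribute nor a __dataframe__ method, so no names are extracted.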
Usage Examples
import numpy as np
import pandas as pd
from sklearn.utils.validation import _get_feature_names

# With a pandas DataFrame (string columns)
df = pd.DataFrame({"age": [25, 30], "income": [50000, 60000]})
names = _get_feature_names(df)
# names -> array(['age', 'income'], dtype=object)

# With a plain NumPy array
arr = np.array([[1, 2], [3, 4]])
names = _get_feature_names(arr)
# names -> None

# With integer column names (returns None)
df_int = pd.DataFrame({0: [1, 2], 1: [3, 4]})
names = _get_feature_names(df_int)
# names -> None