Implementation:Scikit learn Scikit learn Get Feature Names

Overview

Concrete scikit-learn tool for extracting feature names from array containers.

Code Reference

Function: _get_feature_names(X)

Module: sklearn/utils/validation.py (lines 2320-2378)

Signature:

def _get_feature_names(X):

This is a private utility function used internally by scikit-learn estimators to extract feature names from input data containers. It supports pandas DataFrames directly and any array container implementing the __dataframe__ interchange protocol.

I/O Contract

Input:

  • X : {ndarray, dataframe} of shape (n_samples, n_features) -- The array container from which to extract feature names.

Output:

  • names : ndarray or None -- A NumPy array of feature names (dtype object) if all column names are strings. Returns None if the container has no feature names, if the feature names are empty, or if none of the feature names are strings.

Error conditions:

  • Raises TypeError if the input has a mix of string and non-string feature names. The error message instructs the user to convert all column names to strings via X.columns = X.columns.astype(str).

Implementation Details

The function follows a two-stage extraction strategy:

  1. Extract raw feature names from the container:
    • For pandas DataFrames: accesses X.columns directly and converts to a NumPy array via np.asarray(X.columns, dtype=object). This sidesteps the __dataframe__ protocol for pandas, because older pandas versions may not expose a working column_names() implementation and going through the protocol may introduce an additional copy.
    • For other containers implementing __dataframe__: calls X.__dataframe__().column_names() to retrieve column names through the DataFrame interchange protocol.
    • For all other containers (plain NumPy arrays, sparse matrices): returns None.
  2. Validate feature name types:
    • If no feature names are found or the array is empty, returns None.
    • Collects the unique types of the feature names and sorts them by type name, so the result can be checked and reported deterministically.
    • If there is a mix of string and non-string types, raises a TypeError.
    • If all feature names are strings, returns the array of names.
    • If all feature names are non-strings (e.g., integer indices), returns None.
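
The two stages above can be sketched without any third-party dependency. This is a simplified illustration, not the scikit-learn source: extract_feature_names, ToyFrame, and ToyInterchange are hypothetical names, a plain list is returned instead of a NumPy array, and any object exposing a .columns attribute is assumed to stand in for a pandas DataFrame.

```python
def extract_feature_names(X):
    """Hypothetical, dependency-free sketch of the two-stage strategy."""
    names = None

    # Stage 1: extract raw names from the container.
    # Assumption: an object with `.columns` stands in for a pandas DataFrame.
    if hasattr(X, "columns"):
        names = list(X.columns)
    elif hasattr(X, "__dataframe__"):
        names = list(X.__dataframe__().column_names())

    # Stage 2: validate the types of the names.
    if names is None or len(names) == 0:
        return None

    types = sorted({type(v).__qualname__ for v in names})

    # A mix of string and non-string names is rejected.
    if len(types) > 1 and "str" in types:
        raise TypeError(
            f"Feature names must all be strings, got types {types}."
        )

    # Only all-string names are returned; anything else yields None.
    if types == ["str"]:
        return names
    return None


class ToyInterchange:
    """Hypothetical stand-in for an interchange object."""

    def __init__(self, names):
        self._names = names

    def column_names(self):
        return self._names


class ToyFrame:
    """Hypothetical container exposing only __dataframe__."""

    def __init__(self, names):
        self._names = names

    def __dataframe__(self):
        return ToyInterchange(self._names)


string_names = extract_feature_names(ToyFrame(["age", "income"]))
integer_names = extract_feature_names(ToyFrame([0, 1]))
plain_input = extract_feature_names([[1, 2], [3, 4]])

try:
    extract_feature_names(ToyFrame(["age", 0]))
    mixed_raised = False
except TypeError:
    mixed_raised = True
```

In this sketch, string_names is ['age', 'income'], integer_names and plain_input are None, and the mixed case sets mixed_raised to True.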

Usage Examples

import numpy as np
import pandas as pd
from sklearn.utils.validation import _get_feature_names

# With a pandas DataFrame (string columns)
df = pd.DataFrame({"age": [25, 30], "income": [50000, 60000]})
names = _get_feature_names(df)
# names -> array(['age', 'income'], dtype=object)

# With a plain NumPy array
arr = np.array([[1, 2], [3, 4]])
names = _get_feature_names(arr)
# names -> None

# With integer column names (returns None)
df_int = pd.DataFrame({0: [1, 2], 1: [3, 4]})
names = _get_feature_names(df_int)
# names -> None
