Implementation:Scikit learn Scikit learn TargetEncoder

From Leeroopedia


Knowledge Sources
Domains: Data Preprocessing, Categorical Encoding
Last Updated: 2026-02-08 15:00 GMT

Overview

TargetEncoder is the scikit-learn transformer that encodes categorical features using statistics of the target.

Description

TargetEncoder encodes categorical features based on a shrunk estimate of the average target value for the observations belonging to each category. The encoding mixes the global target mean with the target mean conditioned on the category; the smooth parameter controls how strongly each category mean is pulled toward the global mean. It supports regression, binary classification, and multiclass classification targets. During fit_transform it applies an internal cross-fitting scheme: each fold is encoded with statistics learned from the other folds, which limits target leakage into the training features.
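The shrinkage described above can be illustrated with a small pure-Python sketch. It assumes a fixed numeric smooth value and a continuous or binary target, and it is not the library's implementation; the function name `shrunk_target_means` is hypothetical. The smooth="auto" setting, which estimates the amount of smoothing from the data, is not shown.

```python
from collections import defaultdict


def shrunk_target_means(categories, targets, smooth):
    """Blend each category's target mean with the global target mean.

    Hypothetical helper for illustration only. A category with n_i samples
    gets weight n_i / (n_i + smooth) on its own mean and the remaining
    weight on the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    # Algebraically equal to lambda * category_mean + (1 - lambda) * global_mean.
    return {
        category: (sums[category] + smooth * global_mean) / (counts[category] + smooth)
        for category in sums
    }


encodings = shrunk_target_means(
    ["a", "a", "a", "b"], [1.0, 1.0, 0.0, 0.0], smooth=2.0
)
print(encodings)
```

In this toy data the global mean is 0.5. Category "b" has a single sample, so its raw mean of 0.0 is pulled further toward 0.5 than the mean of category "a", which has three samples.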

Usage

Use TargetEncoder when you want to encode categorical features using target statistics, which is especially effective for high-cardinality categorical features where one-hot encoding would produce too many columns. It is useful for both regression and classification tasks.

Code Reference

Source Location

Signature

class TargetEncoder(OneToOneFeatureMixin, _BaseEncoder):
    def __init__(
        self,
        categories="auto",
        target_type="auto",
        smooth="auto",
        cv=5,
        shuffle=True,
        random_state=None,
    ):

Import

from sklearn.preprocessing import TargetEncoder

I/O Contract

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| categories | "auto" or list of array-like | No | Categories per feature. "auto" determines them from the training data. Default is "auto". |
| target_type | str | No | Type of target: "auto", "continuous", "binary", or "multiclass". "auto" infers the type from y. Default is "auto". |
| smooth | "auto" or float | No | The amount of mixing of the target mean conditioned on the category with the global target mean. Larger values put more weight on the global mean. Default is "auto". |
| cv | int | No | Number of folds for the internal cross-fitting scheme used during fit_transform. Default is 5. |
| shuffle | bool | No | Whether to shuffle the data before splitting into folds in the cross-fitting procedure. Default is True. |
| random_state | int, RandomState instance, or None | No | Controls the randomness of the fold shuffling when shuffle is True; pass an int for reproducible output. Default is None. |

Outputs

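| Name | Type | Description |
| --- | --- | --- |
| X_transformed | ndarray of shape (n_samples, n_features), or (n_samples, n_features * n_classes) for multiclass targets | The encoded feature matrix returned by fit_transform and transform. |
| categories_ | list of ndarray | Fitted attribute: the categories of each feature determined during fitting. |
| target_mean_ | float, or ndarray of shape (n_classes,) for multiclass targets | Fitted attribute: the overall mean of the target. |
| encodings_ | list of ndarray | Fitted attribute: encodings learnt on all of X for each feature. |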

Usage Examples

Basic Usage

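import numpy as np
from sklearn.preprocessing import TargetEncoder

# One categorical feature (2D array: n_samples x n_features) and a continuous target.
X = np.array([["dog"], ["cat"], ["dog"], ["fish"], ["cat"], ["dog"]])
y = np.array([1.0, 0.0, 1.0, 0.5, 0.0, 0.8])

# Fixed smoothing; random_state makes the shuffled cross-fitting folds reproducible.
enc = TargetEncoder(smooth=5.0, random_state=42)
# fit_transform encodes each row using statistics learned from the other cv folds.
X_encoded = enc.fit_transform(X, y)
print(X_encoded)  # array of shape (6, 1)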
