Implementation:Scikit learn Scikit learn TargetEncoder

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Data Preprocessing, Categorical Encoding
Last Updated	2026-02-08 15:00 GMT

Overview

Concrete tool, provided by scikit-learn, for encoding categorical features based on target statistics.

Description

TargetEncoder encodes each category as a shrunk estimate of the average target value for observations belonging to that category. The encoding mixes the global target mean with the target mean conditioned on the category. It supports regression, binary classification, and multiclass classification targets. During fit_transform it uses internal cross-fitting to prevent target leakage, so fit(X, y).transform(X) does not in general equal fit_transform(X, y).

Usage

Use TargetEncoder when you want to encode categorical features using target statistics, which is especially effective for high-cardinality categorical features where one-hot encoding would produce too many columns. It is useful for both regression and classification tasks.

Code Reference

Source Location

Repository: scikit-learn
File: sklearn/preprocessing/_target_encoder.py

Signature

class TargetEncoder(OneToOneFeatureMixin, _BaseEncoder):
    def __init__(
        self,
        categories="auto",
        target_type="auto",
        smooth="auto",
        cv=5,
        shuffle=True,
        random_state=None,
    ):

Import

from sklearn.preprocessing import TargetEncoder

I/O Contract

Inputs

Name	Type	Required	Description
categories	"auto" or list	No	Categories per feature. "auto" determines them from training data. Default is "auto".
target_type	str	No	Type of target: "auto", "continuous", "binary", or "multiclass". Default is "auto".
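smooth	"auto" or float	No	The amount of mixing of the target mean conditioned on the category with the global target mean. A larger value puts more weight on the global target mean. "auto" sets it to an empirical Bayes estimate. Default is "auto".
cv	int	No	Number of folds for the internal cross-fitting scheme used in fit_transform. StratifiedKFold is used for classification targets and KFold for continuous targets. Default is 5.
shuffle	bool	No	Whether to shuffle the data before splitting it into folds in the cross-fitting procedure. Default is True.
random_state	int, RandomState or None	No	Controls the fold assignment in the cross-fitting procedure when shuffle is True; it has no effect otherwise. Pass an int for reproducible output. Default is None.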

Outputs

Name	Type	Description
X_transformed	ndarray of shape (n_samples, n_features), or (n_samples, n_features * n_classes) for multiclass targets	The encoded feature matrix returned by fit_transform and transform.
categories_	list of ndarray	The categories of each feature determined during fitting.
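target_mean_	float	Fitted attribute. The overall mean of the target, used in transform to encode categories not seen during fit.
encodings_	list of ndarray	Fitted attribute. Encodings learnt on all of X. encodings_[i] holds the encodings matching the categories listed in categories_[i]; for multiclass targets the list has n_features * n_classes entries.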

Usage Examples

Basic Usage

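import numpy as np
from sklearn.preprocessing import TargetEncoder

# One categorical feature (as a column) and a continuous target.
X = np.array([["dog"], ["cat"], ["dog"], ["fish"], ["cat"], ["dog"]])
y = np.array([1.0, 0.0, 1.0, 0.5, 0.0, 0.8])

# fit_transform applies 5-fold cross-fitting (cv=5 by default); random_state
# fixes the shuffled folds so the output is reproducible.
enc = TargetEncoder(smooth=5.0, random_state=42)
X_encoded = enc.fit_transform(X, y)
print(X_encoded)  # array of shape (6, 1)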

Related Pages

Principle:Scikit_learn_Scikit_learn_Feature_Encoding
