Implementation:Scikit learn Scikit learn TargetEncoder
| Knowledge Sources | |
|---|---|
| Domains | Data Preprocessing, Categorical Encoding |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool, provided by scikit-learn, for encoding categorical features based on target statistics.
Description
TargetEncoder encodes categorical features based on a shrunk estimate of the average target values for observations belonging to each category. The encoding scheme mixes the global target mean with the target mean conditioned on the value of the category. It supports regression, binary classification, and multiclass classification targets. fit_transform uses an internal cross-fitting scheme to limit target leakage, so fit(X, y).transform(X) does not in general return the same values as fit_transform(X, y).
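The mixing of the per-category mean with the global mean can be sketched by hand. The following is a minimal illustration, assuming a continuous target and a fixed float smoothing value; the helper name `shrunk_target_means` is hypothetical and this is not the library's implementation, which additionally supports an automatic smoothing estimate and cross-fitting.

```python
from collections import defaultdict


def shrunk_target_means(categories, targets, smooth):
    """Blend each category's target mean with the global target mean.

    Hypothetical helper for illustration only: assumes a continuous
    target and a fixed (float) smoothing value.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, value in zip(categories, targets):
        sums[cat] += value
        counts[cat] += 1
    # Equivalent to lam * category_mean + (1 - lam) * global_mean
    # with lam = n / (n + smooth): rare categories are pulled
    # more strongly toward the global mean.
    return {
        cat: (sums[cat] + smooth * global_mean) / (counts[cat] + smooth)
        for cat in sums
    }


cats = ["dog", "cat", "dog", "fish", "cat", "dog"]
y = [1.0, 0.0, 1.0, 0.5, 0.0, 0.8]
encodings = shrunk_target_means(cats, y, smooth=5.0)
print(encodings)  # cat ≈ 0.393, dog ≈ 0.694, fish ≈ 0.542
```

With a global mean of 0.55, the single "fish" observation (target 0.5) ends up close to the global mean, while "dog" (three observations) keeps more of its own mean.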
Usage
Use TargetEncoder when you want to encode categorical features using target statistics, which is especially effective for high-cardinality categorical features where one-hot encoding would produce too many columns. It is useful for both regression and classification tasks.
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/preprocessing/_target_encoder.py
Signature
class TargetEncoder(OneToOneFeatureMixin, _BaseEncoder):
    def __init__(
        self,
        categories="auto",
        target_type="auto",
        smooth="auto",
        cv=5,
        shuffle=True,
        random_state=None,
    ):
Import
from sklearn.preprocessing import TargetEncoder
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| categories | "auto" or list | No | Categories per feature. "auto" determines them from training data. Default is "auto". |
| target_type | str | No | Type of target: "auto", "continuous", "binary", or "multiclass". Default is "auto". |
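| smooth | "auto" or float | No | How strongly the target mean conditioned on the category is mixed with the global target mean. Larger values put more weight on the global target mean. "auto" sets the amount from an empirical Bayes estimate. Default is "auto". |
| cv | int | No | Number of folds for the internal cross-fitting scheme used during fit_transform. Classification targets use StratifiedKFold and continuous targets use KFold. Default is 5. |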
| shuffle | bool | No | Whether to shuffle the data in the cross-fitting procedure. Default is True. |
| random_state | int, RandomState instance, or None | No | Seed controlling the shuffling in the cross-fitting procedure, for reproducibility. Has an effect only when shuffle is True. Default is None. |
Outputs
| Name | Type | Description |
|---|---|---|
| X_transformed | ndarray of shape (n_samples, n_features), or (n_samples, n_features * n_classes) for multiclass targets | The encoded feature matrix returned by fit_transform or transform. |
| categories_ | list of ndarray | The categories of each feature determined during fitting. |
| target_mean_ | float | The overall mean of the target. |
| encodings_ | list of ndarray | Encodings learnt on all of X for each feature. |
Usage Examples
Basic Usage
from sklearn.preprocessing import TargetEncoder
import numpy as np

X = np.array([["dog"], ["cat"], ["dog"], ["fish"], ["cat"], ["dog"]])  # one categorical feature
y = np.array([1.0, 0.0, 1.0, 0.5, 0.0, 0.8])  # continuous target

enc = TargetEncoder(smooth=5.0, random_state=42)  # fixed smoothing, reproducible folds
X_encoded = enc.fit_transform(X, y)  # cross-fitted encodings, shape (6, 1)
print(X_encoded)