Implementation:Scikit learn Scikit learn Encode
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Data Encoding |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Private utility module in scikit-learn for finding unique elements and encoding categorical values.
Description
The _encode module provides helper functions for finding unique values in arrays, including support for Python object dtypes and proper NaN handling. The entry point is _unique, which delegates to _unique_python for object-dtype arrays and to _unique_np otherwise. Missing values (None and NaN) are reported once each and placed after the sorted regular values. The helpers work with both numpy arrays and Array API-compatible backends.
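The object-dtype strategy described above can be sketched in plain Python. This is an illustrative sketch only, not scikit-learn's code: the function name `unique_with_missing_last` is hypothetical, and it assumes all non-missing values are mutually comparable (for example, all strings or all numbers).

```python
import math


def unique_with_missing_last(values):
    """Return sorted unique values, with None and NaN (if present) placed last.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """

    def is_nan(v):
        # Only floats can be NaN here; strings and None are never NaN.
        return isinstance(v, float) and math.isnan(v)

    has_none = any(v is None for v in values)
    has_nan = any(is_nan(v) for v in values)

    # Deduplicate the regular values with a set, then sort them.
    regular = sorted({v for v in values if v is not None and not is_nan(v)})

    # Append each kind of missing value at most once.
    if has_none:
        regular.append(None)
    if has_nan:
        regular.append(float("nan"))
    return regular


result = unique_with_missing_last(["b", "a", None, float("nan"), "a", float("nan")])
print(result)  # ['a', 'b', None, nan]
```

Filtering NaN out before building the set matters because NaN does not compare equal to itself, so separate NaN objects would otherwise survive as separate set entries.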
Usage
Use these utilities when you need to find unique values in label or feature arrays, particularly when NaN values or object dtypes are present, such as during label encoding or ordinal encoding operations.
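As a hedged illustration of the two cases mentioned above, the following sketch calls `_unique` on a float array containing repeated NaN values and on an object-dtype array containing None. Because `_unique` is a private helper, its behavior may change between scikit-learn releases; the variable names are chosen for this example only.

```python
import numpy as np
from sklearn.utils._encode import _unique  # private helper; may change between releases

# Float array with a repeated NaN: the result contains NaN once, at the end.
floats = np.array([np.nan, 2.0, 1.0, np.nan])
float_uniques = _unique(floats)
print(float_uniques)

# Object-dtype array of strings with a missing value (None).
labels = np.array(["b", "a", None, "a"], dtype=object)
label_uniques = _unique(labels)
print(label_uniques)
```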
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/utils/_encode.py
Signature
def _unique(values, *, return_inverse=False, return_counts=False):
    ...
def _unique_np(values, return_inverse=False, return_counts=False):
    ...
def _unique_python(values, *, return_inverse, return_counts):
    ...
Import
from sklearn.utils._encode import _unique
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| values | ndarray | Yes | Values to check for unique elements |
| return_inverse | bool | No | If True, also return, for each element of values, the index of its value in unique (default False) |
| return_counts | bool | No | If True, also return the number of times each unique item appears |
Outputs
| Name | Type | Description |
|---|---|---|
| unique | ndarray | The sorted unique values, with missing values (None, NaN) placed last |
| unique_inverse | ndarray | Indices to reconstruct the original array from unique (if requested) |
| unique_counts | ndarray | Number of times each unique value appears (if requested) |
Usage Examples
Basic Usage
import numpy as np
from sklearn.utils._encode import _unique

values = np.array([3, 1, 2, 3, 1, 2])

# Sorted unique values only
uniques = _unique(values)
print(uniques)  # [1 2 3]

# Also return, for each input element, its index in uniques
uniques, inverse = _unique(values, return_inverse=True)
print(inverse)  # [2 0 1 2 0 1]