Implementation:Sktime Pytorch forecasting NaNLabelEncoder
| Knowledge Sources | |
|---|---|
| Domains | Time_Series, Data_Engineering, Preprocessing |
| Last Updated | 2026-02-08 07:00 GMT |
Overview
A concrete tool, provided by the pytorch-forecasting library, for encoding categorical variables as integers with handling for NaN values and unknown classes.
Description
The NaNLabelEncoder class is a scikit-learn-compatible label encoder that handles NaN values and unknown categories. When add_nan=True, NaN is always encoded as class 0, and unknown categories encountered during transform (those not seen during fit) are also mapped to class 0, with an optional warning controlled by the warn parameter. The encoder supports both string and numeric pandas Series, provides fit, transform, fit_transform, and inverse_transform methods, and tracks the class-to-index mapping in its classes_ attribute.
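The encoding rules can be illustrated with a minimal pure-Python sketch. It is a simplified stand-in, not the library's implementation: the helpers `fit_classes` and `encode` are hypothetical, the `"nan"` placeholder key is an assumption, and raising a KeyError for unknown categories when add_nan=False is likewise assumed.

```python
import warnings

NAN_KEY = "nan"  # assumed placeholder key for the NaN class


def fit_classes(values, add_nan=False):
    """Build a category -> index mapping; index 0 is reserved when add_nan is set."""
    classes = {NAN_KEY: 0} if add_nan else {}
    for value in sorted(set(values)):
        if value not in classes:
            classes[value] = len(classes)
    return classes


def encode(values, classes, add_nan=False, warn=True):
    """Map categories to indices; unknown categories fall back to 0 if add_nan is set."""
    if not add_nan:
        # Assumed behavior: without the NaN class, an unknown category is an error.
        return [classes[v] for v in values]
    unknown = [v for v in values if v not in classes]
    if unknown and warn:
        warnings.warn(f"Unknown categories {unknown} encoded as class 0")
    return [classes.get(v, 0) for v in values]


classes = fit_classes(["B", "A", "C", "A"], add_nan=True)
print(classes)  # {'nan': 0, 'A': 1, 'B': 2, 'C': 3}
print(encode(["A", "Z", "C"], classes, add_nan=True, warn=False))  # [1, 0, 3]
```

Reserving index 0 means that known categories start at 1 whenever add_nan is set, so the same data produces different indices depending on that flag.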
Usage
NaNLabelEncoder is used in two situations: (1) explicitly, to pre-fit group ID encodings before constructing a TimeSeriesDataSet, which keeps the encoding consistent between training and validation datasets; or (2) implicitly, when TimeSeriesDataSet auto-fits encoders for categorical columns. In the first case, the encoder is passed via the categorical_encoders parameter as a dict mapping column names to pre-fitted encoder instances.
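To see why pre-fitting matters, consider a hypothetical case in which one series only starts after the training cutoff. The sketch below uses a plain dict as a stand-in for the encoder's mapping (the `fit_classes` helper and the series names are illustrative, not part of pytorch-forecasting). It shows that a mapping fitted on the training subset alone lacks an index for the late-starting series and numbers the remaining ones differently from a mapping fitted on the full data.

```python
def fit_classes(values):
    """Hypothetical helper: map each sorted unique value to an integer index."""
    return {value: idx for idx, value in enumerate(sorted(set(values)))}


all_series = ["s1", "s2", "s3"]
train_series = ["s1", "s3"]  # "s2" starts after the training cutoff

train_only = fit_classes(train_series)
full = fit_classes(all_series)

print(train_only)  # {'s1': 0, 's3': 1}
print(full)        # {'s1': 0, 's2': 1, 's3': 2}

# "s2" has no index in the training-only mapping, and "s3" changes index.
missing = [s for s in all_series if s not in train_only]
print(missing)     # ['s2']
```

Fitting once on the full set of group IDs and reusing that mapping avoids both problems.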
Code Reference
Source Location
- Repository: pytorch-forecasting
- File: pytorch_forecasting/data/encoders.py
- Lines: L267-492
Signature
class NaNLabelEncoder(BaseEstimator, TransformerMixin):
    def __init__(self, add_nan: bool = False, warn: bool = True):
        """
        Label encoder with NaN handling.

        Parameters
        ----------
        add_nan : bool, optional, default=False
            Whether to force encoding of NaN at index 0.
        warn : bool, optional, default=True
            Whether to warn when unknown items are encoded as NaN.
        """

    def fit(self, y: pd.Series, overwrite: bool = False) -> "NaNLabelEncoder":
        """Fit encoder to data."""

    def transform(self, y: pd.Series) -> np.ndarray:
        """Transform categories to integer indices."""

    def fit_transform(self, y: pd.Series, overwrite: bool = False) -> np.ndarray:
        """Fit and transform in one step."""

    def inverse_transform(self, y: np.ndarray) -> np.ndarray:
        """Convert integer indices back to original categories."""
Import
from pytorch_forecasting.data.encoders import NaNLabelEncoder
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| add_nan | bool | No | Force NaN encoding at index 0 (default: False) |
| warn | bool | No | Warn on unknown categories (default: True) |
| y | pd.Series | Yes (to fit/transform) | Categorical data series to encode |
Outputs
| Name | Type | Description |
|---|---|---|
| transform() | np.ndarray | Integer-encoded array |
| inverse_transform() | np.ndarray | Original category values |
| classes_ | dict | Mapping from category to integer index |
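Because classes_ maps categories to indices, decoding amounts to inverting that mapping. The following sketch works on a plain dict shaped like the mapping above (the example values and the `"nan"` placeholder key are assumptions, and no library calls are involved). It shows one consequence of the contract: an unknown category that was collapsed to index 0 decodes to the NaN placeholder, not to its original value.

```python
# Example mapping, shaped as classes_ might look with add_nan=True (assumed values).
classes = {"nan": 0, "A": 1, "B": 2, "C": 3}

# Invert category -> index into index -> category.
index_to_class = {idx: cat for cat, idx in classes.items()}

encoded = [classes.get(v, 0) for v in ["A", "Z", "C"]]  # "Z" is unknown -> 0
decoded = [index_to_class[i] for i in encoded]

print(encoded)  # [1, 0, 3]
print(decoded)  # ['A', 'nan', 'C']
```

The round trip is therefore lossless only for categories that were present during fit.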
Usage Examples
Pre-fit Encoder for DeepAR
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data.encoders import NaNLabelEncoder

# Assumes `data` is a long-format pandas DataFrame with "time_idx", "value",
# and "series" columns, and `training_cutoff` is the last training time_idx.

# Pre-fit the encoder on the full data so every series ID gets a consistent index
encoder = NaNLabelEncoder().fit(data["series"])

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="value",
    group_ids=["series"],
    categorical_encoders={"series": encoder},
    # ... other params
)

# The validation dataset reuses the same encoder via from_dataset
validation = TimeSeriesDataSet.from_dataset(training, data)
Inspect Encoder Mapping
encoder = NaNLabelEncoder(add_nan=True).fit(data["category"])
print(f"Classes: {encoder.classes_}")
# Example for string categories A, B, C: {'nan': 0, 'A': 1, 'B': 2, 'C': 3}
encoded = encoder.transform(data["category"])
decoded = encoder.inverse_transform(encoded)