Implementation:Huggingface Datasets ClassLabel
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, NLP |
| Last Updated | 2026-02-14 18:00 GMT |
Overview
Concrete tool, provided by the HuggingFace Datasets library, for encoding categorical class labels as integers while keeping a mapping to their string names.
Description
ClassLabel is a dataclass feature type for integer class labels. There are three ways to define a ClassLabel: by providing num_classes (creates labels named "0" to "num_classes-1"), by providing a list of names, or by providing a names_file (one label per line). Under the hood, labels are stored as int64 Arrow values. Bidirectional conversion is provided by the str2int() and int2str() methods. Negative integers represent unknown/missing labels. The cast_storage method can convert both string and integer Arrow arrays to the ClassLabel storage type.
Usage
Use ClassLabel to define label columns in classification datasets. It is the standard feature type for sentiment labels, category tags, entity types, and any finite set of classes.
Code Reference
Source Location
- Repository: datasets
- File: src/datasets/features/features.py (lines 982-1177)
Signature
```python
@dataclass
class ClassLabel:
    num_classes: InitVar[Optional[int]] = None  # InitVar: consumed by __post_init__, not stored as a field
    names: list[str] = None
    names_file: InitVar[Optional[str]] = None  # InitVar: consumed by __post_init__, not stored as a field
    id: Optional[str] = field(default=None, repr=False)
    # Automatically constructed
    dtype: ClassVar[str] = "int64"
    pa_type: ClassVar[Any] = pa.int64()
    _str2int: ClassVar[dict[str, int]] = None
    _int2str: ClassVar[dict[int, int]] = None
    _type: str = field(default="ClassLabel", init=False, repr=False)
```
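To show how InitVar pseudo-fields can feed a `__post_init__` that builds both lookup directions, here is a minimal pure-Python sketch. `MiniClassLabel` is a hypothetical, simplified class written for this page, not part of the `datasets` library. It resolves names in the order explicit names, then names_file, then num_classes, which is our reading of the library's behavior, and it skips details such as list inputs and numeric-string parsing.

```python
from dataclasses import InitVar, dataclass, field
from typing import Optional


@dataclass
class MiniClassLabel:
    """Hypothetical, simplified stand-in for datasets.ClassLabel (illustration only)."""

    # InitVar pseudo-fields: passed to __post_init__, not stored as dataclass fields
    num_classes: InitVar[Optional[int]] = None
    names: Optional[list[str]] = None
    names_file: InitVar[Optional[str]] = None
    # Lookup tables built in __post_init__; excluded from __init__ and repr
    _str2int: dict[str, int] = field(default_factory=dict, init=False, repr=False)
    _int2str: list[str] = field(default_factory=list, init=False, repr=False)

    def __post_init__(self, num_classes: Optional[int], names_file: Optional[str]) -> None:
        if names_file is not None and self.names is not None:
            raise ValueError("Provide either names or names_file, not both.")
        if self.names is None:
            if names_file is not None:
                with open(names_file, encoding="utf-8") as f:
                    self.names = [line.strip() for line in f if line.strip()]
            elif num_classes is not None:
                self.names = [str(i) for i in range(num_classes)]
            else:
                raise ValueError("Provide num_classes, names or names_file.")
        if num_classes is not None and num_classes != len(self.names):
            raise ValueError("num_classes does not match the number of names.")
        # Store the resolved class count as a plain instance attribute
        self.num_classes = len(self.names)
        # _int2str is a list indexed by id; _str2int is a dict keyed by name
        self._int2str = [str(name) for name in self.names]
        self._str2int = {name: i for i, name in enumerate(self._int2str)}
        if len(self._str2int) != len(self._int2str):
            raise ValueError("Label names must be unique.")

    def str2int(self, name: str) -> int:
        return self._str2int[name]

    def int2str(self, index: int) -> str:
        if not 0 <= index < self.num_classes:
            raise ValueError(f"Invalid integer class label {index}")
        return self._int2str[index]


mini = MiniClassLabel(num_classes=3, names=["bad", "ok", "good"])
print(mini.num_classes, mini.str2int("good"), mini.int2str(0))  # 3 2 bad
```

Keeping num_classes and names_file as InitVars means only `names` needs to be serialized; the count and the file path can always be recovered or discarded once the names are resolved.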
Import
```python
from datasets import ClassLabel
```
I/O Contract
Inputs
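| Name | Type | Required | Description |
|---|---|---|---|
| num_classes | int | No | Number of classes; valid labels are integers in [0, num_classes). If given together with names (or names_file), it must equal the number of names. |
| names | list[str] | No | List of unique string label names. The position of each name becomes its integer id. Mutually exclusive with names_file. |
| names_file | str | No | Path to a file with one label name per line. Mutually exclusive with names. |
| id | str | No | Optional feature identifier. |

At least one of num_classes, names, or names_file must be provided.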
Outputs
| Name | Type | Description |
|---|---|---|
| instance | ClassLabel | A ClassLabel feature with a bidirectional str-to-int mapping. |
Usage Examples
Basic Usage
```python
from datasets import Features, ClassLabel

features = Features({
    "label": ClassLabel(num_classes=3, names=["bad", "ok", "good"]),
})
print(features)
# {'label': ClassLabel(names=['bad', 'ok', 'good'])}

# Convert between strings and integers
label_feature = features["label"]
print(label_feature.str2int("good"))  # 2
print(label_feature.int2str(0))  # bad
```