Implementation:Huggingface Datasets Features
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, NLP |
| Last Updated | 2026-02-14 18:00 GMT |
Overview
Concrete tool, provided by the HuggingFace Datasets library, for defining typed column schemas for datasets.
Description
Features is a special dictionary subclass (dict[str, FieldType]) that defines the internal structure of a dataset. Keys are column names and values are feature type descriptors. Supported field types include Value (scalars), ClassLabel (categorical labels), List/LargeList (sequences), Image, Audio, Video, Pdf, Nifti, array types (Array2D-Array5D), nested dicts, and translation types. Features provides bidirectional conversion between Python types and Arrow schemas, tracks which columns require decoding, and supports serialization to/from YAML for dataset cards.
Usage
Use Features to define or inspect the schema of any dataset. Pass it to construction methods (from_dict, from_pandas, etc.) to enforce column types, or access it via dataset.features to inspect the schema of an existing dataset.
Code Reference
Source Location
- Repository: datasets
- File: src/datasets/features/features.py
- Lines: 1745-2287
Signature
class Features(dict):
    def __init__(*args, **kwargs):
        ...
Import
from datasets import Features
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| *args / **kwargs | dict[str, FieldType] | Yes | Column-name-to-feature-type mapping, same as a regular dict constructor. |
Outputs
| Name | Type | Description |
|---|---|---|
| instance | Features | A Features dictionary with schema metadata and encoding/decoding support. |
Usage Examples
Basic Usage
from datasets import Features, Value, ClassLabel

# Map each column name to its feature type.
features = Features({
    "text": Value("string"),
    "label": ClassLabel(names=["negative", "positive"]),
    "score": Value("float32"),
})
print(features)
# Example output (the exact repr varies across library versions):
# {'text': Value(dtype='string', id=None),
#  'label': ClassLabel(num_classes=2, names=['negative', 'positive']),
#  'score': Value(dtype='float32', id=None)}