

Implementation:Huggingface Datasets Features

From Leeroopedia
Knowledge Sources
Domains Data_Engineering, NLP
Last Updated 2026-02-14 18:00 GMT

Overview

A concrete tool, provided by the Hugging Face Datasets library, for defining typed column schemas for datasets.

Description

Features is a special dictionary subclass (dict[str, FieldType]) that defines the internal structure of a dataset. Keys are column names and values are feature type descriptors. Supported field types include Value (scalars), ClassLabel (categorical labels), List/LargeList (sequences), Image, Audio, Video, Pdf, Nifti, array types (Array2D to Array5D), nested dicts, and translation types. Features converts between feature definitions and Arrow schemas in both directions, encodes Python examples into Arrow-compatible values, tracks which columns require decoding, and supports serialization to and from YAML for dataset cards.

Usage

Use Features to define or inspect the schema of any dataset. Pass it through the features argument of construction methods (Dataset.from_dict, Dataset.from_pandas, etc.) to enforce column types, or read dataset.features to inspect the schema of an existing dataset.

Code Reference

Source Location

  • Repository: datasets
  • File: src/datasets/features/features.py
  • Lines: 1745-2287

Signature

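class Features(dict):
    def __init__(*args, **kwargs):
        # `self` is deliberately left out of the signature and taken from args,
        # so that "self" can still be passed as a column-name keyword argument.
        ...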

Import

from datasets import Features

I/O Contract

Inputs

  • *args / **kwargs (type: dict[str, FieldType]; required: yes): Column-name-to-feature-type mapping, accepted in the same forms as the regular dict constructor.

Outputs

  • instance (type: Features): A Features dictionary carrying schema metadata, with encoding/decoding support.

Usage Examples

Basic Usage

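from datasets import Features, Value, ClassLabel

# Map each column name to its feature type
features = Features({
    "text": Value("string"),                              # free-form text
    "label": ClassLabel(names=["negative", "positive"]),  # stored as ids 0 and 1
    "score": Value("float32"),                            # 32-bit float
})
print(features)
# The exact repr varies between datasets releases; versions whose feature
# types carry an `id` field print:
# {'text': Value(dtype='string', id=None),
#  'label': ClassLabel(names=['negative', 'positive'], id=None),
#  'score': Value(dtype='float32', id=None)}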

Related Pages

Implements Principle
