Implementation:Huggingface Datasets Dataset From Dict

Knowledge Sources	Huggingface Datasets HF Datasets Docs
Domains	Data_Engineering, NLP
Last Updated	2026-02-14 18:00 GMT

Overview

Concrete tool, provided by the HuggingFace Datasets library, for creating a Dataset from a Python dictionary.

Description

Dataset.from_dict is a class method that converts a Python dictionary mapping column names to column values into an Apache Arrow-backed Dataset. Each dictionary key becomes a column name, and each value (a list or Arrow array) provides that column's data. If an explicit Features schema is supplied, columns are encoded and cast accordingly; otherwise, types are inferred from the data. The resulting dataset lives in memory and has no associated cache directory.

Usage

Use Dataset.from_dict when you have data organized as a dictionary of lists (columnar format) and want to create a Dataset without going through file I/O. This is the standard entry point for programmatic dataset construction.

Code Reference

Source Location

Repository: datasets
File: src/datasets/arrow_dataset.py
Lines: 973-1034

Signature

@classmethod
def from_dict(
    cls,
    mapping: dict,
    features: Optional[Features] = None,
    info: Optional[DatasetInfo] = None,
    split: Optional[NamedSplit] = None,
) -> "Dataset":

Import

from datasets import Dataset

I/O Contract

Inputs

Name	Type	Required	Description
mapping	`dict`	Yes	Mapping of column names (strings) to Arrow arrays or Python lists of values.
features	`Features`	No	Explicit dataset features schema. If provided, data is cast to match.
info	`DatasetInfo`	No	Dataset metadata such as description, citation, etc.
split	`NamedSplit`	No	Name of the dataset split (e.g., "train", "test").

Outputs

Name	Type	Description
return	`Dataset`	A new in-memory Dataset backed by an Arrow table.

Usage Examples

Basic Usage

from datasets import Dataset

ds = Dataset.from_dict({
    "text": ["Hello world", "Goodbye world"],
    "label": [1, 0],
})
print(ds)
# Dataset({
#     features: ['text', 'label'],
#     num_rows: 2
# })

With Explicit Features

from datasets import Dataset, Features, Value, ClassLabel

features = Features({
    "text": Value("string"),
    "label": ClassLabel(names=["negative", "positive"]),
})

ds = Dataset.from_dict(
    {"text": ["Hello", "World"], "label": [0, 1]},
    features=features,
)

Related Pages

Implements Principle

Principle:Huggingface_Datasets_Dataset_From_Dict_Construction
