
Implementation:Huggingface Datasets Dataset From Dict

From Leeroopedia
Knowledge Sources
Domains Data_Engineering, NLP
Last Updated 2026-02-14 18:00 GMT

Overview

Concrete tool, provided by the HuggingFace Datasets library, for creating a Dataset from a Python dictionary.

Description

Dataset.from_dict is a class method that converts a Python dictionary mapping column names to values into an Apache Arrow-backed Dataset. Each dictionary key becomes a column name, and each value (a Python list or an Arrow array) supplies that column's data. If an explicit Features schema is supplied, the columns are encoded and cast to match it; otherwise, column types are inferred from the values. The resulting dataset lives in memory and has no associated cache directory.
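The inference and in-memory behaviour described above can be checked directly. This minimal sketch builds a small dataset without a schema and inspects the result; the comments show the values we expect from a recent datasets release (Python strings inferred as "string", Python ints as "int64").

from datasets import Dataset

ds = Dataset.from_dict({
    "text": ["Hello world", "Goodbye world"],
    "label": [1, 0],
})

# Each dictionary key became a column, in insertion order.
print(ds.column_names)  # ['text', 'label']

# No Features were passed, so the types were inferred from the values.
print(ds.features["text"].dtype, ds.features["label"].dtype)  # string int64

# The table lives in memory, so no Arrow cache files exist on disk.
print(ds.cache_files)  # []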

Usage

Use Dataset.from_dict when your data is already organized as a dictionary of lists (columnar format) and you want to build a Dataset without going through file I/O. This is the standard entry point for constructing a dataset programmatically from column-oriented data.
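Dataset.from_dict expects column-oriented input, so records that arrive row by row (for example, parsed JSON objects) must first be pivoted into a dict of lists. The helper rows_to_columns below is a hypothetical, stdlib-only sketch and not part of the datasets API; it assumes every record has the same keys and raises on any record that does not.

from typing import Any, Dict, List


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Hypothetical helper: pivot row-oriented records into the
    dict-of-lists layout that Dataset.from_dict expects."""
    if not rows:
        return {}
    columns = list(rows[0].keys())
    mapping: Dict[str, List[Any]] = {name: [] for name in columns}
    for i, row in enumerate(rows):
        # Every record must carry exactly the same keys; otherwise the
        # resulting columns would end up with different lengths.
        if set(row.keys()) != set(columns):
            raise ValueError(f"row {i} has keys {sorted(row)}, expected {sorted(columns)}")
        for name in columns:
            mapping[name].append(row[name])
    return mapping


rows = [
    {"text": "Hello world", "label": 1},
    {"text": "Goodbye world", "label": 0},
]
mapping = rows_to_columns(rows)
print(mapping)  # {'text': ['Hello world', 'Goodbye world'], 'label': [1, 0]}

The resulting mapping can be passed straight to Dataset.from_dict(mapping). The library also offers Dataset.from_list for row-oriented input; as far as we know it fills keys missing from later rows with None instead of raising, so the stricter check in this sketch is a deliberate design choice.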

Code Reference

Source Location

  • Repository: datasets
  • File: src/datasets/arrow_dataset.py
  • Lines: 973-1034

Signature

@classmethod
def from_dict(
    cls,
    mapping: dict,
    features: Optional[Features] = None,
    info: Optional[DatasetInfo] = None,
    split: Optional[NamedSplit] = None,
) -> "Dataset":

Import

from datasets import Dataset

I/O Contract

Inputs

Name      Type         Required  Description
mapping   dict         Yes       Mapping of column names (strings) to Arrays or Python lists of values.
features  Features     No        Explicit dataset features schema. If provided, data is encoded and cast to match.
info      DatasetInfo  No        Dataset metadata such as description, citation, etc.
split     NamedSplit   No        Name of the dataset split (e.g., "train", "test").
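The optional info and split arguments attach metadata without changing the data itself. A minimal sketch, using an illustrative description string and a split named "train" (both values are assumptions chosen for the example):

from datasets import Dataset, DatasetInfo, NamedSplit

# Metadata only: a description and a split name.
info = DatasetInfo(description="Toy sentiment examples")

ds = Dataset.from_dict(
    {"text": ["Hello", "World"], "label": [0, 1]},
    info=info,
    split=NamedSplit("train"),
)

print(ds.split)             # train
print(ds.info.description)  # Toy sentiment examples

If both info and features are passed, their schemas should agree; the library checks this and rejects a mismatch.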

Outputs

Name    Type     Description
return  Dataset  A new in-memory Dataset backed by an Arrow table.
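To make the "backed by an Arrow table" part of the contract concrete, the sketch below reaches the underlying pyarrow.Table through ds.data.table. That attribute belongs to the library's internal table wrapper, so treat it as an implementation detail that may change; Dataset.to_dict() is the public route back to a plain dictionary.

import pyarrow as pa
from datasets import Dataset

mapping = {"text": ["Hello world", "Goodbye world"], "label": [1, 0]}
ds = Dataset.from_dict(mapping)

# ds.data is the library's table wrapper; .table is the raw pyarrow.Table.
arrow_table = ds.data.table
print(isinstance(arrow_table, pa.Table))  # True
print(arrow_table.num_rows)  # 2

# Converting back to a Python dict round-trips the original columns.
print(ds.to_dict() == mapping)  # True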

Usage Examples

Basic Usage

from datasets import Dataset

ds = Dataset.from_dict({
    "text": ["Hello world", "Goodbye world"],
    "label": [1, 0],
})
print(ds)
# Dataset({
#     features: ['text', 'label'],
#     num_rows: 2
# })
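Once built, the dataset can be read back by row or by column. This sketch reuses the data from the basic example. It wraps the column in list() because, depending on the installed datasets version, ds["label"] may return a plain list or a lazy column object. It also shows that columns of unequal length are rejected, since every column of an Arrow table must have the same number of rows; the exact exception type is not assumed.

from datasets import Dataset

ds = Dataset.from_dict({
    "text": ["Hello world", "Goodbye world"],
    "label": [1, 0],
})

# Integer indexing returns one row as a plain dict.
print(ds[0])  # {'text': 'Hello world', 'label': 1}

# Column-name indexing returns the whole column; list() normalises it.
print(list(ds["label"]))  # [1, 0]

# Columns must all have the same length, so this construction fails.
mismatch_error = None
try:
    Dataset.from_dict({"text": ["only one"], "label": [1, 0]})
except Exception as err:  # exact type depends on the library versions
    mismatch_error = err
print(type(mismatch_error).__name__)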

With Explicit Features

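from datasets import Dataset, Features, Value, ClassLabel

# Declare the schema up front instead of letting from_dict infer it.
features = Features({
    "text": Value("string"),
    # Labels are stored as integer ids; ClassLabel maps 0 -> "negative", 1 -> "positive".
    "label": ClassLabel(names=["negative", "positive"]),
})

ds = Dataset.from_dict(
    {"text": ["Hello", "World"], "label": [0, 1]},
    features=features,
)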
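Because a supplied Features schema is used to encode the columns before the Arrow table is built, a ClassLabel column can also be given as label names rather than integer ids; each name is encoded to its index in names. A minimal sketch reusing the schema from the example above:

from datasets import Dataset, Features, Value, ClassLabel

features = Features({
    "text": Value("string"),
    "label": ClassLabel(names=["negative", "positive"]),
})

# Label names instead of ids: the schema encodes them to 0 and 1.
ds = Dataset.from_dict(
    {"text": ["Hello", "World"], "label": ["negative", "positive"]},
    features=features,
)

print(ds[0]["label"])  # 0
print(ds.features["label"].int2str(ds[1]["label"]))  # positive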

Related Pages

Implements Principle
