Implementation:Huggingface Datasets Dataset From Pandas

Knowledge Sources	Huggingface Datasets HF Datasets Docs
Domains	Data_Engineering, NLP
Last Updated	2026-02-14 18:00 GMT

Overview

Concrete tool, provided by the HuggingFace Datasets library, for creating a Dataset from a Pandas DataFrame.

Description

Dataset.from_pandas is a class method that converts a pandas.DataFrame into a PyArrow Table and wraps it in a Dataset. Column types are inferred from the DataFrame's dtypes using PyArrow's Pandas integration. For object-typed Series, the Python objects themselves are inspected to determine the Arrow type. When no type can be inferred, for example because the DataFrame is empty or a Series contains only None values, the column type defaults to null unless explicit features are provided. An optional preserve_index parameter controls whether the DataFrame index is stored as a column.

Usage

Use Dataset.from_pandas when you have tabular data in a Pandas DataFrame and need to convert it for use with the HuggingFace Datasets ecosystem, including training, evaluation, or Hub uploads.

Code Reference

Source Location

Repository: datasets
File: src/datasets/arrow_dataset.py
Lines: 859-926

Signature

@classmethod
def from_pandas(
    cls,
    df: pd.DataFrame,
    features: Optional[Features] = None,
    info: Optional[DatasetInfo] = None,
    split: Optional[NamedSplit] = None,
    preserve_index: Optional[bool] = None,
) -> "Dataset":

Import

from datasets import Dataset

I/O Contract

Inputs

Name	Type	Required	Description
df	`pd.DataFrame`	Yes	The Pandas DataFrame containing the dataset.
features	`Features`	No	Explicit dataset features schema for type casting.
info	`DatasetInfo`	No	Dataset metadata (description, citation, etc.).
split	`NamedSplit`	No	Name of the dataset split.
preserve_index	`bool`	No	Whether to store the DataFrame index as an additional column. The default, None, stores every index type except RangeIndex.

Outputs

Name	Type	Description
return	`Dataset`	A new in-memory Dataset backed by an Arrow table converted from the DataFrame.

Usage Examples

Basic Usage

import pandas as pd
from datasets import Dataset

df = pd.DataFrame({
    "text": ["Hello world", "Goodbye world"],
    "label": [1, 0],
})
ds = Dataset.from_pandas(df)
print(ds)
# Dataset({
#     features: ['text', 'label'],
#     num_rows: 2
# })

Related Pages

Implements Principle

Principle:Huggingface_Datasets_Dataset_From_Pandas_Construction
