Implementation:Huggingface Datasets Pandas Builder

Knowledge Sources	Huggingface Datasets, HF Datasets Docs
Domains	Data_Loading, Tabular
Last Updated	2026-02-14 18:00 GMT

Overview

A deprecated packaged dataset builder, provided by the HuggingFace Datasets library, that loads pandas pickle files into Arrow-backed datasets.

Description

Pandas is a packaged dataset builder extending ArrowBasedBuilder that reads pandas pickle files (.pkl) and converts them to Arrow tables. It is configured via PandasConfig, a dataclass extending BuilderConfig, with a single optional features field for schema specification.
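The input files are ordinary pickles written by pandas itself. The minimal sketch below produces one with DataFrame.to_pickle() and reads it back with the same pd.read_pickle() call the builder applies to each file; the file name and columns are hypothetical. Note that pandas documents that unpickling data from untrusted sources can be unsafe, because pickle can execute arbitrary code during loading.

```python
import os
import tempfile

import pandas as pd

# A small example DataFrame (hypothetical columns).
df = pd.DataFrame({"text": ["hello", "world"], "label": [0, 1]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.pkl")  # hypothetical file name
    df.to_pickle(path)                     # write the kind of file the builder reads
    restored = pd.read_pickle(path)        # the per-file read the builder performs

# Pickle round-trips values and dtypes exactly.
print(restored.equals(df))
```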

This builder is deprecated. The _info method issues a FutureWarning stating that the Pandas builder will be removed in the next major version of datasets.
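The deprecation notice is a standard Python FutureWarning, so the usual warnings machinery applies. The sketch below uses a hypothetical stand-in function, not the library's _info, and a paraphrased message, to show how such a warning can be captured and inspected.

```python
import warnings


def _info_sketch():
    """Hypothetical stand-in for a deprecated builder's _info method."""
    warnings.warn(
        "The Pandas builder is deprecated and will be removed in the next "
        "major version of datasets.",  # paraphrased message, not verbatim
        FutureWarning,
    )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeats
    _info_sketch()

print(caught[0].category.__name__, caught[0].message)
```

With the real library, wrapping a load_dataset("pandas", ...) call in the same catch_warnings block should capture the builder's warning. Running a test suite with python -W error::FutureWarning turns it into an exception instead, which can help locate remaining uses before upgrading.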

The builder downloads the data files through dl_manager and then processes each file in turn: it reads the file with pd.read_pickle(), converts the resulting DataFrame to an Arrow table with pa.Table.from_pandas(), and, if features is set, casts the table to that schema with table_cast. Each file yields exactly one table, keyed by Key(i, 0).

Usage

Load pandas pickle files with load_dataset("pandas", data_files=...). Because this builder is deprecated, consider migrating the data to a supported format such as Parquet or JSON and loading it with the matching builder.

Code Reference

Source Location

Repository: datasets
File: src/datasets/packaged_modules/pandas/pandas.py
Lines: 1-57

Signature

from dataclasses import dataclass
from typing import Optional

import pyarrow as pa

import datasets


@dataclass
class PandasConfig(datasets.BuilderConfig):
    """BuilderConfig for Pandas."""

    features: Optional[datasets.Features] = None


class Pandas(datasets.ArrowBasedBuilder):
    BUILDER_CONFIG_CLASS = PandasConfig

    # Signatures only; method bodies are elided.
    def _info(self): ...
    def _split_generators(self, dl_manager): ...
    def _cast_table(self, pa_table: pa.Table) -> pa.Table: ...
    def _generate_shards(self, files): ...
    def _generate_tables(self, files): ...

Import

from datasets.packaged_modules.pandas.pandas import Pandas, PandasConfig

I/O Contract

Inputs (PandasConfig)

Name	Type	Required	Description
data_files	`str` or `list` or `dict`	Yes	Path(s) to pandas pickle file(s). At least one data file must be specified.
features	`Optional[Features]`	No	Schema describing the dataset features. If provided, the Arrow table is cast to match the schema using `table_cast`.

Outputs

Name	Type	Description
dataset	`Dataset` or `DatasetDict`	An Arrow-backed dataset built from the deserialized pandas DataFrames: a `Dataset` when `split` is given, otherwise a `DatasetDict` keyed by split name.

Deprecation Notice

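This builder is deprecated: it emits a FutureWarning and will be removed in the next major version of datasets. Prefer Parquet or JSON for new data.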

Usage Examples

Basic Usage

from datasets import load_dataset

# Load a pandas pickle file (deprecated)
ds = load_dataset("pandas", data_files="data/train.pkl", split="train")
print(ds[0])

With Explicit Features

from datasets import load_dataset, Features, Value

# Load with explicit schema
features = Features({"text": Value("string"), "label": Value("int64")})
ds = load_dataset(
    "pandas",
    data_files="data/train.pkl",
    features=features,
    split="train",
)
