Implementation:Huggingface Datasets Pandas Builder

From Leeroopedia
Knowledge Sources
Domains Data_Loading, Tabular
Last Updated 2026-02-14 18:00 GMT

Overview

Deprecated packaged dataset builder, provided by the HuggingFace Datasets library, that loads pandas pickle files into Arrow-backed datasets.

Description

Pandas is a packaged dataset builder extending ArrowBasedBuilder that reads pandas pickle files (.pkl) and converts them to Arrow tables. It is configured via PandasConfig, a dataclass extending BuilderConfig, with a single optional features field for schema specification.

This builder is deprecated. The _info method issues a FutureWarning stating that the Pandas builder will be removed in the next major version of datasets.
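The deprecation signal can be detected programmatically with the standard `warnings` module. The following is a minimal sketch, not the library's code: `info_stub` is a hypothetical stand-in for `_info`, and its message text is illustrative rather than copied from the source.

```python
import warnings


def info_stub():
    """Hypothetical stand-in for Pandas._info; the message text is illustrative."""
    warnings.warn(
        "The Pandas builder is deprecated and will be removed "
        "in the next major version of datasets.",
        FutureWarning,
    )
    return {}


def collect_future_warnings(func):
    """Call func and return (result, messages of any FutureWarning it issued)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no filter hides the warning
        result = func()
    messages = [
        str(w.message) for w in caught if issubclass(w.category, FutureWarning)
    ]
    return result, messages


result, messages = collect_future_warnings(info_stub)
print(messages)
```

In a test suite, the same idea could be used to fail fast on deprecated builders, for example by setting `warnings.simplefilter("error", FutureWarning)` around the loading call.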

The builder downloads data files via dl_manager, reads each file using pd.read_pickle(), converts the resulting DataFrame to an Arrow table with pa.Table.from_pandas(), and optionally casts the table to the specified features schema using table_cast. Each file produces a single table keyed by Key(i, 0), where i is the index of the file.

Usage

Use Pandas via load_dataset("pandas", data_files=...) to load pandas pickle files. However, since this builder is deprecated, users should consider migrating to other formats such as Parquet or JSON.
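One possible migration path is to re-serialize each pickled DataFrame once and then load the new file with a non-deprecated builder. The sketch below converts to JSON Lines using pandas only; `pickle_to_jsonl` and the file names are hypothetical. For Parquet, `DataFrame.to_parquet` could be used instead, which additionally requires a Parquet engine such as pyarrow.

```python
import json
import os
import tempfile

import pandas as pd


def pickle_to_jsonl(pickle_path, jsonl_path):
    """Re-serialize a pickled DataFrame as JSON Lines (hypothetical helper)."""
    df = pd.read_pickle(pickle_path)
    df.to_json(jsonl_path, orient="records", lines=True)  # one record per line
    return len(df)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "train.pkl")
    dst = os.path.join(tmp, "train.jsonl")
    pd.DataFrame({"text": ["a", "b"], "label": [0, 1]}).to_pickle(src)

    n_rows = pickle_to_jsonl(src, dst)

    # Read the converted file back to confirm the records survived the round trip.
    with open(dst, encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]

print(n_rows, records)
```

A file converted this way could then be loaded with the "json" builder via load_dataset("json", data_files=...) instead of the deprecated "pandas" builder.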

Code Reference

Source Location

  • Repository: datasets
  • File: src/datasets/packaged_modules/pandas/pandas.py
  • Lines: 1-57

Signature

from dataclasses import dataclass
from typing import Optional

import datasets
import pyarrow as pa


@dataclass
class PandasConfig(datasets.BuilderConfig):
    """BuilderConfig for Pandas."""
    features: Optional[datasets.Features] = None


class Pandas(datasets.ArrowBasedBuilder):
    BUILDER_CONFIG_CLASS = PandasConfig

    # Method bodies omitted; only the signatures are listed.
    def _info(self): ...
    def _split_generators(self, dl_manager): ...
    def _cast_table(self, pa_table: pa.Table) -> pa.Table: ...
    def _generate_shards(self, files): ...
    def _generate_tables(self, files): ...

Import

from datasets.packaged_modules.pandas.pandas import Pandas, PandasConfig

I/O Contract

Inputs (PandasConfig)

| Name | Type | Required | Description |
|------|------|----------|-------------|
| data_files | str or list or dict | Yes | Path(s) to pandas pickle file(s). At least one data file must be specified. |
| features | Optional[Features] | No | Schema describing the dataset features. If provided, the Arrow table is cast to match the schema using table_cast. |

Outputs

| Name | Type | Description |
|------|------|-------------|
| dataset | Dataset | An Arrow-backed Dataset constructed from the deserialized pandas DataFrames. |

Deprecation Notice

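Warning: The Pandas builder is deprecated. Its _info method issues a FutureWarning stating that the builder will be removed in the next major version of datasets.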

Usage Examples

Basic Usage

from datasets import load_dataset

# Load a pandas pickle file (deprecated)
ds = load_dataset("pandas", data_files="data/train.pkl", split="train")
print(ds[0])

With Explicit Features

from datasets import load_dataset, Features, Value

# Load with explicit schema
features = Features({"text": Value("string"), "label": Value("int64")})
ds = load_dataset(
    "pandas",
    data_files="data/train.pkl",
    features=features,
    split="train",
)
