Implementation:Huggingface Datasets Pandas Builder
| Knowledge Sources | |
|---|---|
| Domains | Data_Loading, Tabular |
| Last Updated | 2026-02-14 18:00 GMT |
Overview
Deprecated packaged dataset builder in the HuggingFace Datasets library that loads pandas pickle files into Arrow-backed datasets.
Description
Pandas is a packaged dataset builder extending ArrowBasedBuilder that reads pandas pickle files (.pkl) and converts them to Arrow tables. It is configured via PandasConfig, a dataclass extending BuilderConfig, with a single optional features field for schema specification.
This builder is deprecated. The _info method issues a FutureWarning stating that the Pandas builder will be removed in the next major version of datasets.
The builder downloads data files via dl_manager, reads each file using pd.read_pickle(), converts the resulting DataFrame to an Arrow table with pa.Table.from_pandas(), and optionally casts the table to the specified features schema using table_cast. Each file produces a single table keyed by Key(i, 0).
Usage
Use Pandas via load_dataset("pandas", data_files=...) to load pandas pickle files. However, since this builder is deprecated, users should consider migrating to other formats such as Parquet or JSON.
Code Reference
Source Location
- Repository: datasets
- File: src/datasets/packaged_modules/pandas/pandas.py
- Lines: 1-57
Signature
```python
from dataclasses import dataclass
from typing import Optional
import pyarrow as pa
import datasets

@dataclass
class PandasConfig(datasets.BuilderConfig):
    """BuilderConfig for Pandas."""
    features: Optional[datasets.Features] = None


class Pandas(datasets.ArrowBasedBuilder):
    BUILDER_CONFIG_CLASS = PandasConfig
    # Method bodies omitted; only the signatures are shown.
    def _info(self): ...
    def _split_generators(self, dl_manager): ...
    def _cast_table(self, pa_table: pa.Table) -> pa.Table: ...
    def _generate_shards(self, files): ...
    def _generate_tables(self, files): ...
```
Import
```python
from datasets.packaged_modules.pandas.pandas import Pandas, PandasConfig
```
I/O Contract
Inputs (PandasConfig)
| Name | Type | Required | Description |
|---|---|---|---|
| data_files | str, list, or dict | Yes | Path(s) to pandas pickle file(s). At least one data file must be specified. |
| features | Optional[Features] | No | Schema describing the dataset features. If provided, the Arrow table is cast to match the schema using table_cast. |
Outputs
| Name | Type | Description |
|---|---|---|
| dataset | Dataset | An Arrow-backed Dataset constructed from the deserialized pandas DataFrames. |
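Deprecation Notice
The Pandas builder is deprecated and will be removed in the next major version of datasets; loading data through it emits a FutureWarning. New data should be stored in a format with its own packaged builder, such as Parquet or JSON, and existing pickle files can be converted once with pandas.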
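For the JSON route, pandas can write each pickle as JSON Lines. This is a minimal sketch; the helper name `pickle_to_jsonl` and the file names are hypothetical.

```python
import json
import os
import tempfile

import pandas as pd


def pickle_to_jsonl(pkl_path: str, jsonl_path: str) -> None:
    """Convert one pandas pickle to JSON Lines (hypothetical helper)."""
    df = pd.read_pickle(pkl_path)
    # orient="records" with lines=True writes one JSON object per row.
    df.to_json(jsonl_path, orient="records", lines=True)


with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "train.pkl")
    jsonl_path = os.path.join(tmp, "train.jsonl")
    pd.DataFrame({"text": ["hello", "world"], "label": [0, 1]}).to_pickle(pkl_path)
    pickle_to_jsonl(pkl_path, jsonl_path)
    # Parse the output line by line to check the records.
    with open(jsonl_path, encoding="utf-8") as f:
        rows = [json.loads(line) for line in f if line.strip()]

print(rows)
```

The result can be loaded with `load_dataset("json", data_files="train.jsonl", split="train")`. JSON is human-readable but, unlike Parquet, stores no typed schema, so passing `features` when loading is the way to pin column types.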
Usage Examples
Basic Usage
```python
from datasets import load_dataset

# Load a pandas pickle file (deprecated)
ds = load_dataset("pandas", data_files="data/train.pkl", split="train")
print(ds[0])
```
With Explicit Features
```python
from datasets import load_dataset, Features, Value

# Load with explicit schema
features = Features({"text": Value("string"), "label": Value("int64")})
ds = load_dataset(
    "pandas",
    data_files="data/train.pkl",
    features=features,
    split="train",
)
```