

Implementation:Huggingface Datasets Get Dataset Split Names

From Leeroopedia
Revision as of 12:59, 16 February 2026 by Admin (auto-imported from implementations/Huggingface_Datasets_Get_Dataset_Split_Names.md)
Knowledge Sources
Domains Data_Engineering, NLP
Last Updated 2026-02-14 18:00 GMT

Overview

Concrete tool for querying the available splits of a dataset configuration, provided by the Hugging Face Datasets library.

Description

get_dataset_split_names returns the list of available split names (e.g. ["train", "validation", "test"]) for a particular dataset and optional configuration. Internally, it delegates to get_dataset_config_info to retrieve the full DatasetInfo object, then extracts the keys of its splits dictionary.
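The delegation described above can be sketched as follows. This is a minimal sketch, not the library's source: it assumes only that get_dataset_config_info returns a DatasetInfo whose splits attribute is a mapping keyed by split name. The helper names split_names_from_info and split_names_via_config_info are hypothetical, and the offline check uses a stand-in object rather than a real DatasetInfo.

```python
from types import SimpleNamespace
from typing import Any, List, Optional


def split_names_from_info(info: Any) -> List[str]:
    """Return the split names stored on a DatasetInfo-like object (hypothetical helper)."""
    # info.splits maps split name -> split metadata; only the keys are kept,
    # in the mapping's iteration order.
    return list(info.splits.keys())


def split_names_via_config_info(path: str, config_name: Optional[str] = None) -> List[str]:
    """Hypothetical restatement of the delegation: fetch DatasetInfo, then read its splits."""
    # Imported lazily so split_names_from_info can be exercised without `datasets` installed.
    from datasets import get_dataset_config_info

    info = get_dataset_config_info(path, config_name=config_name)
    return split_names_from_info(info)


# Offline check with a stand-in object (made-up metadata, no Hub access).
fake_info = SimpleNamespace(splits={"train": None, "validation": None, "test": None})
print(split_names_from_info(fake_info))  # ['train', 'validation', 'test']
```

With network access, split_names_via_config_info("cornell-movie-review-data/rotten_tomatoes") should agree with get_dataset_split_names on the same arguments, since both read the same splits mapping.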

Usage

Use get_dataset_split_names when you need a quick list of available splits before calling load_dataset. This is especially helpful for datasets where the available splits are not known in advance or vary between configurations.
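For example, a script can inspect the splits first and then load whichever evaluation split exists. The sketch below assumes network access to the Hub when load_eval_split is called; pick_split, load_eval_split, and the preference order are hypothetical choices for illustration, not part of the library.

```python
from typing import Optional, Sequence


def pick_split(
    available: Sequence[str],
    preferred: Sequence[str] = ("validation", "test", "train"),
) -> str:
    """Return the first name in `preferred` that appears in `available` (hypothetical helper)."""
    for name in preferred:
        if name in available:
            return name
    raise ValueError(f"none of {list(preferred)} found in {list(available)}")


def load_eval_split(path: str, config_name: Optional[str] = None):
    """Query split names, then load only the chosen split (requires Hub access)."""
    # Lazy import: only needed when actually talking to the Hub.
    from datasets import get_dataset_split_names, load_dataset

    splits = get_dataset_split_names(path, config_name=config_name)
    # load_dataset's second positional parameter is the configuration name.
    return load_dataset(path, config_name, split=pick_split(splits))


print(pick_split(["train", "test"]))  # test
```

Checking the split list first avoids loading a split that does not exist and makes the fallback order explicit.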

Code Reference

Source Location

  • Repository: datasets
  • File: src/datasets/inspect.py
  • Lines: 295-350

Signature

def get_dataset_split_names(
    path: str,
    config_name: Optional[str] = None,
    data_files: Optional[Union[str, Sequence[str], Mapping[str, Union[str, Sequence[str]]]]] = None,
    download_config: Optional[DownloadConfig] = None,
    download_mode: Optional[Union[DownloadMode, str]] = None,
    revision: Optional[Union[str, Version]] = None,
    token: Optional[Union[bool, str]] = None,
    **config_kwargs,
):
    ...  # body omitted here; see src/datasets/inspect.py

Import

from datasets import get_dataset_split_names

I/O Contract

Inputs

Name | Type | Required | Description
path | str | Yes | Path to the dataset repository. Can be a local path or a dataset identifier on the Hugging Face Hub.
config_name | Optional[str] | No | Name of the dataset configuration. If None, uses the default configuration.
data_files | Optional[Union[str, Sequence[str], Mapping[str, Union[str, Sequence[str]]]]] | No | Path(s) to source data file(s).
download_config | Optional[DownloadConfig] | No | Specific download configuration parameters.
download_mode | Optional[Union[DownloadMode, str]] | No | Download/generate mode. Defaults to REUSE_DATASET_IF_EXISTS.
revision | Optional[Union[str, Version]] | No | Version of the dataset to load (commit SHA, git tag, or branch name).
token | Optional[Union[bool, str]] | No | Bearer token for remote files on the Datasets Hub. If True, reads from ~/.huggingface.
**config_kwargs | keyword arguments | No | Additional attributes for the builder class that override defaults.
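Because config_name selects the configuration, the available splits can differ from one configuration to the next. A minimal sketch for collecting them all, assuming Hub access for the real call: it pairs get_dataset_config_names (also exported by datasets) with get_dataset_split_names. The function splits_by_config is hypothetical and takes the lookup as a parameter so the offline check can substitute a stand-in.

```python
from typing import Callable, Dict, Iterable, List, Optional


def splits_by_config(
    path: str,
    config_names: Iterable[str],
    get_splits: Callable[..., List[str]],
) -> Dict[str, List[str]]:
    """Map each configuration name to its split names (hypothetical helper)."""
    return {name: get_splits(path, config_name=name) for name in config_names}


def splits_for_all_configs(path: str) -> Dict[str, List[str]]:
    """Real lookup against the Hub; one metadata query per configuration."""
    from datasets import get_dataset_config_names, get_dataset_split_names

    return splits_by_config(path, get_dataset_config_names(path), get_dataset_split_names)


# Offline check with a stand-in lookup (made-up split layout, no Hub access).
def fake_get_splits(path: str, config_name: Optional[str] = None) -> List[str]:
    return ["train", "test"] if config_name == "a" else ["train"]


print(splits_by_config("some/dataset", ["a", "b"], fake_get_splits))
# {'a': ['train', 'test'], 'b': ['train']}
```

Injecting the lookup keeps the mapping logic testable offline; in real use, splits_for_all_configs passes get_dataset_split_names directly.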

Outputs

Name | Type | Description
split_names | list[str] | A list of split name strings available for the dataset configuration (e.g. ["train", "validation", "test"]).
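The returned names are plain split names, whereas the split argument of load_dataset also accepts expressions such as "train[:10%]" or "train+test". A minimal sketch, assuming only that simple "+" and "[...]" syntax: the hypothetical helpers split_expr_names and check_split_expr strip slices and check each base name against the returned list before any data is downloaded.

```python
import re
from typing import List, Sequence


def split_expr_names(expr: str) -> List[str]:
    """Extract base split names from an expression like 'train[:10%]+test' (simplified)."""
    # Drop bracketed slices, then split on '+' and strip whitespace.
    return [part.strip() for part in re.sub(r"\[[^\]]*\]", "", expr).split("+")]


def check_split_expr(expr: str, available: Sequence[str]) -> None:
    """Raise ValueError if any split named in `expr` is not in `available` (hypothetical helper)."""
    missing = [name for name in split_expr_names(expr) if name not in available]
    if missing:
        raise ValueError(f"unknown split(s) {missing}; available: {list(available)}")


available = ["train", "validation", "test"]  # e.g. a result of get_dataset_split_names
check_split_expr("train[:10%]+test", available)  # passes silently
print(split_expr_names("train[:10%]+test"))  # ['train', 'test']
```

This parser handles only the common forms; anything more elaborate is best left to load_dataset itself to validate.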

Usage Examples

Basic Usage

from datasets import get_dataset_split_names

splits = get_dataset_split_names("cornell-movie-review-data/rotten_tomatoes")
print(splits)
# ['train', 'validation', 'test']

With a Specific Configuration

from datasets import get_dataset_split_names

# Get splits for a specific GLUE task
splits = get_dataset_split_names("nyu-mll/glue", config_name="mrpc")
print(splits)
# ['train', 'validation', 'test']

Related Pages

Implements Principle
