Implementation:Huggingface Datasets Get Dataset Split Names

Knowledge Sources	Huggingface Datasets, HF Datasets Docs
Domains	Data_Engineering, NLP
Last Updated	2026-02-14 18:00 GMT

Overview

Concrete tool for querying the splits available for a dataset configuration, provided by the Hugging Face Datasets library.

Description

get_dataset_split_names returns the list of available split names (e.g. ["train", "validation", "test"]) for a particular dataset and optional configuration. Internally, it delegates to get_dataset_config_info to retrieve the full DatasetInfo object, then extracts the keys of its splits dictionary.
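The delegation described above can be sketched roughly as follows. This is an illustrative approximation, not the library's source: the helper names split_names_from_info and split_names_via_config_info are hypothetical, and the stand-in object at the end only mimics the splits attribute of DatasetInfo so that the extraction step can be tried offline.

```python
from types import SimpleNamespace
from typing import List, Optional


def split_names_from_info(info) -> List[str]:
    """Return the keys of info.splits as a list (hypothetical helper)."""
    # splits maps split name -> split metadata; treat a missing mapping as empty.
    return list((info.splits or {}).keys())


def split_names_via_config_info(path: str, config_name: Optional[str] = None) -> List[str]:
    """Approximate get_dataset_split_names using get_dataset_config_info."""
    # Imported lazily: this call needs the datasets library and usually network access.
    from datasets import get_dataset_config_info

    info = get_dataset_config_info(path, config_name=config_name)
    return split_names_from_info(info)


# Offline stand-in for a DatasetInfo object (assumption: only .splits matters here).
fake_info = SimpleNamespace(splits={"train": None, "validation": None, "test": None})
print(split_names_from_info(fake_info))
```

In practice, calling get_dataset_split_names directly is the simpler choice; the sketch is only meant to make the two steps (fetch the info object, read the split keys) visible.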

Usage

Use get_dataset_split_names when you need a quick list of available splits before calling load_dataset. This is especially helpful for datasets where the available splits are not known in advance or vary between configurations.
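One way to apply this before load_dataset is to check the available splits first and load only a split that actually exists. The following is a minimal sketch under stated assumptions: choose_split and load_first_available are hypothetical helper names, and the fallback order passed in is up to the caller.

```python
from typing import Optional, Sequence


def choose_split(available: Sequence[str], preferred: Sequence[str]) -> Optional[str]:
    """Return the first preferred split that is actually available, else None."""
    for name in preferred:
        if name in available:
            return name
    return None


def load_first_available(path: str, preferred: Sequence[str], config_name: Optional[str] = None):
    """Check the available splits, then load only the chosen one."""
    # Imported lazily: both calls need the datasets library and usually network access.
    from datasets import get_dataset_split_names, load_dataset

    available = get_dataset_split_names(path, config_name=config_name)
    split = choose_split(available, preferred)
    if split is None:
        raise ValueError(f"None of {list(preferred)} found; available splits: {available}")
    return load_dataset(path, config_name, split=split)


# The selection step alone can be tried without network access.
print(choose_split(["train", "test"], ["validation", "test"]))
```

Keeping the selection logic separate from the Hub calls makes it easy to test, and the error message reports the splits that do exist when none of the preferred ones is found.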

Code Reference

Source Location

Repository: datasets
File: src/datasets/inspect.py
Lines: 295-350

Signature

def get_dataset_split_names(
    path: str,
    config_name: Optional[str] = None,
    data_files: Optional[Union[str, Sequence[str], Mapping[str, Union[str, Sequence[str]]]]] = None,
    download_config: Optional[DownloadConfig] = None,
    download_mode: Optional[Union[DownloadMode, str]] = None,
    revision: Optional[Union[str, Version]] = None,
    token: Optional[Union[bool, str]] = None,
    **config_kwargs,
):

Import

from datasets import get_dataset_split_names

I/O Contract

Inputs

Name	Type	Required	Description
path	`str`	Yes	Path to the dataset repository. Can be a local path or a dataset identifier on the Hugging Face Hub.
config_name	`Optional[str]`	No	Name of the dataset configuration. If `None`, uses the default configuration.
data_files	`Optional[Union[str, Sequence[str], Mapping[str, Union[str, Sequence[str]]]]]`	No	Path(s) to source data file(s).
download_config	`Optional[DownloadConfig]`	No	Specific download configuration parameters.
download_mode	`Optional[Union[DownloadMode, str]]`	No	Download/generate mode. If `None`, `DownloadMode.REUSE_DATASET_IF_EXISTS` is used.
revision	`Optional[Union[str, Version]]`	No	Version of the dataset to load (commit SHA, git tag, or branch name).
token	`Optional[Union[bool, str]]`	No	String or boolean to use as a Bearer token for remote files on the Hugging Face Hub. If `True`, the token is read from `~/.huggingface`.
**config_kwargs	keyword arguments	No	Additional attributes for the builder class that override defaults.

Outputs

Name	Type	Description
split_names	`list[str]`	A list of split name strings available for the dataset configuration (e.g. `["train", "validation", "test"]`).

Usage Examples

Basic Usage

from datasets import get_dataset_split_names

splits = get_dataset_split_names("cornell-movie-review-data/rotten_tomatoes")
print(splits)
# ['train', 'validation', 'test']

With a Specific Configuration

from datasets import get_dataset_split_names

# Get splits for a specific GLUE task
splits = get_dataset_split_names("nyu-mll/glue", config_name="mrpc")
print(splits)
# ['train', 'validation', 'test']
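Because the available splits can vary between configurations, it may help to collect them for every configuration at once. The sketch below assumes the library's get_dataset_config_names is used to list configurations. splits_by_config and hub_splits_by_config are hypothetical helper names, and the demonstration at the end uses made-up stand-in data rather than real Hub results.

```python
from typing import Callable, Dict, List


def splits_by_config(
    path: str,
    list_configs: Callable[[str], List[str]],
    list_splits: Callable[..., List[str]],
) -> Dict[str, List[str]]:
    """Map each configuration name to its list of split names."""
    return {cfg: list_splits(path, config_name=cfg) for cfg in list_configs(path)}


def hub_splits_by_config(path: str) -> Dict[str, List[str]]:
    """Same mapping, backed by the datasets inspection functions."""
    # Imported lazily: these calls need the datasets library and usually network access.
    from datasets import get_dataset_config_names, get_dataset_split_names

    return splits_by_config(path, get_dataset_config_names, get_dataset_split_names)


# Offline demonstration with stand-in callables (made-up data, not real Hub results).
fake_splits = {"small": ["train"], "full": ["train", "validation", "test"]}
report = splits_by_config(
    "example/dataset",
    lambda path: list(fake_splits),
    lambda path, config_name=None: fake_splits[config_name],
)
print(report)
```

For a dataset with many configurations, hub_splits_by_config issues one lookup per configuration, so it can be slow; restricting it to the configurations of interest is a reasonable refinement.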

Related Pages

Implements Principle

Principle:Huggingface_Datasets_Dataset_Split_Inspection
