Implementation:Huggingface Datasets Get Dataset Split Names
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, NLP |
| Last Updated | 2026-02-14 18:00 GMT |
Overview
get_dataset_split_names is a function in the Hugging Face Datasets library that lists the splits available for a given dataset configuration.
Description
get_dataset_split_names returns the list of available split names (e.g. ["train", "validation", "test"]) for a particular dataset and optional configuration. Internally, it delegates to get_dataset_config_info to retrieve the full DatasetInfo object, then extracts the keys of its splits dictionary.
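The delegation described above can be sketched roughly as follows. This is an illustrative approximation, not the library's source: `split_names_from_info` and `sketch_get_dataset_split_names` are hypothetical names, and the `SimpleNamespace` object is a made-up stand-in for a `DatasetInfo` so that the key-extraction step can be run without network access.

```python
from types import SimpleNamespace
from typing import Any, Optional


def split_names_from_info(info: Any) -> list[str]:
    """Return the keys of info.splits as a plain list, preserving their order."""
    return list(info.splits.keys())


def sketch_get_dataset_split_names(
    path: str, config_name: Optional[str] = None, **kwargs: Any
) -> list[str]:
    """Hypothetical re-statement of the two steps: fetch the info, then read its split keys."""
    # Imported lazily so the helper above can be tried without the library installed.
    from datasets import get_dataset_config_info

    info = get_dataset_config_info(path, config_name=config_name, **kwargs)
    return split_names_from_info(info)


# Offline stand-in for a DatasetInfo object (invented data, for illustration only).
fake_info = SimpleNamespace(splits={"train": None, "validation": None, "test": None})
print(split_names_from_info(fake_info))  # ['train', 'validation', 'test']
```

Calling `sketch_get_dataset_split_names("cornell-movie-review-data/rotten_tomatoes")` should behave like the real function under these assumptions, but it requires the datasets library and access to the Hub.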
Usage
Use get_dataset_split_names when you need a quick list of available splits before calling load_dataset. This is especially helpful for datasets where the available splits are not known in advance or vary between configurations.
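One way to apply this is to pick a split from the returned list before loading anything. The sketch below assumes a preference order of your own choosing; `choose_split` and `load_first_available` are hypothetical helpers, not part of the library.

```python
from typing import Optional, Sequence


def choose_split(available: Sequence[str], preferred: Sequence[str]) -> str:
    """Return the first preferred split that the dataset actually provides."""
    for name in preferred:
        if name in available:
            return name
    raise ValueError(
        f"None of {list(preferred)} found; available splits: {list(available)}"
    )


def load_first_available(
    path: str,
    preferred: Sequence[str] = ("validation", "test", "train"),
    config_name: Optional[str] = None,
):
    """Query the split names first, then load only the chosen split."""
    # Imported lazily so choose_split can be used without the library installed.
    from datasets import get_dataset_split_names, load_dataset

    available = get_dataset_split_names(path, config_name=config_name)
    split = choose_split(available, preferred)
    return load_dataset(path, name=config_name, split=split)


# The selection logic on its own, using invented split lists.
print(choose_split(["train", "test"], ["validation", "test"]))  # test
```

Checking the names first turns a missing split into a clear error up front, before `load_dataset` is called.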
Code Reference
Source Location
- Repository: datasets
- File: src/datasets/inspect.py
- Lines: 295-350
Signature
def get_dataset_split_names(
    path: str,
    config_name: Optional[str] = None,
    data_files: Optional[Union[str, Sequence[str], Mapping[str, Union[str, Sequence[str]]]]] = None,
    download_config: Optional[DownloadConfig] = None,
    download_mode: Optional[Union[DownloadMode, str]] = None,
    revision: Optional[Union[str, Version]] = None,
    token: Optional[Union[bool, str]] = None,
    **config_kwargs,
):
Import
from datasets import get_dataset_split_names
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str | Yes | Path to the dataset repository. Can be a local path or a dataset identifier on the Hugging Face Hub. |
| config_name | Optional[str] | No | Name of the dataset configuration. If None, uses the default configuration. |
| data_files | Optional[Union[str, Sequence[str], Mapping[str, Union[str, Sequence[str]]]]] | No | Path(s) to source data file(s). |
| download_config | Optional[DownloadConfig] | No | Specific download configuration parameters. |
| download_mode | Optional[Union[DownloadMode, str]] | No | Download/generate mode. Defaults to REUSE_DATASET_IF_EXISTS. |
| revision | Optional[Union[str, Version]] | No | Version of the dataset to load (commit SHA, git tag, or branch name). |
| token | Optional[Union[bool, str]] | No | Bearer token for remote files on the Datasets Hub. If True, reads from ~/.huggingface. |
| **config_kwargs | keyword arguments | No | Additional attributes for the builder class that override defaults. |
Outputs
| Name | Type | Description |
|---|---|---|
| split_names | list[str] | A list of split name strings available for the dataset configuration (e.g. ["train", "validation", "test"]). |
Usage Examples
Basic Usage
from datasets import get_dataset_split_names
splits = get_dataset_split_names("cornell-movie-review-data/rotten_tomatoes")
print(splits)
# ['train', 'validation', 'test']
With a Specific Configuration
from datasets import get_dataset_split_names
# Get splits for a specific GLUE task
splits = get_dataset_split_names("nyu-mll/glue", config_name="mrpc")
print(splits)
# ['train', 'validation', 'test']
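Across All Configurations
Because splits can vary between configurations, it may help to collect them for every configuration at once. The sketch below is an assumption-laden illustration: `splits_by_config` is a hypothetical helper, it relies on `get_dataset_config_names` from the same library to enumerate configurations, and the stub functions and their data are invented so the logic can be checked offline.

```python
from typing import Callable, Optional


def splits_by_config(
    path: str,
    list_configs: Optional[Callable[[str], list[str]]] = None,
    list_splits: Optional[Callable[..., list[str]]] = None,
) -> dict[str, list[str]]:
    """Map each configuration name of a dataset to its list of split names."""
    if list_configs is None or list_splits is None:
        # Fall back to the library functions; imported lazily so stubs work offline.
        from datasets import get_dataset_config_names, get_dataset_split_names

        list_configs = list_configs or get_dataset_config_names
        list_splits = list_splits or get_dataset_split_names
    return {name: list_splits(path, config_name=name) for name in list_configs(path)}


# Offline demonstration with invented stand-ins for the two library calls.
def fake_configs(path: str) -> list[str]:
    return ["small", "large"]


def fake_splits(path: str, config_name: Optional[str] = None) -> list[str]:
    return ["train"] if config_name == "small" else ["train", "test"]


print(splits_by_config("example/dataset", fake_configs, fake_splits))
# {'small': ['train'], 'large': ['train', 'test']}
```

With the default arguments, `splits_by_config("nyu-mll/glue")` would issue one lookup per configuration, so it can be slow for datasets with many configurations.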