Implementation:Huggingface Datasets Get Dataset Config Names
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, NLP |
| Last Updated | 2026-02-14 18:00 GMT |
Overview
Concrete tool for inspecting dataset configurations available on the Hub before loading, provided by the HuggingFace Datasets library.
Description
get_dataset_config_names is a top-level inspection function that returns the list of configuration names available for a given dataset. It resolves the dataset module via dataset_module_factory, retrieves the builder class, and returns the keys of that class's builder_configs mapping. If the builder defines no explicit configurations, the function returns a single-element list containing the default configuration name or, failing that, the literal string "default".
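The fallback described above can be sketched in isolation. This is a simplified illustration rather than the library's source: the helper name resolve_config_names and its two parameters are hypothetical stand-ins for the builder's builder_configs mapping and its default configuration name.

```python
from typing import List, Mapping, Optional


def resolve_config_names(
    builder_configs: Mapping[str, object],
    default_config_name: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper mirroring the fallback described above."""
    # Explicit configurations win: return their names in definition order.
    names = list(builder_configs.keys())
    if names:
        return names
    # No explicit configurations: use the default name, else the literal "default".
    return [default_config_name or "default"]
```

Because the result is always a non-empty list, callers can index the first element without checking for emptiness.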
Usage
Use get_dataset_config_names when you need to enumerate the available configurations of a dataset before loading. This is particularly useful for multi-configuration datasets like GLUE, SuperGLUE, or any dataset that bundles multiple tasks or subsets under one repository.
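One common pattern is to validate a requested configuration name against the enumerated list before attempting a load. The sketch below assumes a hypothetical helper, choose_config, that accepts an optional list_configs callable so it can be exercised offline with a stub; when no callable is given it falls back to get_dataset_config_names.

```python
from typing import Callable, List, Optional


def choose_config(
    path: str,
    requested: Optional[str] = None,
    list_configs: Optional[Callable[[str], List[str]]] = None,
) -> str:
    """Return a valid configuration name for `path` (hypothetical helper)."""
    if list_configs is None:
        # Imported lazily so the helper can be tested without the library or network.
        from datasets import get_dataset_config_names as list_configs

    available = list_configs(path)
    if requested is None:
        # Only unambiguous when the dataset exposes exactly one configuration.
        if len(available) == 1:
            return available[0]
        raise ValueError(f"{path!r} has several configurations; pick one of {available}")
    if requested not in available:
        raise ValueError(f"Unknown configuration {requested!r}; available: {available}")
    return requested
```

With the default lister, choose_config("nyu-mll/glue", "mrpc") queries the Hub, and the returned name can then be passed to load_dataset as the configuration name.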
Code Reference
Source Location
- Repository: datasets
- File: src/datasets/inspect.py
- Lines: 109-172
Signature
def get_dataset_config_names(
    path: str,
    revision: Optional[Union[str, Version]] = None,
    download_config: Optional[DownloadConfig] = None,
    download_mode: Optional[Union[DownloadMode, str]] = None,
    data_files: Optional[Union[dict, list, str]] = None,
    **download_kwargs,
):
Import
from datasets import get_dataset_config_names
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str | Yes | Path to the dataset repository. Can be a local path to the dataset directory or a dataset identifier on the Hugging Face Hub (e.g. 'nyu-mll/glue'). |
| revision | Optional[Union[str, Version]] | No | Version of the dataset module to load. Defaults to the local library version, falling back to the main branch. |
| download_config | Optional[DownloadConfig] | No | Specific download configuration parameters (caching, proxies, etc.). |
| download_mode | Optional[Union[DownloadMode, str]] | No | Download/generate mode. Defaults to REUSE_DATASET_IF_EXISTS. |
| data_files | Optional[Union[dict, list, str]] | No | Defines the data files of the dataset configuration. |
| **download_kwargs | keyword arguments | No | Optional attributes for DownloadConfig which override attributes in download_config, for example token. |
Outputs
| Name | Type | Description |
|---|---|---|
| config_names | list[str] | A list of configuration name strings available for the dataset. For single-config datasets this is typically ["default"]. |
Usage Examples
Basic Usage
from datasets import get_dataset_config_names
# List all configurations for the GLUE benchmark
configs = get_dataset_config_names("nyu-mll/glue")
print(configs)
# Example output (order may differ across dataset revisions):
# ['cola', 'sst2', 'mrpc', 'qqp', 'stsb', 'mnli', 'mnli_mismatched',
#  'mnli_matched', 'qnli', 'rte', 'wnli', 'ax']
With Authentication
from datasets import get_dataset_config_names
# Access a private/gated dataset
configs = get_dataset_config_names(
    "my-org/private-dataset",
    token="hf_xxxxxxxxxxxxx",
)
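A token written directly into source code is easy to leak. A minimal sketch, assuming the token is stored in an environment variable (HF_TOKEN is used as the default name here, and the helper token_kwargs is hypothetical), builds the keyword argument only when a token is actually set:

```python
import os
from typing import Dict


def token_kwargs(env_var: str = "HF_TOKEN") -> Dict[str, str]:
    """Return {"token": ...} if the variable is set, else an empty dict (hypothetical helper)."""
    token = os.environ.get(env_var)
    # An unset or empty variable yields no `token` argument at all.
    return {"token": token} if token else {}


# Usage (requires network access and the `datasets` package):
# from datasets import get_dataset_config_names
# configs = get_dataset_config_names("my-org/private-dataset", **token_kwargs())
```

Passing the result with ** keeps the call identical for public datasets, where no token is needed.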