Implementation:Huggingface Datasets Get Dataset Config Names

From Leeroopedia
Knowledge Sources
Domains Data_Engineering, NLP
Last Updated 2026-02-14 18:00 GMT

Overview

A concrete tool for listing the configurations a dataset offers before loading it, provided by the Hugging Face Datasets library. It works for datasets on the Hub and for local dataset directories.

Description

get_dataset_config_names is a top-level inspection function that returns the list of available configuration names for a given dataset. It resolves the dataset module via dataset_module_factory, retrieves the builder class, and returns the keys of the builder's builder_configs mapping. If the builder defines no explicit configurations, the function instead returns a single-element list containing the default configuration name, or the literal string "default" when no default name is set.
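The fallback order described above can be sketched in plain Python. This is a simplified illustration, not the library's source: resolve_config_names and its parameters are hypothetical stand-ins for the builder's builder_configs mapping and default configuration name.

```python
from typing import Optional


def resolve_config_names(
    builder_configs: dict,
    default_config_name: Optional[str] = None,
) -> list:
    """Mimic the described fallback using plain Python stand-ins (hypothetical helper)."""
    names = list(builder_configs.keys())
    if names:
        # Explicit configurations exist: return their names in definition order.
        return names
    # No explicit configurations: use the default name, else the literal "default".
    return [default_config_name or "default"]


print(resolve_config_names({"cola": object(), "sst2": object()}))  # ['cola', 'sst2']
print(resolve_config_names({}, default_config_name="plain_text"))  # ['plain_text']
print(resolve_config_names({}))  # ['default']
```

In every case the result is a non-empty list, which is why callers can index or iterate over it without checking for emptiness first.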

Usage

Use get_dataset_config_names when you need to enumerate the available configurations of a dataset before loading. This is particularly useful for multi-configuration datasets like GLUE, SuperGLUE, or any dataset that bundles multiple tasks or subsets under one repository.
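One way to use the enumerated names is to validate a requested configuration before loading anything. The sketch below is an assumption about how a caller might structure this: choose_config and load_checked are hypothetical helpers, while get_dataset_config_names and load_dataset are the library's own functions.

```python
from typing import Optional, Sequence


def choose_config(available: Sequence[str], requested: Optional[str] = None) -> str:
    """Return a configuration name that is safe to load, or raise a clear error."""
    if not available:
        raise ValueError("The dataset reports no configurations.")
    if requested is None:
        # A single-config dataset needs no explicit choice.
        if len(available) == 1:
            return available[0]
        raise ValueError(f"Pick one of: {', '.join(available)}")
    if requested not in available:
        raise ValueError(
            f"Unknown config {requested!r}; expected one of: {', '.join(available)}"
        )
    return requested


def load_checked(path: str, requested: Optional[str] = None):
    """Enumerate configurations first, then load only a validated one (needs network access)."""
    # Imported lazily so choose_config stays usable without the datasets library.
    from datasets import get_dataset_config_names, load_dataset

    name = choose_config(get_dataset_config_names(path), requested)
    return load_dataset(path, name)
```

For example, load_checked("nyu-mll/glue", "mrpc") would load the MRPC subset, while a misspelled name fails early with a message listing the valid choices instead of failing during the download.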

Code Reference

Source Location

  • Repository: datasets
  • File: src/datasets/inspect.py
  • Lines: 109-172

Signature

def get_dataset_config_names(
    path: str,
    revision: Optional[Union[str, Version]] = None,
    download_config: Optional[DownloadConfig] = None,
    download_mode: Optional[Union[DownloadMode, str]] = None,
    data_files: Optional[Union[dict, list, str]] = None,
    **download_kwargs,
):

Import

from datasets import get_dataset_config_names

I/O Contract

Inputs

Name | Type | Required | Description
path | str | Yes | Path to the dataset repository. Can be a local path to the dataset directory or a dataset identifier on the Hugging Face Hub (e.g. 'nyu-mll/glue').
revision | Optional[Union[str, Version]] | No | Version of the dataset module to load. Defaults to the local library version, falling back to the main branch.
download_config | Optional[DownloadConfig] | No | Specific download configuration parameters (caching, proxies, etc.).
download_mode | Optional[Union[DownloadMode, str]] | No | Download/generate mode. Defaults to REUSE_DATASET_IF_EXISTS.
data_files | Optional[Union[dict, list, str]] | No | Defines the data files of the dataset configuration.
**download_kwargs | keyword arguments | No | Optional attributes for DownloadConfig which override attributes in download_config, for example token.

Outputs

Name | Type | Description
config_names | list[str] | A list of configuration name strings available for the dataset. For single-config datasets this is typically ["default"].

Usage Examples

Basic Usage

from datasets import get_dataset_config_names

# List all configurations for the GLUE benchmark
configs = get_dataset_config_names("nyu-mll/glue")
print(configs)
# ['cola', 'sst2', 'mrpc', 'qqp', 'stsb', 'mnli', 'mnli_mismatched',
#  'mnli_matched', 'qnli', 'rte', 'wnli', 'ax']

With Authentication

from datasets import get_dataset_config_names

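# Access a private/gated dataset; token is forwarded to DownloadConfig
# through **download_kwargs (the value below is a placeholder)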
configs = get_dataset_config_names(
    "my-org/private-dataset",
    token="hf_xxxxxxxxxxxxx",
)
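To avoid hardcoding a token in source files, the keyword argument can be built from the environment. This is a minimal sketch: auth_kwargs is a hypothetical helper, and it assumes the token is stored in the HF_TOKEN environment variable.

```python
import os
from typing import Mapping


def auth_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments carrying a token if one is set (hypothetical helper)."""
    token = env.get("HF_TOKEN")
    # Return an empty dict when no token is set so public datasets still work.
    return {"token": token} if token else {}
```

The result can then be unpacked into the call, for instance get_dataset_config_names("my-org/private-dataset", **auth_kwargs()).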
