Implementation:Huggingface Datasets Get Dataset Config Info

Knowledge Sources	Huggingface Datasets HF Datasets Docs
Domains	Data_Engineering, NLP
Last Updated	2026-02-14 18:00 GMT

Overview

Concrete tool for retrieving detailed metadata (features, splits, size) for a specific dataset configuration, provided by the HuggingFace Datasets library.

Description

get_dataset_config_info returns a DatasetInfo object containing the full metadata for a specific configuration of a dataset. It instantiates a DatasetBuilder via load_dataset_builder, forwarding config_name as the builder's name, and reads the builder's .info property. If that metadata contains no split information (info.splits is None), it calls the builder's _split_generators with a StreamingDownloadManager. This discovers the split names without downloading the full dataset. If split discovery fails, it raises a SplitsNotFoundError.
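The fallback logic described above can be modeled in plain Python. This is a simplified sketch, not the library's code. `FakeBuilder`, `StubInfo`, `StubSplitGenerator` and `resolve_info` are hypothetical stand-ins. The streaming download manager is reduced to a placeholder object. The local `SplitsNotFoundError` class only mirrors the name of the real exception.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class SplitsNotFoundError(ValueError):
    """Local stand-in that only mirrors the name of the library's exception."""


@dataclass
class StubSplitGenerator:
    name: str  # hypothetical: real split generators carry more than a name


@dataclass
class StubInfo:
    features: Dict[str, str]
    splits: Optional[Dict[str, dict]] = None  # None means "no split metadata stored"


@dataclass
class FakeBuilder:
    info: StubInfo
    discovered: Optional[List[str]] = None  # None simulates a failing discovery

    def _split_generators(self, dl_manager: object) -> List[StubSplitGenerator]:
        if self.discovered is None:
            raise FileNotFoundError("no data files could be resolved")
        return [StubSplitGenerator(name) for name in self.discovered]


def resolve_info(builder: FakeBuilder, path: str) -> StubInfo:
    """Return builder.info, discovering split names only when they are missing."""
    info = builder.info
    if info.splits is None:
        streaming_dm = object()  # placeholder for a StreamingDownloadManager
        try:
            info.splits = {
                gen.name: {"name": gen.name, "dataset_name": path}
                for gen in builder._split_generators(streaming_dm)
            }
        except Exception as err:
            # Any discovery failure is re-raised as the split-specific error.
            raise SplitsNotFoundError("Split names could not be discovered.") from err
    return info


builder = FakeBuilder(StubInfo({"text": "string"}), discovered=["train", "test"])
print(list(resolve_info(builder, "demo").splits))  # ['train', 'test']
```

Stored split metadata is returned untouched. Discovery only runs when it is absent. The original failure is kept as the exception's `__cause__`.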

Usage

Use get_dataset_config_info when you need the complete metadata for a dataset configuration, including features, splits, description, and size information, without downloading the actual data files.
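As an illustration of this metadata-first workflow, the sketch below reads size fields off the returned info object to decide whether a configuration is small enough to download. The helper `fits_budget` and the byte budget are hypothetical. The sketch assumes the object exposes `download_size` and `dataset_size`. `DatasetInfo` defines both fields, but either may be `None` when the metadata does not record it.

```python
from types import SimpleNamespace
from typing import Any, Optional


def fits_budget(info: Any, max_bytes: int) -> Optional[bool]:
    """True/False if any size is recorded, None if the metadata has no sizes."""
    sizes = [getattr(info, "download_size", None), getattr(info, "dataset_size", None)]
    known = [size for size in sizes if size is not None]
    if not known:
        return None  # unknown footprint: the metadata cannot answer the question
    # Conservative estimate: downloaded files plus the prepared dataset on disk.
    return sum(known) <= max_bytes


# Offline stand-in; in real use, pass the object returned by
# get_dataset_config_info(path, config_name=...).
info = SimpleNamespace(download_size=500_000, dataset_size=1_200_000)
print(fits_budget(info, max_bytes=10_000_000))  # True
```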

Code Reference

Source Location

Repository: datasets
File: src/datasets/inspect.py
Lines: 237-292

Signature

def get_dataset_config_info(
    path: str,
    config_name: Optional[str] = None,
    data_files: Optional[Union[str, Sequence[str], Mapping[str, Union[str, Sequence[str]]]]] = None,
    download_config: Optional[DownloadConfig] = None,
    download_mode: Optional[Union[DownloadMode, str]] = None,
    revision: Optional[Union[str, Version]] = None,
    token: Optional[Union[bool, str]] = None,
    **config_kwargs,
) -> DatasetInfo:

Import

from datasets import get_dataset_config_info

I/O Contract

Inputs

Name	Type	Required	Description
path	`str`	Yes	Path to the dataset repository. Can be a local path or a Hub dataset identifier (e.g. `'rajpurkar/squad'`).
config_name	`Optional[str]`	No	Name of the dataset configuration. If `None`, uses the default configuration.
data_files	`Optional[Union[str, Sequence[str], Mapping[str, Union[str, Sequence[str]]]]]`	No	Path(s) to source data file(s): a single path, a list of paths, or a mapping from split name to one or more paths.
download_config	`Optional[DownloadConfig]`	No	Specific download configuration parameters.
download_mode	`Optional[Union[DownloadMode, str]]`	No	Download/generate mode. Defaults to `REUSE_DATASET_IF_EXISTS`.
revision	`Optional[Union[str, Version]]`	No	Version of the dataset to load (commit SHA, git tag, or branch).
token	`Optional[Union[bool, str]]`	No	Bearer token for remote files on the Datasets Hub.
**config_kwargs	keyword arguments	No	Additional attributes for the builder class that override defaults.
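The three accepted shapes of `data_files` can be illustrated with a small normalizer. `normalize_data_files` is a hypothetical helper, not the library's internal code. It assumes that a bare string or a list of paths is assigned to the `train` split, consistent with how the Datasets library treats files given without split names. A mapping is assumed to be keyed by split name.

```python
from collections.abc import Mapping, Sequence
from typing import Dict, List, Union

DataFiles = Union[str, Sequence[str], Mapping[str, Union[str, Sequence[str]]]]


def normalize_data_files(data_files: DataFiles) -> Dict[str, List[str]]:
    """Map every accepted data_files shape to {split_name: [paths]}."""
    if isinstance(data_files, str):
        # A single path (checked first, since str is also a Sequence): train split.
        return {"train": [data_files]}
    if isinstance(data_files, Mapping):
        # Explicit split mapping: wrap single paths in a list.
        return {
            str(split): [paths] if isinstance(paths, str) else list(paths)
            for split, paths in data_files.items()
        }
    # Any other sequence of paths: also assigned to the train split.
    return {"train": list(data_files)}


print(normalize_data_files({"train": "train.csv", "test": ["a.csv", "b.csv"]}))
# {'train': ['train.csv'], 'test': ['a.csv', 'b.csv']}
```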

Outputs

Name	Type	Description
info	`DatasetInfo`	A `DatasetInfo` object containing the dataset's features, splits, description, citation, license, dataset size, and other metadata.
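The `splits` field may come either from stored metadata or from the split-discovery fallback. When discovery runs, only split names are recovered, so per-split row counts may be missing. The sketch below is a hypothetical defensive helper, `split_sizes`. It reads `num_examples` when available and reports `None` otherwise. It accepts both attribute-style entries (as `SplitInfo` objects provide) and plain dictionaries, an assumption made to cover both sources.

```python
from collections.abc import Mapping
from types import SimpleNamespace
from typing import Any, Dict, Optional


def split_sizes(splits: Optional[Mapping]) -> Dict[str, Optional[int]]:
    """Return {split_name: num_examples or None} for an info.splits mapping."""
    if not splits:
        return {}
    sizes: Dict[str, Optional[int]] = {}
    for name, split in splits.items():
        if isinstance(split, Mapping):
            sizes[name] = split.get("num_examples")  # dict-style entry
        else:
            sizes[name] = getattr(split, "num_examples", None)  # attribute-style entry
    return sizes


stored = {"train": SimpleNamespace(name="train", num_examples=100)}
discovered = {"test": {"name": "test", "dataset_name": "demo"}}
print(split_sizes(stored), split_sizes(discovered))  # {'train': 100} {'test': None}
```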

Usage Examples

Basic Usage

from datasets import get_dataset_config_info

info = get_dataset_config_info("cornell-movie-review-data/rotten_tomatoes")
print(info.features)
# {'text': Value('string'), 'label': ClassLabel(names=['neg', 'pos'])}
print(list(info.splits.keys()))
# ['train', 'validation', 'test']

Inspecting a Specific Configuration

from datasets import get_dataset_config_info

info = get_dataset_config_info("nyu-mll/glue", config_name="mrpc")
print(info.features)
# {'sentence1': Value('string'), 'sentence2': Value('string'),
#  'label': ClassLabel(names=['not_equivalent', 'equivalent']),
#  'idx': Value('int32')}
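To compare several configurations, the function can be called once per configuration name. The sketch below takes the two lookups as parameters so it can be run offline. In real use you would pass `get_dataset_config_names` and `get_dataset_config_info` from `datasets`. The helper name `features_by_config` is hypothetical. Because each call resolves its own builder, this makes one metadata lookup per configuration.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List


def features_by_config(
    path: str,
    list_configs: Callable[[str], List[str]],
    get_info: Callable[..., Any],
) -> Dict[str, List[str]]:
    """Return {config_name: [feature names]} for every configuration of a dataset."""
    result: Dict[str, List[str]] = {}
    for name in list_configs(path):
        info = get_info(path, config_name=name)  # one metadata lookup per config
        result[name] = list(info.features)
    return result


# Real use (requires network access):
# from datasets import get_dataset_config_names, get_dataset_config_info
# features_by_config("nyu-mll/glue", get_dataset_config_names, get_dataset_config_info)

# Offline fakes with hypothetical config names and features.
fake_infos = {
    "cfg_a": SimpleNamespace(features={"text": None, "label": None}),
    "cfg_b": SimpleNamespace(features={"sentence1": None, "sentence2": None}),
}
print(features_by_config(
    "demo/dataset",
    list_configs=lambda path: list(fake_infos),
    get_info=lambda path, config_name: fake_infos[config_name],
))
# {'cfg_a': ['text', 'label'], 'cfg_b': ['sentence1', 'sentence2']}
```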

Related Pages

Implements Principle

Principle:Huggingface_Datasets_Dataset_Config_Info_Retrieval
