Implementation:Huggingface Datasets Get Dataset Config Info
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, NLP |
| Last Updated | 2026-02-14 18:00 GMT |
Overview
Concrete tool for retrieving detailed metadata (features, splits, size) for a specific dataset configuration, provided by the HuggingFace Datasets library.
Description
get_dataset_config_info returns a DatasetInfo object holding the metadata for one configuration of a dataset. It instantiates a DatasetBuilder via load_dataset_builder and reads its .info property. If the split information is missing, it calls the builder's _split_generators with a StreamingDownloadManager, which discovers the splits without downloading the full dataset. If split discovery fails, it raises a SplitsNotFoundError.
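The control flow described above can be sketched offline. This is a simplified illustration, not the library's code: FakeBuilder, config_info_sketch, and the local SplitsNotFoundError class are hypothetical stand-ins, and the value stored for each split is reduced to its name.

```python
from types import SimpleNamespace


class SplitsNotFoundError(Exception):
    """Stand-in for the library's error of the same name."""


class FakeBuilder:
    """Hypothetical stand-in for a DatasetBuilder; not part of the library."""

    def __init__(self, splits=None, discoverable=None):
        # info.splits is None when the metadata does not record any splits.
        self.info = SimpleNamespace(splits=splits)
        self._discoverable = discoverable

    def _split_generators(self, dl_manager):
        # A real builder would resolve data files through dl_manager here.
        if self._discoverable is None:
            raise RuntimeError("could not resolve data files")
        return [SimpleNamespace(name=name) for name in self._discoverable]


def config_info_sketch(builder, dl_manager=None):
    """Return builder.info, filling in split names only when they are missing."""
    info = builder.info
    if info.splits is None:
        try:
            # Simplification: record only the name of each discovered split.
            info.splits = {
                gen.name: {"name": gen.name}
                for gen in builder._split_generators(dl_manager)
            }
        except Exception as err:
            raise SplitsNotFoundError("split names could not be determined") from err
    return info


recorded = config_info_sketch(FakeBuilder(splits={"train": {"name": "train"}}))
discovered = config_info_sketch(FakeBuilder(discoverable=["train", "test"]))
print(list(recorded.splits), list(discovered.splits))
```

Split discovery only runs on the fallback path, so a configuration whose metadata already lists its splits never touches the split generators.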
Usage
Use get_dataset_config_info when you need the complete metadata for a dataset configuration, including features, splits, description, and size information, without downloading the actual data files.
Code Reference
Source Location
- Repository: datasets
- File: src/datasets/inspect.py
- Lines: 237-292
Signature
def get_dataset_config_info(
    path: str,
    config_name: Optional[str] = None,
    data_files: Optional[Union[str, Sequence[str], Mapping[str, Union[str, Sequence[str]]]]] = None,
    download_config: Optional[DownloadConfig] = None,
    download_mode: Optional[Union[DownloadMode, str]] = None,
    revision: Optional[Union[str, Version]] = None,
    token: Optional[Union[bool, str]] = None,
    **config_kwargs,
) -> DatasetInfo:
Import
from datasets import get_dataset_config_info
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str | Yes | Path to the dataset repository. Can be a local path or a Hub dataset identifier (e.g. 'rajpurkar/squad'). |
| config_name | Optional[str] | No | Name of the dataset configuration. If None, uses the default configuration. |
| data_files | Optional[Union[str, Sequence[str], Mapping[str, Union[str, Sequence[str]]]]] | No | Path(s) to source data file(s). |
| download_config | Optional[DownloadConfig] | No | Specific download configuration parameters. |
| download_mode | Optional[Union[DownloadMode, str]] | No | Download/generate mode. Defaults to REUSE_DATASET_IF_EXISTS. |
| revision | Optional[Union[str, Version]] | No | Version of the dataset to load (commit SHA, git tag, or branch). |
| token | Optional[Union[bool, str]] | No | Bearer token for remote files on the Datasets Hub. |
| **config_kwargs | keyword arguments | No | Additional attributes for the builder class that override defaults. |
Outputs
| Name | Type | Description |
|---|---|---|
| info | DatasetInfo | A DatasetInfo object containing the dataset's features, splits, description, citation, license, dataset size, and other metadata. |
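As a rough illustration of consuming the returned object, the sketch below collects a few of the fields listed above into a plain dict. The helper names summarize_info and summarize_config are hypothetical, and the sketch assumes that features, splits, and dataset_size may each be None when the metadata does not record them. An offline stand-in object is used so the example runs without network access.

```python
from types import SimpleNamespace


def summarize_info(info):
    """Collect a few DatasetInfo fields into a plain dict.

    Assumes the attributes may be None when the metadata is not recorded.
    """
    features = info.features or {}
    splits = info.splits or {}
    return {
        "columns": list(features),      # feature (column) names
        "splits": list(splits),         # split names
        "dataset_size": info.dataset_size,
    }


def summarize_config(path, config_name=None):
    """Fetch one configuration's metadata and summarize it (Hub datasets need network access)."""
    # Imported lazily so summarize_info stays usable without the library installed.
    from datasets import get_dataset_config_info

    return summarize_info(get_dataset_config_info(path, config_name=config_name))


# Offline stand-in with the same attribute names, used only to exercise summarize_info.
fake_info = SimpleNamespace(
    features={"text": "string", "label": "class"},
    splits={"train": None, "test": None},
    dataset_size=None,
)
summary = summarize_info(fake_info)
print(summary)
```

With the library installed, summarize_config("nyu-mll/glue", config_name="mrpc") would apply the same summary to real metadata.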
Usage Examples
Basic Usage
from datasets import get_dataset_config_info
info = get_dataset_config_info("cornell-movie-review-data/rotten_tomatoes")
print(info.features)
# {'label': ClassLabel(names=['neg', 'pos']), 'text': Value('string')}
print(list(info.splits.keys()))
# ['train', 'validation', 'test']
Inspecting a Specific Configuration
from datasets import get_dataset_config_info
info = get_dataset_config_info("nyu-mll/glue", config_name="mrpc")
print(info.features)
# {'sentence1': Value('string'), 'sentence2': Value('string'),
# 'label': ClassLabel(names=['not_equivalent', 'equivalent']),
# 'idx': Value('int32')}