Implementation:Huggingface Datasets MetadataConfigs
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, NLP |
| Last Updated | 2026-02-14 18:00 GMT |
Overview
Concrete tool, provided by the HuggingFace Datasets library, for managing the YAML front matter metadata of dataset cards.
Description
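MetadataConfigs is a dictionary subclass (dict[str, dict[str, Any]]) that manages the configs field in the YAML front matter of a dataset card. Each key is a config name and each value is a dictionary of that config's parameters (for example data_files, version, or features). On the card itself, configs is stored as a list of entries that each carry a config_name key. The classmethod from_dataset_card_data reads that list from a DatasetCardData object into the dictionary form, and to_dataset_card_data writes the dictionary back to a DatasetCardData object. Both methods validate data_files entries, which must be a string, a list of strings, or a list of dicts with split and path keys. The get_default_config_name method returns the name of the default configuration: the only config when there is just one, otherwise the config named "default" or the one whose parameters set default: true. It raises a ValueError if more than one config qualifies and returns None if none does.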
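The data_files rule above can be illustrated without the library. The sketch below is a simplified, hypothetical re-implementation (is_valid_data_files is not part of the datasets API). It assumes a split entry must have exactly the keys split and path, with path being a string or a list, and it skips the library's additional restrictions on which characters a split name may contain.

```python
from typing import Any


def is_valid_data_files(data_files: Any) -> bool:
    """Hypothetical checker mirroring the documented data_files rules.

    Accepts None (field absent), a string, or a list whose items are either
    strings or dicts with exactly the keys "split" and "path".
    """
    if data_files is None:
        return True  # data_files is optional in a config
    if isinstance(data_files, str):
        return True  # e.g. "data.csv" or "data/*.png"
    if not isinstance(data_files, list):
        return False  # e.g. a mapping like {"train": "train/*"} is rejected
    for item in data_files:
        if isinstance(item, str):
            continue  # a plain glob pattern
        if (
            isinstance(item, dict)
            and set(item) == {"split", "path"}
            and isinstance(item["split"], str)
            and isinstance(item["path"], (str, list))
        ):
            continue  # a split/path entry; path may be one pattern or several
        return False
    return True


print(is_valid_data_files([{"split": "train", "path": "train/*"}]))  # True
print(is_valid_data_files({"train": "train/*"}))  # False
```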
Usage
Use MetadataConfigs when programmatically managing dataset card metadata, especially for multi-configuration datasets. It is used internally by push_to_hub to update the dataset card YAML.
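When the card YAML is updated, new configs have to be combined with whatever the card already lists. The sketch below assumes the behavior of to_dataset_card_data in recent library versions: existing and new configs are merged by name, with a new entry replacing an old one of the same name wholesale, the result is sorted by config name, and each entry is written back as a dict carrying a config_name key. merge_configs_into_card is a hypothetical helper that works on plain lists and dicts; it is not a library function.

```python
from typing import Any


def merge_configs_into_card(
    card_configs: list[dict[str, Any]],
    new_configs: dict[str, dict[str, Any]],
) -> list[dict[str, Any]]:
    """Hypothetical sketch of merging configs into a card's YAML `configs` list.

    card_configs: existing YAML list; each entry has a "config_name" key.
    new_configs: configs in {config_name: {**params}} form.
    """
    # Convert the existing YAML list into {config_name: params}.
    existing = {
        entry["config_name"]: {k: v for k, v in entry.items() if k != "config_name"}
        for entry in card_configs
    }
    # New configs override existing ones with the same name; sort by name.
    merged = dict(sorted({**existing, **new_configs}.items()))
    # Write back in the YAML list form, one dict per config.
    return [{"config_name": name, **params} for name, params in merged.items()]


card_yaml = [{"config_name": "fr", "data_files": "fr/*.parquet"}]
print(merge_configs_into_card(card_yaml, {"default": {"data_files": "data/*.parquet"}}))
```

Because the merge keys on config name only, pushing a config with an existing name replaces all of that config's parameters rather than updating individual fields.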
Code Reference
Source Location
- Repository: datasets
- File: src/datasets/utils/metadata.py (lines 46-189)
Signature
class MetadataConfigs(dict[str, dict[str, Any]]):
    """Should be in format {config_name: {**config_params}}."""
    FIELD_NAME: ClassVar[str] = METADATA_CONFIGS_FIELD
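The docstring's {config_name: {**config_params}} shape differs from how a dataset card stores configs: in the YAML, configs is a list whose entries each carry a config_name key. The following is a minimal sketch of the conversion that from_dataset_card_data performs, assuming plain Python data in place of a DatasetCardData object. configs_from_yaml_list is a hypothetical name, and the library's conversion of a features entry into a Features object is omitted.

```python
from typing import Any


def configs_from_yaml_list(yaml_configs: Any) -> dict[str, dict[str, Any]]:
    """Hypothetical converter from the YAML `configs` list to the
    {config_name: {**config_params}} form."""
    if not yaml_configs:
        return {}  # no configs field on the card
    if not isinstance(yaml_configs, list):
        raise ValueError(f"Expected configs to be a list, but got {yaml_configs!r}")
    result: dict[str, dict[str, Any]] = {}
    for entry in yaml_configs:
        if not isinstance(entry, dict) or "config_name" not in entry:
            raise ValueError(f"Each config must include a config_name field, got {entry!r}")
        params = dict(entry)  # copy so the caller's YAML data is not mutated
        result[params.pop("config_name")] = params
    return result


yaml_configs = [
    {"config_name": "default", "data_files": "data/*.csv"},
    {"config_name": "fr", "data_files": "fr/*.csv"},
]
print(configs_from_yaml_list(yaml_configs))
```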
Import
from datasets.utils.metadata import MetadataConfigs
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| `*args` / `**kwargs` | `dict[str, dict[str, Any]]` | No | Mapping of config names to config parameter dictionaries. |
Outputs
| Name | Type | Description |
|---|---|---|
| instance | MetadataConfigs | A MetadataConfigs dictionary for managing dataset card YAML metadata. |
Usage Examples
Basic Usage
from datasets.utils.metadata import MetadataConfigs
configs = MetadataConfigs({
    "default": {
        "data_files": [
            {"split": "train", "path": "data/train-*.parquet"},
            {"split": "test", "path": "data/test-*.parquet"},
        ],
    },
    "fr": {
        "data_files": [
            {"split": "train", "path": "fr/train-*.parquet"},
        ],
    },
})
print(configs.get_default_config_name())
# default
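The printed name follows from how the default is chosen. Below is a hypothetical stand-alone version of that rule (pick_default_config is not a library function). It assumes the library treats a config as the default when it is the only config, when it is named "default", or when its parameters set default: true, and that it raises an error when more than one config qualifies.

```python
from typing import Any, Optional


def pick_default_config(configs: dict[str, dict[str, Any]]) -> Optional[str]:
    """Hypothetical re-implementation of the default-config selection rule.

    Returns None when no config qualifies; raises ValueError when several do.
    """
    default_name: Optional[str] = None
    for name, params in configs.items():
        if len(configs) == 1 or name == "default" or params.get("default"):
            if default_name is not None:
                raise ValueError(f"Several default configs: {default_name!r} and {name!r}")
            default_name = name
    return default_name


print(pick_default_config({"default": {}, "fr": {}}))  # default
print(pick_default_config({"en": {}, "fr": {"default": True}}))  # fr
```

With the configs from the example, only "default" qualifies. Adding "default": True to the "fr" parameters would make two configs qualify, so the sketch raises a ValueError.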