Implementation:Huggingface Datasets MetadataConfigs

From Leeroopedia
Knowledge Sources
Domains Data_Engineering, NLP
Last Updated 2026-02-14 18:00 GMT

Overview

A concrete tool, provided by the Hugging Face Datasets library, for managing the YAML front matter metadata of dataset cards.

Description

MetadataConfigs is a dictionary subclass (dict[str, dict[str, Any]]) that manages the configs field in dataset card YAML front matter. Each key is a config name, and each value is a dictionary of that config's parameters, such as data_files, version, and features. The class provides from_dataset_card_data to read configs from a DatasetCardData object and to_dataset_card_data to write them back. It validates each data_files entry, which must be a string, a list of strings, or a list of split-path dicts. The get_default_config_name method returns the name of the default configuration.
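To make the three accepted data_files shapes concrete, here is a minimal standalone sketch of such a check. It is not the library's validator: the function name is_valid_data_files is hypothetical, and the exact shape of a split-path dict (exactly the keys "split" and "path") is an assumption based on the usage example on this page.

```python
from typing import Any


def is_valid_data_files(value: Any) -> bool:
    """Return True if `value` has one of the shapes described above.

    Illustrative only; hypothetical helper, not part of `datasets`.
    """
    if value is None or isinstance(value, str):
        # A missing field or a single path/pattern string is accepted.
        return True
    if not isinstance(value, list):
        return False
    for item in value:
        if isinstance(item, str):
            continue
        if isinstance(item, dict):
            # Assumed split-path shape: exactly the keys "split" and "path".
            if set(item) == {"split", "path"} and isinstance(item["path"], (str, list)):
                continue
        return False
    return True


print(is_valid_data_files("data/*.parquet"))
print(is_valid_data_files([{"split": "train", "path": "data/train-*.parquet"}]))
print(is_valid_data_files({"train": "data/train.csv"}))
```

Note that this sketch also accepts a list mixing strings and split-path dicts; whether that should be allowed is a design choice left open here.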

Usage

Use MetadataConfigs when programmatically managing dataset card metadata, especially for multi-configuration datasets. It is used internally by push_to_hub to update the dataset card YAML.
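The relationship between the card's configs field and the name-keyed mapping can be sketched in plain Python. This assumes the card stores configs as a list whose entries carry their name under a config_name key; the helper names card_list_to_mapping and mapping_to_card_list are hypothetical and only illustrate the idea, not the library's implementation.

```python
from typing import Any

# Assumed card layout: a list of entries, each with a "config_name" key
# next to that config's parameters.
card_configs: list[dict[str, Any]] = [
    {"config_name": "default", "data_files": "data/*.parquet"},
    {"config_name": "fr", "data_files": "fr/*.parquet"},
]


def card_list_to_mapping(entries: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Turn the list layout into {config_name: {**config_params}}."""
    mapping: dict[str, dict[str, Any]] = {}
    for entry in entries:
        params = dict(entry)  # copy so the caller's entry is not mutated
        name = params.pop("config_name")
        mapping[name] = params
    return mapping


def mapping_to_card_list(mapping: dict[str, dict[str, Any]]) -> list[dict[str, Any]]:
    """Turn the mapping back into the list layout, sorted by config name."""
    return [{"config_name": name, **params} for name, params in sorted(mapping.items())]


mapping = card_list_to_mapping(card_configs)
print(mapping["fr"])
print(mapping_to_card_list(mapping) == card_configs)
```

Keying by config name makes lookups and per-config updates straightforward, while the list form keeps each entry self-describing when serialized to YAML.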

Code Reference

Source Location

  • Repository: datasets
  • File: src/datasets/utils/metadata.py
  • Lines: 46-189

Signature

class MetadataConfigs(dict[str, dict[str, Any]]):
    """Should be in format {config_name: {**config_params}}."""

    FIELD_NAME: ClassVar[str] = METADATA_CONFIGS_FIELD

Import

from datasets.utils.metadata import MetadataConfigs

I/O Contract

Inputs

Name | Type | Required | Description
*args / **kwargs | dict[str, dict[str, Any]] | No | Mapping of config names to config parameter dictionaries.

Outputs

Name | Type | Description
instance | MetadataConfigs | A MetadataConfigs dictionary for managing dataset card YAML metadata.

Usage Examples

Basic Usage

from datasets.utils.metadata import MetadataConfigs

configs = MetadataConfigs({
    "default": {
        "data_files": [
            {"split": "train", "path": "data/train-*.parquet"},
            {"split": "test", "path": "data/test-*.parquet"},
        ],
    },
    "fr": {
        "data_files": [
            {"split": "train", "path": "fr/train-*.parquet"},
        ],
    },
})

# Look up which config is treated as the default.
print(configs.get_default_config_name())
# prints: default
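This page does not spell out how the default is chosen. The standalone sketch below shows one plausible rule that is consistent with the example above: a config counts as the default if it is the only one, if it is named "default", or if its parameters set a truthy "default" flag. The function pick_default_config and this rule are assumptions for illustration, not a statement of the library's behavior.

```python
from typing import Any, Optional


def pick_default_config(configs: dict[str, dict[str, Any]]) -> Optional[str]:
    """Hypothetical selection rule; see the assumptions in the text above."""
    candidates = [
        name
        for name, params in configs.items()
        if len(configs) == 1 or name == "default" or params.get("default")
    ]
    if len(candidates) > 1:
        # Ambiguous input: more than one config qualifies as the default.
        raise ValueError(f"Several default configs found: {candidates}")
    return candidates[0] if candidates else None


print(pick_default_config({"default": {}, "fr": {}}))
print(pick_default_config({"en": {}, "fr": {"default": True}}))
print(pick_default_config({"en": {}, "fr": {}}))
```

Returning None when nothing qualifies, and raising when several configs qualify, keeps the ambiguous cases explicit for the caller.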

Related Pages

Implements Principle
