

Implementation:Huggingface Datasets Datasets CLI

From Leeroopedia

Overview

Datasets_CLI is the entry point for the datasets-cli command-line tool in the Hugging Face Datasets library. It provides a centralized dispatcher that registers and routes subcommands for environment diagnostics, dataset testing, and Hub operations. The main() function sets up an ArgumentParser, registers available subcommands (env, test, delete_from_hub), parses command-line arguments, and dispatches execution to the appropriate subcommand handler.

Source File

Property Value
Repository huggingface/datasets
File src/datasets/commands/datasets_cli.py
Length 39 lines
Domain CLI, Tooling

Import

from datasets.commands.datasets_cli import main

Functions

main()

The primary entry point for the datasets-cli tool. This function:

  1. Creates an ArgumentParser with the program name "HuggingFace Datasets CLI tool" and usage pattern datasets-cli <command> [<args>].
  2. Sets logging verbosity to info level via set_verbosity_info().
  3. Registers three subcommands by calling each command class's register_subcommand static method:
    • EnvironmentCommand -- prints environment info
    • TestCommand -- tests dataset loading
    • DeleteFromHubCommand -- deletes a dataset config from the Hub
  4. Parses the command line with parse_known_args(), which returns the recognized arguments plus a list of arguments argparse could not match.
  5. If no subcommand was selected (i.e., args has no func attribute), prints help and exits with code 1.
  6. Otherwise converts the unrecognized arguments into keyword arguments with parse_unknown_args().
  7. Calls args.func(args, **kwargs) to instantiate the selected command, then calls its run() method.
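from argparse import ArgumentParser

from datasets.commands.delete_from_hub import DeleteFromHubCommand
from datasets.commands.env import EnvironmentCommand
from datasets.commands.test import TestCommand
from datasets.utils.logging import set_verbosity_info


def main():
    # Program name and usage string shown by --help; allow_abbrev=False disables prefix matching of long options
    parser = ArgumentParser(
        "HuggingFace Datasets CLI tool", usage="datasets-cli <command> [<args>]", allow_abbrev=False
    )
    commands_parser = parser.add_subparsers(help="datasets-cli command helpers")
    set_verbosity_info()

    # Register commands: each class adds its own subparser to commands_parser
    EnvironmentCommand.register_subcommand(commands_parser)
    TestCommand.register_subcommand(commands_parser)
    DeleteFromHubCommand.register_subcommand(commands_parser)

    # Parse args; anything argparse does not recognize is returned in unknown_args
    args, unknown_args = parser.parse_known_args()
    if not hasattr(args, "func"):
        # No subcommand was given on the command line
        parser.print_help()
        exit(1)
    # parse_unknown_args is defined in the same module
    kwargs = parse_unknown_args(unknown_args)

    # Run: args.func builds the command object, which then executes
    service = args.func(args, **kwargs)
    service.run()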
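The dispatch flow described above can be reproduced without the datasets package. The sketch below is a minimal, self-contained approximation, not the library's code: GreetCommand, the "greet" subcommand and the demo-cli program name are hypothetical, and wiring the factory through set_defaults(func=...) is an assumption about how a command attaches args.func. For testability it also departs from the original in one way: main() accepts an optional argv list (None falls back to sys.argv[1:]) and returns the result of run().

```python
import sys
from argparse import ArgumentParser


def parse_unknown_args(unknown_args):
    # Same pairing logic as the datasets helper: even items are keys, odd items are values.
    return {key.lstrip("-"): value for key, value in zip(unknown_args[::2], unknown_args[1::2])}


class GreetCommand:
    """Hypothetical subcommand shaped like the ones main() registers."""

    def __init__(self, name, **extra):
        self.name = name
        self.extra = extra  # leftover "--key value" pairs forwarded by the dispatcher

    @staticmethod
    def register_subcommand(commands_parser):
        sub = commands_parser.add_parser("greet", help="Say hello (illustrative only)")
        sub.add_argument("name")
        # The callable stored as `func` receives the parsed namespace plus extra kwargs.
        sub.set_defaults(func=lambda args, **kwargs: GreetCommand(args.name, **kwargs))

    def run(self):
        return f"hello {self.name} {self.extra}"


def main(argv=None):
    parser = ArgumentParser("demo-cli", usage="demo-cli <command> [<args>]", allow_abbrev=False)
    commands_parser = parser.add_subparsers(help="demo-cli command helpers")
    GreetCommand.register_subcommand(commands_parser)

    # argv=None makes parse_known_args() read sys.argv[1:], like the real entry point.
    args, unknown_args = parser.parse_known_args(argv)
    if not hasattr(args, "func"):
        parser.print_help()
        sys.exit(1)
    kwargs = parse_unknown_args(unknown_args)

    service = args.func(args, **kwargs)
    return service.run()
```

Calling main(["greet", "world", "--lang", "en"]) returns "hello world {'lang': 'en'}": argparse leaves --lang en unrecognized, and the dispatcher forwards it to the factory as a keyword argument. Calling main([]) prints the help text and raises SystemExit with code 1, mirroring the no-subcommand branch of the real main().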

parse_unknown_args(unknown_args)

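A helper function that converts the list of unrecognized CLI arguments into a dictionary. Items at even positions become keys (with all leading hyphens stripped), and each key is paired with the item that follows it as its value. Values are kept as strings, and because the pairing uses zip(), a trailing item without a partner is silently dropped.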

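def parse_unknown_args(unknown_args):
    # Keys are items 0, 2, 4, ...; values are items 1, 3, 5, ...; leading "-" characters are stripped from keys
    return {key.lstrip("-"): value for key, value in zip(unknown_args[::2], unknown_args[1::2])}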
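To make the pairing rule concrete, the snippet below runs the same function body on a few inputs. The second and third calls show edge cases that follow directly from the slicing and zip() logic rather than from any documented contract: an unpaired trailing flag disappears, and "--key=value" syntax is not split on "=", so such items are paired with each other as if they were separate keys and values.

```python
def parse_unknown_args(unknown_args):
    # Same body as above: even-indexed items are keys, odd-indexed items are values.
    return {key.lstrip("-"): value for key, value in zip(unknown_args[::2], unknown_args[1::2])}


# Well-formed "--key value" pairs; values stay strings, no type conversion happens.
print(parse_unknown_args(["--num_proc", "4", "--split", "train"]))
# → {'num_proc': '4', 'split': 'train'}

# Odd number of items: zip() stops at the shorter slice, so "--flag" is dropped.
print(parse_unknown_args(["--num_proc", "4", "--flag"]))
# → {'num_proc': '4'}

# "--key=value" items are not split, so the second item becomes the first one's value.
print(parse_unknown_args(["--num_proc=4", "--split=train"]))
# → {'num_proc=4': '--split=train'}
```

Because values are never converted, a subcommand that receives these kwargs has to cast them itself (for example int(kwargs["num_proc"])) if it needs non-string types.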

I/O

Direction Description
Input Command-line arguments passed to datasets-cli
Output Dispatches execution to the selected subcommand (env, test, or delete_from_hub)

Dependencies

Module Purpose
argparse.ArgumentParser Command-line argument parsing
datasets.commands.env.EnvironmentCommand Environment info subcommand
datasets.commands.test.TestCommand Dataset test subcommand
datasets.commands.delete_from_hub.DeleteFromHubCommand Hub deletion subcommand
datasets.utils.logging.set_verbosity_info Logging configuration

Usage

From the command line:

# Print environment information
datasets-cli env

# Test a dataset
datasets-cli test my_dataset

# Delete a dataset config from the Hub
datasets-cli delete_from_hub username/dataset_name config_name --token hf_xxxxx

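Programmatically (main() takes no parameters and reads its arguments from sys.argv, so it behaves as if launched with the current process's command line):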

from datasets.commands.datasets_cli import main
main()
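Because main() takes no parameters, running a specific subcommand from Python means controlling sys.argv, and a missing subcommand ends in exit(1), which raises SystemExit. The helper below is a hypothetical sketch (call_with_argv is not part of datasets) that temporarily swaps in an argument list with unittest.mock.patch.object and turns SystemExit into a return code. The first list element stands in for the program name, since argparse only reads sys.argv[1:].

```python
import sys
from unittest import mock


def call_with_argv(entry_point, argv):
    """Hypothetical helper: run a no-argument CLI entry point as if launched with `argv`.

    Returns the SystemExit code if the entry point exits, otherwise None.
    sys.argv is restored when the with-block ends, even if an exception propagates.
    """
    with mock.patch.object(sys, "argv", list(argv)):
        try:
            entry_point()
        except SystemExit as exc:
            return exc.code
    return None


# Example (requires the datasets package to be installed):
# from datasets.commands.datasets_cli import main
# call_with_argv(main, ["datasets-cli", "env"])
```

Patching sys.argv instead of assigning to it directly keeps the change scoped to the call, so surrounding code (or a test runner) still sees its original arguments afterwards.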
