Implementation:Huggingface Datasets Logging Utils

From Leeroopedia
Knowledge Sources
Domains: Logging, Configuration
Last Updated: 2026-02-14 18:00 GMT

Overview

Library-wide logging configuration with verbosity controls and environment variable support.

Description

This module provides a centralized logging system for the Hugging Face Datasets library. At module load time it attaches a StreamHandler to the library root logger (named "datasets") and sets that logger's level. The default verbosity is WARNING; it can be overridden by setting the DATASETS_VERBOSITY environment variable to one of "debug", "info", "warning", "error", or "critical".
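
The level lookup described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the module's actual code: `LOG_LEVELS`, `DEFAULT_LEVEL`, and `resolve_default_level` are hypothetical names, and the environment is passed in as a plain mapping so the behavior is easy to try out.

```python
import logging
import os

# Mapping built from the valid values listed above (hypothetical name).
LOG_LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
    "critical": logging.CRITICAL,
}
DEFAULT_LEVEL = logging.WARNING


def resolve_default_level(env=None):
    """Return the level named by DATASETS_VERBOSITY, or WARNING as a fallback."""
    env = os.environ if env is None else env
    value = env.get("DATASETS_VERBOSITY")
    if value:
        if value in LOG_LEVELS:
            return LOG_LEVELS[value]
        # Unknown value: warn and fall through to the default.
        logging.getLogger().warning(
            "Unknown option DATASETS_VERBOSITY=%s, has to be one of: %s",
            value,
            ", ".join(LOG_LEVELS),
        )
    return DEFAULT_LEVEL


print(resolve_default_level({"DATASETS_VERBOSITY": "debug"}))  # 10
print(resolve_default_level({}))                               # 30
```

Note that the lookup in this sketch is case-sensitive, so "INFO" would be treated as an unknown value.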

The module exposes several public functions:

  • get_logger: Returns the logger with the given name, or the library root logger ("datasets") when no name is given. This is the primary function used by dataset builders and internal modules to obtain a logger.
  • get_verbosity / set_verbosity: Get the effective level of the library root logger, or set its level, using standard logging level constants.
  • set_verbosity_info, set_verbosity_warning, set_verbosity_debug, set_verbosity_error: Convenience shortcuts that set the library root logger to the corresponding level.
  • disable_propagation / enable_propagation: Control whether log messages from the library root logger propagate to parent loggers. Propagation is disabled by default.

The module also re-exports standard logging level constants (DEBUG, INFO, WARNING, ERROR, CRITICAL, etc.) and progress bar utilities (tqdm, enable_progress_bar, disable_progress_bar, is_progress_bar_enabled) for backward compatibility.

Usage

Use get_logger(__name__) in any module within the datasets library to obtain a properly configured logger. Use the set_verbosity_* functions or the DATASETS_VERBOSITY environment variable to control how much logging output is displayed.
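
Setting the verbosity once affects every module-level logger because of how Python's logging hierarchy works: a logger without its own level inherits the effective level of its nearest configured ancestor. The sketch below demonstrates this with the standard library only; "demo_lib" is a hypothetical stand-in for the "datasets" logger so the example runs without the library installed.

```python
import logging

# "demo_lib" is a hypothetical stand-in for the "datasets" library root logger.
library_root = logging.getLogger("demo_lib")
library_root.setLevel(logging.WARNING)

# A module-level logger, comparable to what get_logger(__name__) returns
# for a module inside the package.
child = logging.getLogger("demo_lib.builder")

# The child has no level of its own, so it inherits the parent's level.
print(child.level)                # 0 (NOTSET)
print(child.getEffectiveLevel())  # 30 (WARNING)

# Changing the library root level changes what every child emits.
library_root.setLevel(logging.DEBUG)
print(child.getEffectiveLevel())         # 10 (DEBUG)
print(child.isEnabledFor(logging.INFO))  # True
```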

Code Reference

Source Location

  • Repository: datasets
  • File: src/datasets/utils/logging.py
  • Lines: 1-175

Signature

def get_logger(name: Optional[str] = None) -> logging.Logger:
    """Return a logger with the specified name. This function can be used in dataset builders."""

def get_verbosity() -> int:
    """Return the current level for the HuggingFace datasets library's root logger."""

def set_verbosity(verbosity: int) -> None:
    """Set the level for the Hugging Face Datasets library's root logger."""

def set_verbosity_info():
    """Set the level for the root logger to INFO."""

def set_verbosity_warning():
    """Set the level for the root logger to WARNING."""

def set_verbosity_debug():
    """Set the level for the root logger to DEBUG."""

def set_verbosity_error():
    """Set the level for the root logger to ERROR."""

def disable_propagation() -> None:
    """Disable propagation of the library log outputs."""

def enable_propagation() -> None:
    """Enable propagation of the library log outputs."""

Import

from datasets.utils.logging import get_logger, set_verbosity_info

I/O Contract

get_logger

Name | Type | Required | Description
name | Optional[str] | No | Logger name. If None, returns the library root logger ("datasets"). Typically set to __name__ for module-level loggers.

Returns: logging.Logger -- A logger in the datasets hierarchy.

get_verbosity

Returns: int -- The effective logging level of the library root logger (e.g., logging.WARNING).
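
Because the return value is a plain integer, it can be compared against the level constants or converted to a readable name with the standard library. A minimal sketch, using a hard-coded value as a stand-in for the result of get_verbosity():

```python
import logging

# Stand-in for the integer that get_verbosity() returns.
level = logging.WARNING

print(level)                        # 30
print(logging.getLevelName(level))  # WARNING

# A record is emitted when its level is at or above the configured threshold.
print(logging.ERROR >= level)  # True
print(logging.INFO >= level)   # False
```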

set_verbosity

Name | Type | Required | Description
verbosity | int | Yes | Logging level constant (e.g., logging.DEBUG, logging.INFO).

Environment Variable

Variable | Valid Values | Description
DATASETS_VERBOSITY | "debug", "info", "warning", "error", "critical" | Overrides the default logging level at module initialization time. If set to an invalid value, a warning is logged and the default (WARNING) is used.

Usage Examples

Basic Logger Usage in a Dataset Builder

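from datasets.utils.logging import get_logger

# Module-level logger named after the current module
logger = get_logger(__name__)

num_examples = 1000  # example value to include in the debug message

logger.info("Processing dataset...")
logger.warning("Missing optional field 'description'")
# Pass arguments separately so formatting happens only if the message is emitted
logger.debug("Detailed debug info: %s", num_examples)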

Setting Verbosity Programmatically

import datasets

# Show all info-level messages
datasets.logging.set_verbosity_info()

# Or use the generic setter with a level constant
datasets.logging.set_verbosity(datasets.logging.DEBUG)

# Check current level
level = datasets.logging.get_verbosity()
print(level)  # 10 (DEBUG)

Using the Environment Variable

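import os

# Set this before datasets is first imported: the value is read at module
# initialization time and must be one of the lowercase names listed above.
os.environ["DATASETS_VERBOSITY"] = "info"

import datasets
# INFO and higher-severity messages are now displayed
ds = datasets.load_dataset("cornell-movie-review-data/rotten_tomatoes")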

Controlling Log Propagation

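import datasets

# Enable propagation so that handlers on your application's root logger also receive library messages
datasets.logging.enable_propagation()

# Or disable it to avoid duplicate output when both the library handler and a root handler are active
datasets.logging.disable_propagation()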
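
The double-logging effect can be reproduced with the standard library alone. In this sketch, "app_root" and "app_root.lib" are hypothetical stand-ins for an application logger and the library logger, and `CountingHandler` is an illustrative helper that counts the records it receives instead of printing them.

```python
import logging


class CountingHandler(logging.Handler):
    """Hypothetical handler that counts the records it receives."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def emit(self, record):
        self.count += 1


# Stand-ins: "app_root" plays the application's logger, "app_root.lib" the library's.
parent = logging.getLogger("app_root")
library = logging.getLogger("app_root.lib")
parent.setLevel(logging.WARNING)
parent.propagate = False  # keep the demo isolated from the real root logger

parent_handler, library_handler = CountingHandler(), CountingHandler()
parent.addHandler(parent_handler)
library.addHandler(library_handler)

# With propagation on, one message reaches both handlers (double logging).
library.propagate = True
library.warning("seen by both handlers")
print(library_handler.count, parent_handler.count)  # 1 1

# With propagation off, only the library's own handler sees the message.
library.propagate = False
library.warning("seen by the library handler only")
print(library_handler.count, parent_handler.count)  # 2 1
```

If an application wants library messages to flow through its own handlers, enabling propagation while leaving the library's handler attached leads to the first case above, which is why the two settings are usually treated as alternatives.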
