Implementation:Huggingface Datasets Tqdm Utils
Overview
Tqdm Utils provides a custom tqdm progress bar wrapper with global enable/disable control for the datasets library. The module wraps tqdm.auto.tqdm in a subclass that respects a global disabled state, and provides helper functions to enable or disable progress bars programmatically. When the environment variable HF_DATASETS_DISABLE_PROGRESS_BARS is set, it takes priority over these programmatic settings.
This module is part of the huggingface/datasets repository.
- Source file: src/datasets/utils/tqdm.py (135 lines)
- Domain: UI, Utilities
- Import:
from datasets.utils.tqdm import tqdm, enable_progress_bars, disable_progress_bars
Class: tqdm
A subclass of tqdm.auto.tqdm that overrides the disable argument when progress bars are globally disabled.
class tqdm(old_tqdm):
    """
    Class to override `disable` argument in case progress bars are globally disabled.
    Taken from https://github.com/tqdm/tqdm/issues/619#issuecomment-619639324.
    """

    def __init__(self, *args, **kwargs):
        if are_progress_bars_disabled():
            kwargs["disable"] = True
        elif kwargs.get("disable") is None and os.getenv("TQDM_POSITION") == "-1":
            # Force-enable progress bars in cloud environments when disable=None
            kwargs["disable"] = False
        super().__init__(*args, **kwargs)

    def __delattr__(self, attr: str) -> None:
        """Fix for https://github.com/huggingface/datasets/issues/6066"""
        try:
            super().__delattr__(attr)
        except AttributeError:
            if attr != "_lock":
                raise
Behavior:
- If progress bars are globally disabled, the disable keyword argument is forced to True, regardless of what the caller passes.
- If disable is None and the environment variable TQDM_POSITION is set to "-1", progress bars are force-enabled to support cloud environments.
- The __delattr__ override silently ignores AttributeError for the _lock attribute, fixing a known issue (#6066).
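The first two rules can be read as a small decision function. The sketch below is only an illustration of that logic: resolve_disable and its parameters are hypothetical names that do not exist in the datasets library, and the global state and environment variable are passed in as plain arguments so the example runs without tqdm or datasets installed.

```python
from typing import Optional


def resolve_disable(
    globally_disabled: bool,
    disable: Optional[bool],
    tqdm_position: Optional[str],
) -> Optional[bool]:
    """Hypothetical helper: the `disable` value handed on to the base tqdm class."""
    if globally_disabled:
        # The global switch wins over whatever the caller passed.
        return True
    if disable is None and tqdm_position == "-1":
        # The caller left the choice open and TQDM_POSITION is "-1": force the bar on.
        return False
    # In every other case the caller's value is passed through unchanged.
    return disable


# Globally disabled: an explicit disable=False from the caller is overridden.
overridden = resolve_disable(True, False, None)
# Not globally disabled, disable left as None, TQDM_POSITION="-1": bar is forced on.
forced_on = resolve_disable(False, None, "-1")
# An explicit disable=True from the caller is respected even with TQDM_POSITION="-1".
respected = resolve_disable(False, True, "-1")
```

Note that a None result is passed through as-is, which leaves the final decision to the base class.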
Functions
disable_progress_bars
Disables progress bars globally for the datasets library. If the environment variable HF_DATASETS_DISABLE_PROGRESS_BARS is explicitly set to 0 (False), a warning is emitted and the function has no effect.
def disable_progress_bars() -> None:
    if HF_DATASETS_DISABLE_PROGRESS_BARS is False:
        warnings.warn(
            "Cannot disable progress bars: environment variable "
            "`HF_DATASETS_DISABLE_PROGRESS_BAR=0` is set and has priority."
        )
        return
    global _hf_datasets_progress_bars_disabled
    _hf_datasets_progress_bars_disabled = True
enable_progress_bars
Enables progress bars globally for the datasets library. If the environment variable HF_DATASETS_DISABLE_PROGRESS_BARS is explicitly set to 1 (True), a warning is emitted and the function has no effect.
def enable_progress_bars() -> None:
    if HF_DATASETS_DISABLE_PROGRESS_BARS is True:
        warnings.warn(
            "Cannot enable progress bars: environment variable "
            "`HF_DATASETS_DISABLE_PROGRESS_BAR=1` is set and has priority."
        )
        return
    global _hf_datasets_progress_bars_disabled
    _hf_datasets_progress_bars_disabled = False
are_progress_bars_disabled
Returns whether progress bars are globally disabled. Returns True if disabled, False otherwise.
def are_progress_bars_disabled() -> bool:
    global _hf_datasets_progress_bars_disabled
    return _hf_datasets_progress_bars_disabled
is_progress_bar_enabled
Returns whether progress bars are globally enabled. This is the logical inverse of are_progress_bars_disabled().
def is_progress_bar_enabled():
    return not are_progress_bars_disabled()
Backward Compatibility Aliases
The module provides backward-compatible singular aliases:
enable_progress_bar = enable_progress_bars
disable_progress_bar = disable_progress_bars
Global State
The module uses a module-level boolean _hf_datasets_progress_bars_disabled to track the global state. It is initialized from the HF_DATASETS_DISABLE_PROGRESS_BARS config value, defaulting to False (progress bars enabled).
Priority rules:
- The environment variable HF_DATASETS_DISABLE_PROGRESS_BARS has priority over programmatic control.
- If the environment variable is set, calls to enable_progress_bars() or disable_progress_bars() that conflict with the environment variable will emit a warning and have no effect.
- If the environment variable is not set (None), programmatic control works as expected.
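The priority rules amount to a three-state check (True, False, or unset) performed before the flag is changed. The following is a minimal sketch of that behaviour, not the library's code: ProgressBarSwitch is a hypothetical stand-in for the module-level flag and its two setters, and the already-parsed environment value is passed in directly instead of being read from datasets.config.

```python
import warnings
from typing import Optional


class ProgressBarSwitch:
    """Hypothetical stand-in for the module-level flag and its two setters."""

    def __init__(self, env_value: Optional[bool]) -> None:
        # env_value models the parsed environment variable: True, False, or None (unset).
        self.env_value = env_value
        # An unset variable leaves progress bars enabled by default.
        self.disabled = bool(env_value)

    def disable(self) -> None:
        if self.env_value is False:
            warnings.warn("Cannot disable progress bars: the environment variable has priority.")
            return
        self.disabled = True

    def enable(self) -> None:
        if self.env_value is True:
            warnings.warn("Cannot enable progress bars: the environment variable has priority.")
            return
        self.disabled = False


# Variable unset: programmatic control works in both directions.
unset = ProgressBarSwitch(env_value=None)
unset.disable()

# Variable set to the "false" value: disable() warns and changes nothing.
forced_on = ProgressBarSwitch(env_value=False)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    forced_on.disable()
```

Only conflicting calls are rejected: with the variable set to the "false" value, enable() still succeeds because it agrees with the environment.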
Dependencies
| Dependency | Type | Purpose |
|---|---|---|
| os | Standard library | Reading the TQDM_POSITION environment variable |
| warnings | Standard library | Emitting warnings when the environment variable conflicts with programmatic control |
| tqdm.auto | External | Base progress bar class |
| datasets.config | Internal | Reading the HF_DATASETS_DISABLE_PROGRESS_BARS configuration |
Usage Example
from datasets.utils.tqdm import tqdm, disable_progress_bars, enable_progress_bars, are_progress_bars_disabled
# Disable progress bars globally
disable_progress_bars()
print(are_progress_bars_disabled()) # True
# Progress bar will not be shown even with disable=False
for _ in tqdm(range(5), disable=False):
    pass
# Re-enable progress bars
enable_progress_bars()
print(are_progress_bars_disabled()) # False
# Progress bar will now be shown
for _ in tqdm(range(5)):
    pass
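When bars should be silenced only for one section of code, the disable/enable pair can be wrapped in a context manager that restores the previous state. The helper below is a hypothetical convenience and is not provided by the datasets library. It takes the three functions as arguments so the sketch runs without datasets installed; an in-memory dictionary stands in for the real global flag.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def progress_bars_disabled(
    is_disabled: Callable[[], bool],
    disable: Callable[[], None],
    enable: Callable[[], None],
) -> Iterator[None]:
    """Hypothetical helper: disable bars inside the block, then restore the prior state."""
    was_disabled = is_disabled()
    disable()
    try:
        yield
    finally:
        # Re-enable only if bars were enabled before entering the block.
        if not was_disabled:
            enable()


# Stand-in state so the sketch runs on its own.
state = {"disabled": False}

with progress_bars_disabled(
    lambda: state["disabled"],
    lambda: state.update(disabled=True),
    lambda: state.update(disabled=False),
):
    inside = state["disabled"]
after = state["disabled"]
```

With the real module, the three arguments would be are_progress_bars_disabled, disable_progress_bars, and enable_progress_bars. If the environment variable forces bars on, disable_progress_bars() only warns, so the block would still show progress bars.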
Design Notes
- The tqdm subclass approach ensures that all progress bars created through this module respect the global disable state, without requiring every call site to check the state manually.
- The environment variable override provides a way for deployment environments to control progress bar behavior without code changes, which is useful in CI/CD pipelines and non-interactive contexts.
- The TQDM_POSITION="-1" check is a heuristic for cloud environments where tqdm auto-detection might incorrectly disable progress bars.
- The __delattr__ fix for the _lock attribute addresses a race condition or cleanup issue specific to the interaction between tqdm and the datasets library.
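The effect of the __delattr__ override can be reproduced in isolation. The sketch below uses a hypothetical TolerantDelete class that copies only that override onto a plain object; it does not model tqdm itself or the conditions under which the missing _lock arises.

```python
class TolerantDelete:
    """Hypothetical class that mimics only the __delattr__ override."""

    def __delattr__(self, attr: str) -> None:
        try:
            super().__delattr__(attr)
        except AttributeError:
            # Only a missing `_lock` is ignored; any other missing name still raises.
            if attr != "_lock":
                raise


obj = TolerantDelete()
del obj._lock            # attribute was never set: silently ignored
obj._lock = object()
del obj._lock            # attribute exists: deleted normally
del obj._lock            # already gone: silently ignored again

try:
    del obj.other        # any other missing attribute still raises
    other_raised = False
except AttributeError:
    other_raised = True
```

Restricting the exception handling to a single attribute name keeps unrelated deletion errors visible.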
File Location
- Repository: huggingface/datasets
- Full path: src/datasets/utils/tqdm.py