Implementation:Huggingface Transformers Fetch Hub Objects For CI

From Leeroopedia
Knowledge Sources
Domains CI_CD, Testing_Infrastructure
Last Updated 2026-02-13 20:00 GMT

Overview

Concrete tool for pre-downloading all external test data (images, audio, video, datasets, tokenizer files) needed by the CI test suite.

Description

The fetch_hub_objects_for_ci.py utility improves test reliability by pre-caching external dependencies before test execution. It maintains a hardcoded list of URLs for test data (COCO images, HuggingFace Hub datasets, audio/video samples). For HuggingFace URLs, it parses the URL pattern and downloads through hf_hub_download, which supports authenticated downloads. For other external URLs, it uses httpx streaming downloads and validates the result, rejecting files whose header looks like an HTML error page or whose size falls below a minimum. It also pre-downloads specific datasets, model files, and tokenizers based on CI flags.
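
As an illustration of the URL-parsing step, the following is a minimal sketch rather than the script's actual code. It assumes Hub file URLs follow the public "resolve" layout (huggingface.co/[datasets/|spaces/]<repo_id>/resolve/<revision>/<filename>). The names HubFileRef and sketch_parse_hf_url are hypothetical, and the extra repo_type field is an addition for illustration that is not part of the signature listed on this page.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote, urlparse


class HubFileRef(NamedTuple):
    """Hypothetical container for the pieces of a Hub file URL."""
    repo_id: str
    filename: str
    revision: str
    repo_type: str  # "model", "dataset", or "space"


def sketch_parse_hf_url(url: str) -> Optional[HubFileRef]:
    """Split a Hub 'resolve' URL into its parts, or return None if it is not one."""
    parsed = urlparse(url)
    if parsed.netloc not in ("huggingface.co", "hf.co"):
        return None  # not a Hub URL, so the caller would fall back to plain HTTP

    # Decode each path segment separately so an encoded "/" stays inside its segment.
    parts = [unquote(p) for p in parsed.path.split("/") if p]

    repo_type = "model"
    if parts and parts[0] in ("datasets", "spaces"):
        repo_type = parts[0][:-1]  # "datasets" -> "dataset", "spaces" -> "space"
        parts = parts[1:]

    if "resolve" not in parts:
        return None
    idx = parts.index("resolve")
    # Need a repo id before "resolve", then a revision and at least one filename segment.
    if idx < 1 or len(parts) < idx + 3:
        return None

    repo_id = "/".join(parts[:idx])
    revision = parts[idx + 1]
    filename = "/".join(parts[idx + 2:])
    return HubFileRef(repo_id, filename, revision, repo_type)
```

A caller could use a None result to route the URL to the generic HTTP download path, and a HubFileRef result to call hf_hub_download with the extracted fields. The repository name in any test URL (for example some-org/some-repo) would be a placeholder, not a repository referenced by the script.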

Usage

Run the script at the start of CI test jobs to pre-cache test data, so that network problems during the test run do not cause flaky failures.

Code Reference

Source Location

Signature

from typing import Tuple

def url_to_local_path(url: str) -> str:
    """Convert a URL to a local cache path."""

def parse_hf_url(url: str) -> Tuple[str, str, str]:
    """Parse a HuggingFace Hub URL into (repo_id, filename, revision)."""

def validate_downloaded_content(filepath: str) -> bool:
    """Check downloaded file is valid (not HTML error page, meets min size)."""

def download_test_file(url: str, target_dir: str) -> str:
    """Download a test file with validation and caching."""

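The content validation described for validate_downloaded_content (an HTML-header check plus a minimum size) could look roughly like the sketch below. This is an assumption-laden illustration, not the script's implementation: the function name, the 100-byte threshold, the 512-byte header window, and the list of HTML markers are all hypothetical choices.

```python
import os

# Hypothetical values chosen for illustration; the real script may use others.
MIN_SIZE_BYTES = 100
HEADER_WINDOW = 512
HTML_MARKERS = (b"<!doctype html", b"<html")


def sketch_validate_downloaded_content(filepath: str, min_size: int = MIN_SIZE_BYTES) -> bool:
    """Return True if the file exists, is large enough, and does not start like an HTML page."""
    if not os.path.isfile(filepath):
        return False
    if os.path.getsize(filepath) < min_size:
        return False  # suspiciously small, likely a truncated or empty download

    # Read only the first bytes; an HTML error page announces itself at the top.
    with open(filepath, "rb") as f:
        head = f.read(HEADER_WINDOW)

    # Ignore leading whitespace and letter case before comparing against the markers.
    head = head.lstrip().lower()
    return not head.startswith(HTML_MARKERS)
```

Checking only the first bytes keeps validation cheap for large media files. A stricter variant could also compare the header against the magic bytes expected for the file extension, at the cost of maintaining a format table.
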
Import

python utils/fetch_hub_objects_for_ci.py

I/O Contract

Inputs

Name                Type       Required  Description
Hardcoded URL list  List[str]  Yes       URLs embedded in the script
HF_TOKEN            env var    No        HuggingFace token for authenticated downloads
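
Since HF_TOKEN is optional, one way to handle it is to read the variable and pass it through only when set. The sketch below assumes the token is forwarded via the token parameter of huggingface_hub.hf_hub_download; the helper names hub_download_kwargs and fetch_from_hub are hypothetical, and how the real script wires the token is not stated on this page.

```python
import os
from typing import Any, Dict, Optional


def hub_download_kwargs(
    repo_id: str,
    filename: str,
    revision: str = "main",
    repo_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for hf_hub_download, including the optional token."""
    # An unset or empty HF_TOKEN becomes None, which leaves authentication
    # to whatever default huggingface_hub applies.
    token = os.environ.get("HF_TOKEN") or None
    return {
        "repo_id": repo_id,
        "filename": filename,
        "revision": revision,
        "repo_type": repo_type,
        "token": token,
    }


def fetch_from_hub(**kwargs: Any) -> str:
    """Download one file into the local Hub cache and return its path."""
    # Imported lazily so the kwargs helper stays usable without the dependency.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(**kwargs)
```

Separating argument construction from the network call makes the token handling easy to check in isolation, without touching the Hub.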

Outputs

Name Type Description
Cached files Files Downloaded test data in local cache directory

Usage Examples

Pre-caching Test Data

# Run before test execution in CI
python utils/fetch_hub_objects_for_ci.py

# Typically called in CI pipeline setup step
