Implementation:Huggingface Diffusers Tests Fetcher
| Knowledge Sources | |
|---|---|
| Domains | CI, Testing, Dependency_Analysis |
| Last Updated | 2026-02-13 21:00 GMT |
Overview
Concrete tool for selectively determining which tests to run on a pull request by analyzing the import dependency graph between modified files and test files in the diffusers repository.
Description
The tests_fetcher.py utility (V2) implements a two-stage approach to selective test execution. Stage 1 identifies modified files by computing the git diff between the PR branch and its base (or between the last two commits on main). It filters out changes that only affect docstrings or comments. Stage 2 builds a reverse dependency map by analyzing Python imports across all source and test files: if module A imports module B, then changing B means tests for A should run. The script recursively follows this dependency chain to produce a minimal test list. When too many pipelines are affected, it falls back to testing only "core" pipelines (ControlNet, Stable Diffusion, SDXL, SVD, etc.). It also supports commit message flags: `[skip ci]` to skip, `[test all]` to run everything, and `[no filter]` to disable pipeline filtering.
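The Stage 2 idea (invert the import graph, then follow it outward from each modified file) can be illustrated with a minimal sketch. The helper names `build_reverse_map` and `impacted_tests` and the toy file paths are hypothetical and chosen only for illustration; the real script derives its edges by parsing imports and applies additional filtering that is not shown here.

```python
from collections import defaultdict


def build_reverse_map(imports: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert 'file -> files it imports' into 'file -> files that import it'."""
    reverse: dict[str, set[str]] = defaultdict(set)
    for module, deps in imports.items():
        for dep in deps:
            reverse[dep].add(module)
    return reverse


def impacted_tests(modified: list[str], imports: dict[str, list[str]]) -> list[str]:
    """Follow reverse edges from each modified file and collect reachable test files."""
    reverse = build_reverse_map(imports)
    seen: set[str] = set()
    stack = list(modified)
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against import cycles
        seen.add(current)
        stack.extend(reverse.get(current, ()))
    return sorted(f for f in seen if f.startswith("tests/"))


# Hypothetical toy graph (assumption): keys import the files listed as values.
toy_imports = {
    "src/pkg/pipeline.py": ["src/pkg/model.py"],
    "tests/test_model.py": ["src/pkg/model.py"],
    "tests/test_pipeline.py": ["src/pkg/pipeline.py"],
    "tests/test_scheduler.py": ["src/pkg/scheduler.py"],
}
selected = impacted_tests(["src/pkg/model.py"], toy_imports)
print(selected)
```

In this toy graph, changing `src/pkg/model.py` selects the model test directly and the pipeline test transitively, while the scheduler test is left out.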
Usage
Run this script in CI to generate the selective test list for a PR. On the main branch, it diffs against the previous commit; the `--diff_with_last_commit` flag requests that behavior explicitly. The output is a text file listing test paths and a JSON map categorizing tests by type. This selection keeps CI fast because a PR does not have to run all 400+ test files.
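As a rough illustration of the categorized JSON output, the sketch below groups test paths by the first directory under `tests/`. The grouping rule, the `others` bucket, the function name `categorize_tests`, and the example paths are all assumptions made for this sketch; the actual categories and value format written by the script may differ.

```python
import json


def categorize_tests(test_files: list[str]) -> dict[str, list[str]]:
    """Group test paths by the first directory under tests/ (assumed category rule)."""
    categories: dict[str, list[str]] = {}
    for path in sorted(test_files):
        parts = path.split("/")
        # 'tests/models/test_x.py' -> 'models'; files directly in tests/ -> 'others'
        key = parts[1] if len(parts) > 2 else "others"
        categories.setdefault(key, []).append(path)
    return categories


# Hypothetical test paths used only to exercise the grouping.
example = [
    "tests/models/test_a.py",
    "tests/pipelines/test_b.py",
    "tests/models/test_c.py",
    "tests/test_d.py",
]
test_map = categorize_tests(example)
print(json.dumps(test_map, indent=2))
```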
Code Reference
Source Location
- Repository: Huggingface_Diffusers
- File: utils/tests_fetcher.py
- Lines: 1-1128
Signature
from contextlib import contextmanager

from git import Repo


@contextmanager
def checkout_commit(repo: Repo, commit_id: str):
    """Context manager that checks out a given commit and restores on exit."""
    ...

def get_diff(repo: Repo, base_commit: str, commits: list[str]) -> list[str]:
    """Get the list of modified files between commits."""
    ...

def get_module_dependencies(module_fname: str) -> list[str]:
    """Get all modules imported by a given module file."""
    ...

def create_reverse_dependency_tree() -> dict[str, list[str]]:
    """Build a map from each module to all modules that depend on it."""
    ...

def infer_tests_to_run(
    output_file: str,
    diff_with_last_commit: bool = False,
    json_output_file: str | None = None,
):
    """Main function: identify modified files and compute impacted tests."""
    ...

def filter_tests(output_file: str, filters: list[str]):
    """Filter specific test categories from the output file."""
    ...

def parse_commit_message(commit_message: str) -> dict:
    """Parse commit flags: [skip ci], [test all], [no filter]."""
    ...

def print_tree_deps_of(module_fname: str):
    """Print the dependency tree for a specific module (debug tool)."""
    ...

def get_all_tests() -> list[str]:
    """Get all test files in the repository."""
    ...

def create_json_map(test_files: list[str], json_output_file: str):
    """Create a JSON mapping of test categories to test files."""
    ...

def update_test_map_with_core_pipelines(json_output_file: str):
    """Ensure core pipeline tests are always included."""
    ...
Import
# CLI script — not imported as a module:
# python utils/tests_fetcher.py
# python utils/tests_fetcher.py --diff_with_last_commit
# python utils/tests_fetcher.py --print_dependencies_of src/diffusers/models/unets/unet_2d.py
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| output_file | str | No | Path to write the test list (default: `test_list.txt`) |
| json_output_file | str | No | Path to write the test map JSON (default: `test_map.json`) |
| diff_with_last_commit | bool | No | Diff against last commit instead of PR base |
| filter_tests | bool | No | Filter pipeline/repo_utils tests from the list |
| print_dependencies_of | str | No | Print dependency tree for a specific file (debug mode) |
| commit_message | str | No | Commit message to parse for CI flags |
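The `commit_message` input carries the bracketed CI flags described above. A minimal sketch of how such flags could be recognized follows; `parse_flags_sketch`, the returned key names, and the normalization of `-` and `_` to spaces are assumptions for illustration and are not taken from the script itself.

```python
import re
from typing import Optional


def parse_flags_sketch(commit_message: Optional[str]) -> dict[str, bool]:
    """Look for the first [bracketed] command and map it to boolean CI flags."""
    flags = {"skip": False, "test_all": False, "no_filter": False}
    if not commit_message:
        return flags
    match = re.search(r"\[([^\]]*)\]", commit_message)
    if match is None:
        return flags
    # Assumed normalization: case-insensitive, '-' and '_' treated as spaces.
    command = match.group(1).lower().replace("-", " ").replace("_", " ").strip()
    flags["skip"] = command == "skip ci"
    flags["test_all"] = command == "test all"
    flags["no_filter"] = command == "no filter"
    return flags


print(parse_flags_sketch("Fix scheduler bug [skip ci]"))
print(parse_flags_sketch("Refactor loaders [test all]"))
```

Only the first bracketed group is inspected in this sketch, so a message is expected to carry at most one flag.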
Outputs
| Name | Type | Description |
|---|---|---|
| test_list.txt | File | Newline-separated list of test file paths to run |
| test_map.json | File | JSON dict mapping test categories to lists of test files |
| examples_test_list.txt | File | List of example tests to run (when `[test all]` is set) |
Usage Examples
PR Test Selection
# Standard PR usage — detects branch and computes diff automatically
python utils/tests_fetcher.py
# Output files:
# test_list.txt — flat list of tests
# test_map.json — categorized test map
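A later CI step has to turn the generated list into a test invocation. The sketch below shows one way that could look; `build_pytest_command` is a hypothetical helper, and treating a missing or empty file as "nothing to run" is an assumption rather than documented behavior. Splitting on any whitespace means the sketch does not depend on the exact separator used in the file.

```python
import tempfile
from pathlib import Path


def build_pytest_command(test_list_path: str) -> list[str]:
    """Return a pytest argv for the selected tests, or [] when nothing is selected."""
    path = Path(test_list_path)
    if not path.exists():
        return []  # assumption: no file means no tests were selected
    tests = path.read_text(encoding="utf-8").split()
    if not tests:
        return []
    return ["python", "-m", "pytest", *tests]


# Demonstration with a temporary file standing in for test_list.txt.
with tempfile.TemporaryDirectory() as tmp:
    list_file = Path(tmp) / "test_list.txt"
    list_file.write_text(
        "tests/models/test_a.py\ntests/pipelines/test_b.py\n", encoding="utf-8"
    )
    command = build_pytest_command(str(list_file))
    missing = build_pytest_command(str(Path(tmp) / "absent.txt"))
print(command)
```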
Main Branch Usage
# On main branch, diff against last commit
python utils/tests_fetcher.py --diff_with_last_commit
Debug Dependency Tree
# Show which tests depend on a specific module
python utils/tests_fetcher.py --print_dependencies_of src/diffusers/models/unets/unet_2d.py
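To give a feel for what a dependency-tree printout conveys, here is a small sketch that renders a reverse dependency map as indented lines. The function `format_dependents`, the two-space indentation, and the toy map are illustrative assumptions; the script's actual output format is not reproduced here.

```python
from typing import Optional


def format_dependents(
    module: str,
    reverse_map: dict[str, list[str]],
    depth: int = 0,
    seen: Optional[set[str]] = None,
) -> list[str]:
    """Return indented lines listing everything that (transitively) imports `module`."""
    seen = set() if seen is None else seen
    lines = ["  " * depth + module]
    if module in seen:
        return lines  # already expanded elsewhere; avoids cycles and repeats
    seen.add(module)
    for dependent in sorted(reverse_map.get(module, [])):
        lines.extend(format_dependents(dependent, reverse_map, depth + 1, seen))
    return lines


# Hypothetical reverse map (assumption): keys are imported by the listed files.
toy_reverse_map = {
    "src/pkg/model.py": ["src/pkg/pipeline.py", "tests/test_model.py"],
    "src/pkg/pipeline.py": ["tests/test_pipeline.py"],
}
tree_lines = format_dependents("src/pkg/model.py", toy_reverse_map)
print("\n".join(tree_lines))
```

Each level of indentation marks one more import hop away from the module being inspected.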