Implementation:Huggingface Transformers Models To Deprecate
| Knowledge Sources | |
|---|---|
| Domains | Repository_Maintenance, Model_Lifecycle |
| Last Updated | 2026-02-13 20:00 GMT |
Overview
A concrete tool that identifies candidate models for deprecation by analyzing each model's HuggingFace Hub download counts and the date of its first commit.
Description
The models_to_deprecate.py utility scans src/transformers/models/ for all non-deprecated modeling files and builds a dictionary of model metadata: the date of each model's first commit (obtained via git log) and its download counts (obtained via the HuggingFace Hub API). Because folder names do not always match Hub tags, it resolves mismatches through MODEL_FOLDER_NAME_TO_TAG_MAPPING. It first filters out models added within the last year, then queries the Hub for download counts across all relevant tags for each remaining model, stopping early once the running total exceeds the threshold. Finally, it reports the models that fall below the download threshold (default 5,000), sorted by download count. Model info can be cached to JSON, and the thresholds are configurable.
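The first-commit lookup and the one-year age filter described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: the function names `parse_first_commit_date`, `first_commit_date`, and `is_older_than` are hypothetical, and `git log --format=%ct` (committer date as a Unix timestamp) is just one workable way to obtain the dates.

```python
import subprocess
from datetime import datetime, timedelta, timezone
from pathlib import Path


def parse_first_commit_date(git_log_output: str) -> datetime:
    """Return the oldest commit date from `git log --format=%ct` output."""
    # One Unix timestamp per commit; take the minimum instead of relying on order.
    timestamps = [int(line) for line in git_log_output.split()]
    if not timestamps:
        raise ValueError("no commits found for this path")
    return datetime.fromtimestamp(min(timestamps), tz=timezone.utc)


def first_commit_date(model_dir: Path, repo_root: Path) -> datetime:
    """Ask git for every commit touching model_dir (hypothetical helper)."""
    result = subprocess.run(
        ["git", "log", "--format=%ct", "--", str(model_dir)],
        cwd=repo_root,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_first_commit_date(result.stdout)


def is_older_than(first_commit: datetime, now: datetime, min_age_days: int = 365) -> bool:
    """True if the model has been in the repository for at least min_age_days."""
    return now - first_commit >= timedelta(days=min_age_days)


# Demonstration with canned git output, so no repository is needed.
now = datetime(2026, 2, 13, tzinfo=timezone.utc)
oldest = parse_first_commit_date("1700000000\n1600000000\n1650000000\n")
print(oldest.isoformat(), is_older_than(oldest, now))
```

Separating the parsing step from the `subprocess` call keeps the date logic testable without a git checkout.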
Usage
Run periodically to identify rarely-used models that are candidates for deprecation, feeding results into deprecate_models.py.
Code Reference
Source Location
- Repository: Huggingface_Transformers
- File: utils/models_to_deprecate.py
- Lines: 1-338
Signature
from typing import Dict, List


class HubModelLister:
    """Queries HuggingFace Hub API for model download statistics."""

    def get_download_count(
        self,
        model_tags: List[str],
        threshold: int = 5000,
    ) -> int:
        """Get total downloads for a model across all its Hub tags."""


def get_list_of_repo_model_paths() -> Dict[str, str]:
    """Scan repo for all non-deprecated model directories."""


def get_list_of_models_to_deprecate(
    threshold: int = 5000,
    min_age_days: int = 365,
) -> List[Dict]:
    """Identify models below the download threshold and older than min_age_days."""
Import
python utils/models_to_deprecate.py --threshold 5000
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| src/transformers/models/ | Directory | Yes | Model source directories to analyze |
| --threshold | int | No | Download count threshold (default: 5000) |
| --save_model_info | str | No | Path to cache model info JSON |
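The threshold also bounds how much Hub querying is needed, since counting can stop as soon as a model's running total exceeds it. The sketch below illustrates that early-termination idea only. `count_downloads` and the injected `downloads_for_tag` callable are hypothetical names, and the download numbers are made up. In practice the callable might wrap a Hub listing call such as `HfApi.list_models` from `huggingface_hub`, but that wiring is an assumption and is not shown.

```python
from typing import Callable, Iterable, List


def count_downloads(
    model_tags: List[str],
    downloads_for_tag: Callable[[str], Iterable[int]],
    threshold: int = 5000,
) -> int:
    """Sum downloads across tags, stopping once the total exceeds threshold."""
    total = 0
    for tag in model_tags:
        for downloads in downloads_for_tag(tag):
            total += downloads
            if total > threshold:
                # Already above the threshold: the model is not a candidate,
                # so further Hub queries would be wasted work.
                return total
    return total


# Made-up per-checkpoint download numbers standing in for Hub responses.
FAKE_HUB = {"tag-a": [4000, 3000, 10], "tag-b": [7]}
queried_tags: List[str] = []


def fake_downloads_for_tag(tag: str) -> Iterable[int]:
    queried_tags.append(tag)  # record which tags were actually queried
    yield from FAKE_HUB.get(tag, [])


total = count_downloads(["tag-a", "tag-b"], fake_downloads_for_tag)
print(total, queried_tags)  # 7000 ['tag-a']: the second tag is never queried
```

Note that with early termination the returned value is only a lower bound for models above the threshold, which is sufficient for deciding whether a model is a deprecation candidate.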
Outputs
| Name | Type | Description |
|---|---|---|
| Deprecation candidates | stdout | Models below threshold, sorted by downloads |
| model_info.json | JSON | Cached model metadata (if --save_model_info) |
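The shape of the candidate report (models below the threshold, sorted by downloads) can be illustrated with a small sketch. The helper names `select_candidates` and `format_report`, the output line format, and the sample numbers are all hypothetical. Treating "below" as a strict comparison is also an assumption.

```python
from typing import Dict, List, Tuple


def select_candidates(
    downloads_by_model: Dict[str, int],
    threshold: int = 5000,
) -> List[Tuple[str, int]]:
    """Keep models strictly below the threshold, least-downloaded first."""
    below = [
        (name, count)
        for name, count in downloads_by_model.items()
        if count < threshold
    ]
    return sorted(below, key=lambda item: item[1])


def format_report(candidates: List[Tuple[str, int]]) -> str:
    """Render one line per candidate (the line format is illustrative)."""
    return "\n".join(f"{name}: {count} downloads" for name, count in candidates)


# Made-up download counts for demonstration.
sample = {"model_a": 120, "model_b": 98000, "model_c": 4999, "model_d": 0}
candidates = select_candidates(sample)
print(format_report(candidates))
```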
Usage Examples
Finding Deprecation Candidates
# Find models with fewer than 5000 downloads
python utils/models_to_deprecate.py --threshold 5000
# Cache results for later use
python utils/models_to_deprecate.py --threshold 5000 \
    --save_model_info model_info.json
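Caching model info to JSON requires converting commit dates to strings and back. The sketch below shows one way such a round trip could work. The JSON layout, the `first_commit` and `downloads` keys, and the `save_model_info` / `load_model_info` helpers are assumptions for illustration, since the script's actual cache format is not documented here.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict


def save_model_info(model_info: Dict[str, Dict[str, Any]], path: Path) -> None:
    """Write model info to JSON, storing datetimes as ISO 8601 strings."""
    serializable = {
        name: {**info, "first_commit": info["first_commit"].isoformat()}
        for name, info in model_info.items()
    }
    path.write_text(json.dumps(serializable, indent=2))


def load_model_info(path: Path) -> Dict[str, Dict[str, Any]]:
    """Read the cache back, restoring datetimes from their ISO strings."""
    raw = json.loads(path.read_text())
    return {
        name: {**info, "first_commit": datetime.fromisoformat(info["first_commit"])}
        for name, info in raw.items()
    }


# Round-trip demonstration with made-up data in a temporary directory.
original = {
    "model_a": {
        "first_commit": datetime(2020, 9, 13, tzinfo=timezone.utc),
        "downloads": 120,
    }
}
with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "model_info.json"
    save_model_info(original, cache_path)
    restored = load_model_info(cache_path)

print(restored == original)  # True
```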