Principle:Huggingface Optimum Optimized Model Loading
Overview
Abstract interface for loading pre-trained models into backend-specific optimized formats with optional on-the-fly export.
Description
OptimizedModel provides a unified from_pretrained interface that works across all acceleration backends (ONNX Runtime, OpenVINO, IPEX). It supports two loading modes:
- Direct loading: loads a pre-exported optimized model directly from the Hub or local disk (when export=False, the default)
- On-the-fly export: loads a vanilla HuggingFace model and converts it to the optimized format at load time (when export=True)
The abstract base class defines the contract that all backend-specific model classes must implement, ensuring a consistent loading and inference interface regardless of the underlying acceleration technology.
Loading Modes
| Mode | Trigger | Method Called | Description |
|---|---|---|---|
| Direct load | export=False (default) | cls._from_pretrained() | Loads pre-exported optimized model artifacts (e.g., ONNX files, OpenVINO IR files) |
| On-the-fly export (legacy) | export=True + class has _from_transformers | cls._from_transformers() | Legacy export path kept for backward compatibility |
| On-the-fly export | export=True + class has _export | cls._export() | Modern export path that converts a vanilla HuggingFace model to the optimized format |
Model Lifecycle
The OptimizedModel class manages the full model lifecycle:
- Loading: from_pretrained() loads or exports the model
- Inference: __call__() delegates to forward(), which is implemented by backend-specific subclasses
- Saving: save_pretrained() persists the optimized model and its configuration
- Sharing: push_to_hub() uploads the optimized model to the HuggingFace Hub
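The inference step of the lifecycle can be sketched with a toy hierarchy. The class names here (ToyOptimizedModel, ToyORTModel) are invented for illustration; a real subclass would implement forward() with backend-specific code such as an ONNX Runtime session call.

```python
# Minimal sketch of the inference contract: __call__ delegates to
# forward(), which subclasses must implement. Names are hypothetical.
from abc import ABC, abstractmethod


class ToyOptimizedModel(ABC):
    def __call__(self, *args, **kwargs):
        # Inference entry point simply delegates to forward()
        return self.forward(*args, **kwargs)

    @abstractmethod
    def forward(self, *args, **kwargs):
        ...


class ToyORTModel(ToyOptimizedModel):
    def forward(self, x):
        # Placeholder for backend-specific inference
        # (e.g., an ONNX Runtime session run)
        return x * 2


model = ToyORTModel()
print(model(21))  # delegation through __call__
```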
Usage
Use when loading models for accelerated inference, either from pre-exported artifacts or with on-the-fly conversion.
# Load a pre-exported ONNX model
from optimum.onnxruntime import ORTModelForSequenceClassification
model = ORTModelForSequenceClassification.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english")
# Export a vanilla HuggingFace model to ONNX on-the-fly
model = ORTModelForSequenceClassification.from_pretrained("distilbert-base-uncased", export=True)
# Save the exported model for later reuse
model.save_pretrained("./my_ort_model")
Theoretical Basis
Abstract Factory pattern. The base class OptimizedModel defines the loading interface while concrete subclasses (e.g., ORTModel, OVModel) implement backend-specific loading logic. Key design elements:
- Template Method: from_pretrained() implements the common loading algorithm (config resolution, file discovery, export decision) and delegates the actual loading to abstract methods (_from_pretrained, _export)
- Backward compatibility: the export=True path first checks for _from_transformers (legacy) before falling back to _export (modern)
- Configuration resolution: handles multiple config sources (local directory, Hub, subfolder fallback) using AutoConfig
- Library detection: uses TasksManager.infer_library_from_model() to handle models from different libraries (transformers, timm, etc.)
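The Template Method element can be sketched as an abstract base class whose concrete from_pretrained fixes the algorithm while deferring backend steps to abstract hooks. This is a simplified sketch with invented names and signatures; the real method also performs file discovery and richer config resolution.

```python
# Hedged sketch of the Template Method structure; SketchOptimizedModel
# and SketchORTModel are hypothetical, simplified stand-ins.
from abc import ABC, abstractmethod


class SketchOptimizedModel(ABC):
    @classmethod
    def from_pretrained(cls, model_id, export=False):
        # Template method: the common algorithm lives in the base class
        config = cls._resolve_config(model_id)
        if export:
            return cls._export(model_id, config)
        return cls._from_pretrained(model_id, config)

    @classmethod
    def _resolve_config(cls, model_id):
        # Stand-in for AutoConfig-based resolution
        return {"model_id": model_id}

    @classmethod
    @abstractmethod
    def _from_pretrained(cls, model_id, config):
        ...

    @classmethod
    @abstractmethod
    def _export(cls, model_id, config):
        ...


class SketchORTModel(SketchOptimizedModel):
    @classmethod
    def _from_pretrained(cls, model_id, config):
        return ("direct", config)

    @classmethod
    def _export(cls, model_id, config):
        return ("exported", config)


print(SketchORTModel.from_pretrained("my-model"))
print(SketchORTModel.from_pretrained("my-model", export=True))
```

Each backend subclass only supplies the two hooks; the loading algorithm itself never varies across backends.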
Class Hierarchy
ABC (Abstract Base Class)
+-- PreTrainedModel (Optimum-internal compatibility shim)
    +-- OptimizedModel (abstract base)
        +-- ORTModel (ONNX Runtime, from optimum-onnx)
        +-- OVModel (OpenVINO, from optimum-intel)
        +-- IPEXModel (IPEX, from optimum-intel)
Related
- implemented_by → Implementation:Huggingface_Optimum_OptimizedModel_From_Pretrained