
Principle:Huggingface Optimum Optimized Model Loading

From Leeroopedia

Overview

Abstract interface for loading pre-trained models into backend-specific optimized formats with optional on-the-fly export.

Description

OptimizedModel provides a unified from_pretrained interface that works across all acceleration backends (ONNX Runtime, OpenVINO, IPEX). It supports two loading modes:

  1. Direct loading: Loading pre-exported optimized models directly from the Hub or local disk (when export=False, the default)
  2. On-the-fly export: Loading vanilla HuggingFace models and converting them to the optimized format at load time (when export=True)

The abstract base class defines the contract that all backend-specific model classes must implement, ensuring a consistent loading and inference interface regardless of the underlying acceleration technology.

Loading Modes

  • Direct load — trigger: export=False (default); calls cls._from_pretrained(); loads pre-exported optimized model artifacts (e.g., ONNX files, OpenVINO IR files)
  • On-the-fly export (legacy) — trigger: export=True and the class defines _from_transformers; calls cls._from_transformers(); legacy export path kept for backward compatibility
  • On-the-fly export — trigger: export=True and the class defines _export; calls cls._export(); modern export path that converts a vanilla HuggingFace model to the optimized format

Model Lifecycle

The OptimizedModel class manages the full model lifecycle:

  1. Loading: from_pretrained() loads or exports the model
  2. Inference: __call__() delegates to forward(), which is implemented by backend-specific subclasses
  3. Saving: save_pretrained() persists the optimized model and its configuration
  4. Sharing: push_to_hub() uploads the optimized model to the HuggingFace Hub
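The lifecycle steps above can be sketched with a toy class. Everything here is illustrative (the class name, the doubling forward, and the config layout are invented); the point is only the delegation of __call__ to forward() and the persistence of the configuration by save_pretrained().

```python
import json
import os
import tempfile

class TinyOptimizedModel:
    """Toy model illustrating the OptimizedModel lifecycle (not real optimum code)."""

    def __init__(self, config):
        self.config = config

    def forward(self, x):
        # Backend-specific subclasses would run the real accelerated inference here.
        return x * 2

    def __call__(self, *args, **kwargs):
        # Inference always routes through forward().
        return self.forward(*args, **kwargs)

    def save_pretrained(self, save_dir):
        # Persist the model configuration alongside the (omitted) model artifacts.
        os.makedirs(save_dir, exist_ok=True)
        with open(os.path.join(save_dir, "config.json"), "w") as f:
            json.dump(self.config, f)


model = TinyOptimizedModel({"model_type": "demo"})
result = model(21)  # __call__ delegates to forward()

with tempfile.TemporaryDirectory() as d:
    model.save_pretrained(d)
    config_saved = os.path.exists(os.path.join(d, "config.json"))
```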

Usage

Use when loading models for accelerated inference, either from pre-exported artifacts or with on-the-fly conversion.

# Load a pre-exported ONNX model
from optimum.onnxruntime import ORTModelForSequenceClassification

model = ORTModelForSequenceClassification.from_pretrained("optimum/distilbert-base-uncased-finetuned-sst-2-english")

# Export a vanilla HuggingFace model to ONNX on-the-fly
model = ORTModelForSequenceClassification.from_pretrained("distilbert-base-uncased", export=True)

# Save the exported model for later reuse
model.save_pretrained("./my_ort_model")

Theoretical Basis

The design follows the Abstract Factory pattern: the base class OptimizedModel defines the loading interface, while concrete subclasses (e.g., ORTModel, OVModel) implement backend-specific loading logic. Key design elements:

  • Template Method: from_pretrained() implements the common loading algorithm (config resolution, file discovery, export decision) and delegates the actual loading to abstract methods (_from_pretrained, _export)
  • Backward compatibility: The export=True path first checks for _from_transformers (legacy) before falling back to _export (modern)
  • Configuration resolution: Handles multiple config sources (local directory, Hub, subfolder fallback) using AutoConfig
  • Library detection: Uses TasksManager.infer_library_from_model() to handle models from different libraries (transformers, timm, etc.)
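The Template Method structure described above can be sketched as a skeleton. This is a simplified sketch, not the real optimum source: the class names with a "Sketch" suffix and the trivial _resolve_config() are assumptions standing in for the actual config-resolution machinery.

```python
from abc import ABC, abstractmethod

class OptimizedModelSketch(ABC):
    """Template Method: from_pretrained() holds the common algorithm and
    delegates backend-specific work to abstract hooks."""

    @classmethod
    def from_pretrained(cls, model_id, export=False):
        config = cls._resolve_config(model_id)        # common step: config resolution
        if export:
            return cls._export(model_id, config)       # hook: on-the-fly export
        return cls._from_pretrained(model_id, config)  # hook: direct load

    @classmethod
    def _resolve_config(cls, model_id):
        # The real code resolves via AutoConfig (local directory, Hub, subfolder).
        return {"name_or_path": model_id}

    @classmethod
    @abstractmethod
    def _from_pretrained(cls, model_id, config): ...

    @classmethod
    @abstractmethod
    def _export(cls, model_id, config): ...


class ORTModelSketch(OptimizedModelSketch):
    """Stand-in for a backend subclass such as ORTModel."""

    @classmethod
    def _from_pretrained(cls, model_id, config):
        return f"loaded ONNX artifacts for {model_id}"

    @classmethod
    def _export(cls, model_id, config):
        return f"exported {model_id} to ONNX"
```

Because the common algorithm lives in the base class, every backend gets the same config resolution and export-decision behavior for free and only supplies the two hooks.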

Class Hierarchy

ABC (Abstract Base Class)
  +-- PreTrainedModel (Optimum-internal compatibility shim)
        +-- OptimizedModel (abstract base)
              +-- ORTModel (ONNX Runtime, from optimum-onnx)
              +-- OVModel (OpenVINO, from optimum-intel)
              +-- IPEXModel (IPEX, from optimum-intel)
