Principle:Huggingface Optimum Backend Pipeline Dispatch
Overview
Strategy pattern for routing pipeline creation to backend-specific implementations based on the selected accelerator.
Description
Once the accelerator is determined (either explicitly by the user or via auto-detection), the pipeline() function dispatches to the appropriate backend-specific pipeline constructor. The dispatch mechanism uses lazy imports to avoid loading heavy backend dependencies until they are actually needed:
- For ONNX Runtime ("ort"): lazy-imports pipeline from optimum.onnxruntime
- For OpenVINO ("ov") or IPEX ("ipex"): lazy-imports pipeline from optimum.intel
Each backend pipeline handles its own model loading, optimization, and inference execution while maintaining the standard transformers.Pipeline interface. This means the dispatch is transparent to the end user -- the returned pipeline object behaves identically regardless of which backend was selected.
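A minimal sketch of this dispatch, assuming simplified names and signatures (the real implementation in optimum.pipelines forwards many more parameters):

```python
# Hypothetical sketch of the dispatch logic; names and signatures are
# simplified and not the actual optimum.pipelines implementation.
def pipeline(task, model=None, accelerator="ort", **kwargs):
    if accelerator == "ort":
        # Lazy import: only executes when ONNX Runtime is selected.
        from optimum.onnxruntime import pipeline as ort_pipeline
        return ort_pipeline(task=task, model=model, **kwargs)
    if accelerator in ("ov", "ipex"):
        # OpenVINO and IPEX share one dispatch target; the accelerator
        # argument tells optimum.intel which backend to construct.
        from optimum.intel import pipeline as intel_pipeline
        return intel_pipeline(
            task=task, model=model, accelerator=accelerator, **kwargs
        )
    raise ValueError(f"Unsupported accelerator: {accelerator!r}")
```

Because the imports sit inside the branches, defining this function never touches optimum.onnxruntime or optimum.intel; the cost is paid only on the first call with a given accelerator.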
Dispatch Architecture
The dispatch follows a two-tier structure:
| Tier | Package | Backends | Description |
|---|---|---|---|
| Tier 1 | optimum.onnxruntime | ONNX Runtime only | Dedicated package for ONNX Runtime integration |
| Tier 2 | optimum.intel | OpenVINO and IPEX | Unified Intel package handling both OpenVINO and IPEX. Receives an additional accelerator parameter to distinguish between the two. |
Note that both OpenVINO and IPEX share the same dispatch target (optimum.intel.pipeline) but are differentiated by the accelerator parameter passed to it.
Usage
This is an internal dispatch mechanism triggered automatically by the pipeline() function. Users do not interact with the dispatch logic directly. The dispatch is the second phase of the pipeline creation workflow:
- User calls optimum.pipelines.pipeline() with an accelerator choice
- The function detects/validates the backend (see Principle:Huggingface_Optimum_Backend_Availability_Detection)
- Dispatch occurs here: the function lazy-imports and calls the backend-specific pipeline constructor
- The backend-specific constructor returns a transformers.Pipeline instance
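The workflow above can be exercised with a short usage sketch. The guard keeps the snippet runnable even when optimum is not installed; the accelerator value follows the source:

```python
import importlib.util

# Only attempt the real call when optimum is available in the environment.
if importlib.util.find_spec("optimum") is not None:
    from optimum.pipelines import pipeline

    # accelerator="ort" makes the dispatch step lazy-import
    # optimum.onnxruntime; the returned object is a standard
    # transformers.Pipeline and is used like any other pipeline.
    clf = pipeline("text-classification", accelerator="ort")
    print(clf("Dispatch is transparent to the caller."))
else:
    print("optimum not installed; skipping the example")
```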
Theoretical Basis
Strategy pattern with lazy loading. Backend-specific implementations are loaded on demand to avoid importing heavy dependencies (such as ONNX Runtime, OpenVINO, or Intel Extension for PyTorch) until they are actually needed. This design has several benefits:
- Reduced startup time: Only the selected backend's dependencies are imported
- Graceful degradation: Users without a particular backend installed do not encounter import errors at module load time
- Decoupled packages: Each backend pipeline is a separate Python package (optimum-onnx for ONNX Runtime, optimum-intel for OpenVINO and IPEX), allowing independent versioning and installation
The lazy import is implemented using standard Python from ... import statements inside the conditional branches of the dispatch function, ensuring that the import only executes when the corresponding branch is reached.
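The effect of a branch-local import can be demonstrated with stdlib modules, which here are arbitrary stand-ins for the heavy backend dependencies. The import statement runs only when its branch is reached, which is observable through sys.modules:

```python
import sys

def make_client(protocol):
    # Strategy dispatch with lazy loading: each branch imports its
    # dependency only when selected, mirroring the ort/ov/ipex branches.
    if protocol == "ftp":
        import ftplib  # stands in for a heavy backend package
        return ftplib.FTP
    if protocol == "imap":
        import imaplib
        return imaplib.IMAP4
    raise ValueError(f"Unsupported protocol: {protocol!r}")
```

Calling make_client("ftp") loads ftplib but never touches imaplib, just as selecting "ort" never imports optimum.intel.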
Parameter Forwarding
All standard transformers.pipeline() parameters are forwarded to the backend-specific constructor:
| Parameter | Forwarded To |
|---|---|
| task, model, config | Both ort_pipeline and intel_pipeline |
| tokenizer, feature_extractor, image_processor, processor | Both |
| framework, revision, use_fast, token | Both |
| device, device_map, torch_dtype | Both |
| trust_remote_code, model_kwargs, pipeline_class | Both |
| accelerator | intel_pipeline only (to distinguish "ov" from "ipex") |
| **kwargs | Both |
Related
- implemented_by → Implementation:Huggingface_Optimum_Backend_Specific_Pipeline