Principle:Huggingface Optimum Backend Pipeline Dispatch
Overview
Strategy pattern for routing pipeline creation to backend-specific implementations based on the selected accelerator.
Description
Once the accelerator is determined (either explicitly by the user or via auto-detection), the pipeline() function dispatches to the appropriate backend-specific pipeline constructor. The dispatch mechanism uses lazy imports to avoid loading heavy backend dependencies until they are actually needed:
- For ONNX Runtime ("ort"): lazy-imports pipeline from optimum.onnxruntime
- For OpenVINO ("ov") or IPEX ("ipex"): lazy-imports pipeline from optimum.intel
Each backend pipeline handles its own model loading, optimization, and inference execution while maintaining the standard transformers.Pipeline interface. This means the dispatch is transparent to the end user -- the returned pipeline object behaves identically regardless of which backend was selected.
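A minimal sketch of this dispatch, assuming simplified names and signatures (the real implementation in optimum.pipelines forwards many more parameters):

```python
# Hypothetical sketch of the dispatch logic; names and signatures are
# simplified and not the actual optimum.pipelines implementation.
def pipeline(task, model=None, accelerator="ort", **kwargs):
    if accelerator == "ort":
        # Lazy import: only executes when ONNX Runtime is selected.
        from optimum.onnxruntime import pipeline as ort_pipeline
        return ort_pipeline(task=task, model=model, **kwargs)
    if accelerator in ("ov", "ipex"):
        # OpenVINO and IPEX share one dispatch target; the accelerator
        # argument tells optimum.intel which backend to construct.
        from optimum.intel import pipeline as intel_pipeline
        return intel_pipeline(
            task=task, model=model, accelerator=accelerator, **kwargs
        )
    raise ValueError(f"Unsupported accelerator: {accelerator!r}")
```

Because the imports sit inside the branches, defining this function never touches optimum.onnxruntime or optimum.intel; the cost is paid only on the first call with a given accelerator.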
Dispatch Architecture
The dispatch follows a two-tier structure:
| Tier | Package | Backends | Description |
|---|---|---|---|
| Tier 1 | optimum.onnxruntime | ONNX Runtime only | Dedicated package for ONNX Runtime integration |
| Tier 2 | optimum.intel | OpenVINO and IPEX | Unified Intel package handling both OpenVINO and IPEX. Receives an additional accelerator parameter to distinguish between the two. |
Note that both OpenVINO and IPEX share the same dispatch target (optimum.intel.pipeline) but are differentiated by the accelerator parameter passed to it.
Usage
This is an internal dispatch mechanism triggered automatically by the pipeline() function. Users do not interact with the dispatch logic directly. The dispatch is the second phase of the pipeline creation workflow:
- User calls optimum.pipelines.pipeline() with an accelerator choice
- The function detects/validates the backend (see Principle:Huggingface_Optimum_Backend_Availability_Detection)
- Dispatch occurs here: the function lazy-imports and calls the backend-specific pipeline constructor
- The backend-specific constructor returns a transformers.Pipeline instance
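The workflow above can be exercised with a short usage sketch. The guard keeps the snippet runnable even when optimum is not installed; the accelerator value follows the source:

```python
import importlib.util

# Only attempt the real call when optimum is available in the environment.
if importlib.util.find_spec("optimum") is not None:
    from optimum.pipelines import pipeline

    # accelerator="ort" makes the dispatch step lazy-import
    # optimum.onnxruntime; the returned object is a standard
    # transformers.Pipeline and is used like any other pipeline.
    clf = pipeline("text-classification", accelerator="ort")
    print(clf("Dispatch is transparent to the caller."))
else:
    print("optimum not installed; skipping the example")
```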
Theoretical Basis
Strategy pattern with lazy loading. Backend-specific implementations are loaded on demand to avoid importing heavy dependencies (such as ONNX Runtime, OpenVINO, or Intel Extension for PyTorch) until they are actually needed. This design has several benefits:
- Reduced startup time: Only the selected backend's dependencies are imported
- Graceful degradation: Users without a particular backend installed do not encounter import errors at module load time
- Decoupled packages: Each backend pipeline is a separate Python package (optimum-onnx for ONNX Runtime, optimum-intel for OpenVINO and IPEX), allowing independent versioning and installation
The lazy import is implemented using standard Python from ... import statements inside the conditional branches of the dispatch function, ensuring that the import only executes when the corresponding branch is reached.
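The effect of a branch-local import can be demonstrated with stdlib modules, which here are arbitrary stand-ins for the heavy backend dependencies. The import statement runs only when its branch is reached, which is observable through sys.modules:

```python
import sys

def make_client(protocol):
    # Strategy dispatch with lazy loading: each branch imports its
    # dependency only when selected, mirroring the ort/ov/ipex branches.
    if protocol == "ftp":
        import ftplib  # stands in for a heavy backend package
        return ftplib.FTP
    if protocol == "imap":
        import imaplib
        return imaplib.IMAP4
    raise ValueError(f"Unsupported protocol: {protocol!r}")
```

Calling make_client("ftp") loads ftplib but never touches imaplib, just as selecting "ort" never imports optimum.intel.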
Parameter Forwarding
All standard transformers.pipeline() parameters are forwarded to the backend-specific constructor:
| Parameter | Forwarded To |
|---|---|
| task, model, config | Both ort_pipeline and intel_pipeline |
| tokenizer, feature_extractor, image_processor, processor | Both |
| framework, revision, use_fast, token | Both |
| device, device_map, torch_dtype | Both |
| trust_remote_code, model_kwargs, pipeline_class | Both |
| accelerator | intel_pipeline only (to distinguish "ov" from "ipex") |
| **kwargs | Both |
Related
- implemented_by → Implementation:Huggingface_Optimum_Backend_Specific_Pipeline