Workflow:Kornia Kornia ONNX Model Pipeline

Knowledge Sources	Kornia ONNX Sequential Docs
Domains	Computer_Vision, Model_Deployment, ONNX
Last Updated	2026-02-09 15:00 GMT

Overview

End-to-end process for chaining multiple ONNX models and operators into a single inference pipeline using Kornia's ONNXSequential for production deployment.

Description

This workflow covers the construction of multi-model ONNX inference pipelines using Kornia's ONNXSequential class. The system allows loading ONNX models from HuggingFace's kornia model hub, local files, or in-memory ModelProto objects, and chaining them into a single computational graph. The pipeline leverages ONNXRuntime for optimized execution across different hardware backends (CPU, CUDA, TensorRT, OpenVINO). Models can include both Kornia operators (exported as ONNX) and custom models, with flexible input/output mapping between stages. The combined pipeline can be exported as a single ONNX file for deployment.

Usage

Execute this workflow when you need to deploy a computer vision pipeline in production that chains multiple preprocessing operators and models together. This is ideal when moving from PyTorch research to production deployment, when you need hardware-agnostic inference (CPU, GPU, edge devices), or when combining Kornia operators with custom ONNX models in a single graph.

Execution Steps

Step 1: Install ONNX Dependencies

Ensure ONNX and ONNXRuntime are installed in the environment. For GPU inference, install onnxruntime-gpu instead of the standard CPU package. The onnxruntime-gpu build must target the same CUDA version as the system's CUDA installation.

Key considerations:

Install onnx and onnxruntime via pip
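For GPU: install onnxruntime-gpu, using the package index that matches the installed CUDA version where the default index does not
Kornia imports onnx and onnxruntime lazily, so they are optional dependencies that are only needed once the ONNX module is used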
Verify installation by importing onnxruntime and checking available providers
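
The verification step can be sketched as below. The helper name onnx_dependency_report is hypothetical and not part of Kornia; it uses only the standard library plus onnxruntime's get_available_providers(), and it degrades gracefully when the packages are missing.

```python
import importlib.util

# Install first, for example:
#   pip install onnx onnxruntime        (CPU)
#   pip install onnx onnxruntime-gpu    (GPU; also imported as `onnxruntime`)


def onnx_dependency_report() -> dict:
    """Report which optional ONNX packages are importable and list ORT providers."""
    report = {
        "onnx": importlib.util.find_spec("onnx") is not None,
        "onnxruntime": importlib.util.find_spec("onnxruntime") is not None,
        "providers": [],
    }
    if report["onnxruntime"]:
        try:
            # Imported lazily so a missing package never raises at module load.
            import onnxruntime as ort

            report["providers"] = list(ort.get_available_providers())
        except Exception as exc:  # e.g. a broken native library
            report["error"] = repr(exc)
    return report


print(onnx_dependency_report())
```

If CUDAExecutionProvider is absent from the reported providers after installing onnxruntime-gpu, a CUDA version mismatch is a likely cause.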

Step 2: Select and Load ONNX Models

Identify the ONNX models and operators to chain together. Models can be loaded from Kornia's HuggingFace hub using the 'hf://' prefix, from local ONNX files, or passed as in-memory onnx.ModelProto objects. Kornia provides pre-exported operators for common operations (color conversion, resizing, flipping) and full models (RT-DETR detection).

Key considerations:

HuggingFace models use format: 'hf://operators/kornia.module.Class' or 'hf://models/kornia.models.name'
Models are cached locally after first download
Local models are specified by file path: 'path/to/model.onnx'
Each model should have clearly defined input and output node names
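
To make the three source kinds concrete, here is a small illustrative classifier. It is not Kornia's loader: describe_source and HF_PREFIX are hypothetical names, and the hub string is the placeholder format quoted above rather than a real model identifier.

```python
from pathlib import Path

HF_PREFIX = "hf://"  # prefix described in this step


def describe_source(source) -> str:
    """Classify a model source as a hub reference, a local file, or an in-memory object."""
    if isinstance(source, str):
        if source.startswith(HF_PREFIX):
            # 'hf://operators/<name>' or 'hf://models/<name>'
            kind, _, name = source[len(HF_PREFIX):].partition("/")
            return f"hub {kind}: {name}"
        return f"local file: {Path(source).name}"
    # Anything else is assumed to be an in-memory onnx.ModelProto.
    return f"in-memory object: {type(source).__name__}"


for src in ("hf://operators/kornia.module.Class", "path/to/model.onnx"):
    print(describe_source(src))
```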

Step 3: Configure Input/Output Mapping

Define how outputs from one model feed into inputs of the next model in the chain. By default, ONNXSequential assumes each model has a single input named 'input' and a single output named 'output'. For models with multiple or differently named ports, provide explicit io_maps to specify the connections.

Key considerations:

Default mapping works when each model has one input ('input') and one output ('output')
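For complex models, specify io_maps as a list with one entry per pair of adjacent models, where each entry is a list of (output_name, input_name) tuples
io_maps[0] maps outputs of model 1 to inputs of model 2, io_maps[1] maps outputs of model 2 to inputs of model 3, and so on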
Mismatched tensor shapes between connected ports will cause runtime errors
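
The shape of io_maps can be illustrated with a small pre-flight check. This is a sketch under assumptions: check_io_maps and the port names (image, features, feat_in, and so on) are hypothetical, and the stage descriptions are plain dictionaries rather than real ONNX graphs. With onnx installed, the names could instead be read from model.graph.input and model.graph.output.

```python
def check_io_maps(stages, io_maps):
    """Return a list of problems found when wiring adjacent stages together.

    stages:  one dict per model, with 'inputs' and 'outputs' name lists
    io_maps: one list of (output_name, input_name) tuples per adjacent pair
    """
    problems = []
    if len(io_maps) != len(stages) - 1:
        problems.append(f"expected {len(stages) - 1} io_map entries, got {len(io_maps)}")
        return problems
    for i, io_map in enumerate(io_maps):
        outputs = stages[i]["outputs"]
        inputs = stages[i + 1]["inputs"]
        for out_name, in_name in io_map:
            if out_name not in outputs:
                problems.append(f"stage {i}: no output named {out_name!r}")
            if in_name not in inputs:
                problems.append(f"stage {i + 1}: no input named {in_name!r}")
    return problems


# Two hypothetical stages: the first emits two tensors that the second consumes.
stages = [
    {"inputs": ["image"], "outputs": ["features", "scale"]},
    {"inputs": ["feat_in", "scale_in"], "outputs": ["boxes"]},
]
io_maps = [[("features", "feat_in"), ("scale", "scale_in")]]
print(check_io_maps(stages, io_maps))
```

A name check like this catches wiring mistakes early, but it does not validate tensor shapes or dtypes; those still surface when the graphs are merged or run.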

Step 4: Create the ONNXSequential Pipeline

Instantiate the ONNXSequential with the selected models, execution provider preferences, and IO mapping. The constructor combines model graphs, creates an optimized ONNXRuntime inference session, and prepares the pipeline for execution.

Key considerations:

Pass a providers list to select the execution backend: CUDAExecutionProvider, CPUExecutionProvider, etc.
Use auto_ir_version_conversion=True if models have incompatible IR or opset versions
Session options can be customized for optimization level and memory management
The combined graph is validated during construction
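
A minimal construction sketch follows, assuming kornia, onnx, and onnxruntime are installed and the hub is reachable. The two operator identifiers follow the example in Kornia's ONNXSequential documentation and should be verified against the hub before use. pick_providers and build_pipeline are hypothetical helpers, not Kornia API.

```python
PREFERRED = ["CUDAExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available, preferred=PREFERRED):
    """Keep the preferred providers that are actually available, in priority order."""
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]


def build_pipeline():
    """Chain two Kornia operators from the hub into one ONNXSequential (downloads on first use)."""
    # Imported lazily so this module loads without the optional dependencies.
    import onnxruntime as ort
    from kornia.onnx import ONNXSequential

    return ONNXSequential(
        # Assumed identifiers, taken from Kornia's documentation example.
        "hf://operators/kornia.color.gray.RgbToGrayscale",
        "hf://operators/kornia.geometry.transform.affwarp.Resize_512x512",
        providers=pick_providers(ort.get_available_providers()),
        io_maps=None,  # default 'output' -> 'input' wiring
        auto_ir_version_conversion=True,
    )


print(pick_providers(["CPUExecutionProvider"]))
```

Listing the CPU provider last gives ONNXRuntime a fallback when the GPU provider cannot be initialized.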

Step 5: Run Inference

Execute the pipeline by calling the ONNXSequential instance with NumPy array input data. The input should match the expected shape and dtype of the first model in the chain. Results are returned as NumPy arrays.

Key considerations:

Input data must be NumPy arrays (not PyTorch tensors)
Use as_cuda() / as_cpu() to switch execution providers dynamically
First inference call may include warmup overhead
Multiple inputs can be processed concurrently using async API
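
Because the pipeline expects NumPy input, data coming from PyTorch needs converting first. The helper below is a hypothetical sketch (to_pipeline_input is not a Kornia function); the float32 dtype and the 1x3x256x512 shape are assumptions chosen for illustration and must match what the first model in the chain expects.

```python
import numpy as np


def to_pipeline_input(x, dtype=np.float32):
    """Convert array-like or tensor-like data to a contiguous NumPy array."""
    if hasattr(x, "detach"):  # duck-typed check for a PyTorch tensor
        x = x.detach().cpu().numpy()
    return np.ascontiguousarray(x, dtype=dtype)


batch = to_pipeline_input(np.random.rand(1, 3, 256, 512))
print(batch.shape, batch.dtype)

# With an ONNXSequential instance named onnx_seq (not created here):
#   outputs = onnx_seq(batch)   # results come back as NumPy arrays
#   onnx_seq.as_cuda()          # switch provider; requires onnxruntime-gpu
```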

Step 6: Export Combined Model

Export the merged pipeline as a single ONNX file for deployment. This produces a self-contained ONNX model that encapsulates the entire chain of operations, suitable for serving with any ONNX-compatible runtime.

Key considerations:

Export with onnx_seq.export('combined_model.onnx')
The exported model can be loaded by any ONNXRuntime instance
Metadata can be added before export using add_metadata()
The exported model is independent of Kornia at inference time
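
To illustrate that the exported file no longer needs Kornia, the sketch below loads it with plain ONNXRuntime. run_exported is a hypothetical helper; it assumes the combined model has a single input and that 'combined_model.onnx' was produced by the export call above.

```python
def run_exported(model_path, input_array, providers=("CPUExecutionProvider",)):
    """Run an exported ONNX file with ONNXRuntime only (no Kornia import needed)."""
    import onnxruntime as ort  # imported lazily; required only when called

    session = ort.InferenceSession(model_path, providers=list(providers))
    # Assumes one graph input; look up its name instead of hard-coding it.
    feeds = {session.get_inputs()[0].name: input_array}
    return session.run(None, feeds)  # None requests all outputs


# Example call (not executed here, since it needs the exported file):
#   outputs = run_exported("combined_model.onnx", batch)
```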
