Workflow:Kornia Kornia ONNX Model Pipeline
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Model_Deployment, ONNX |
| Last Updated | 2026-02-09 15:00 GMT |
Overview
End-to-end process for chaining multiple ONNX models and operators into a single inference pipeline using Kornia's ONNXSequential for production deployment.
Description
This workflow covers building multi-model ONNX inference pipelines with Kornia's ONNXSequential class. Models can be loaded from Kornia's model hub on HuggingFace, from local files, or from in-memory ModelProto objects, and are chained into a single computational graph. The pipeline runs on ONNXRuntime, which provides optimized execution across hardware backends (CPU, CUDA, TensorRT, OpenVINO). A pipeline can mix Kornia operators (exported as ONNX) with custom models, with flexible input/output mapping between stages. The combined pipeline can be exported as a single ONNX file for deployment.
Usage
Execute this workflow when you need to deploy a computer vision pipeline in production that chains multiple preprocessing operators and models together. This is ideal when moving from PyTorch research to production deployment, when you need hardware-agnostic inference (CPU, GPU, edge devices), or when combining Kornia operators with custom ONNX models in a single graph.
Execution Steps
Step 1: Install ONNX Dependencies
Ensure ONNX and ONNXRuntime are installed in the environment. For GPU inference, install onnxruntime-gpu instead of the standard CPU package. The CUDA version that the onnxruntime-gpu build targets must match the system's CUDA installation.
Key considerations:
- Install onnx and onnxruntime via pip
- For GPU: install the onnxruntime-gpu build that matches the installed CUDA version
- Kornia's ONNX module loads onnx and onnxruntime lazily, so both packages are optional dependencies of Kornia
- Verify installation by importing onnxruntime and checking available providers
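The verification step can be sketched as follows. This is a minimal check, assuming only that `onnxruntime` exposes `get_available_providers()`; the helper name `available_providers` is made up for this example, and the pip commands in the comments are the usual package names rather than a pinned recipe.

```python
# Typical installs (run in a shell, shown here only as a reminder):
#   pip install onnx onnxruntime        # CPU
#   pip install onnx onnxruntime-gpu    # GPU, pick the build for your CUDA version


def available_providers():
    """Return ONNXRuntime's execution providers, or [] if it is not installed.

    Hypothetical helper for this example; not part of Kornia or ONNXRuntime.
    """
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return list(ort.get_available_providers())


providers = available_providers()
if not providers:
    print("onnxruntime is not installed")
else:
    print("Available providers:", providers)
```

If `CUDAExecutionProvider` is missing from the printed list on a GPU machine, the installed package is probably the CPU build or targets a different CUDA version.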
Step 2: Select and Load ONNX Models
Identify the ONNX models and operators to chain together. Models can be loaded from Kornia's HuggingFace hub using the 'hf://' prefix, from local ONNX files, or passed as in-memory onnx.ModelProto objects. Kornia provides pre-exported operators for common operations (color conversion, resizing, flipping) and full models (RT-DETR detection).
Key considerations:
- HuggingFace models use the format 'hf://operators/kornia.module.Class' or 'hf://models/kornia.models.name'
- Models are cached locally after first download
- Local models are specified by file path: 'path/to/model.onnx'
- Each model should have clearly defined input and output node names
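The three kinds of model source described above can be told apart with a small helper. This is only an illustrative sketch: `classify_model_source` is a hypothetical function, not part of Kornia, and `path/to/model.onnx` is the placeholder path from the text. The two `hf://` identifiers are examples of the hub naming scheme and are assumed to be available on Kornia's hub.

```python
def classify_model_source(source):
    """Label a model source as 'hub', 'local', or 'in-memory'.

    Hypothetical helper: strings starting with 'hf://' are treated as hub
    identifiers, other strings as local file paths, and anything else as an
    in-memory object such as an onnx.ModelProto.
    """
    if isinstance(source, str):
        return "hub" if source.startswith("hf://") else "local"
    return "in-memory"


sources = [
    "hf://operators/kornia.geometry.transform.flips.Hflip",      # Kornia operator
    "hf://models/kornia.models.detection.rtdetr_r18vd_640x640",  # full model
    "path/to/model.onnx",                                        # local file
]

for src in sources:
    print(f"{classify_model_source(src):>9}  {src}")
```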
Step 3: Configure Input/Output Mapping
Define how outputs from one model feed into inputs of the next model in the chain. By default, ONNXSequential assumes each model has a single input named 'input' and a single output named 'output'. For models with multiple or differently named inputs and outputs, provide explicit io_maps to specify the connections.
Key considerations:
- Default mapping works when each model has one input ('input') and one output ('output')
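- For complex models, specify io_maps as a list with one entry per adjacent pair of models; each entry is a list of (output_name, input_name) tuples
- io_maps[0] maps outputs of model 1 to inputs of model 2, io_maps[1] maps outputs of model 2 to inputs of model 3, and so on
- Mismatched tensor shapes between connected outputs and inputs will cause runtime errors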
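The shape of an io_maps value can be sketched in plain Python. Both helpers below are hypothetical and exist only for illustration. They assume, following the description above, that a chain of N models needs N-1 maps. The names `boxes` and `detections` are made-up node names.

```python
from typing import List, Tuple

# One (output_name, input_name) pair connects stage i to stage i + 1.
IoMap = List[Tuple[str, str]]


def default_io_maps(num_models: int) -> List[IoMap]:
    """Build the default wiring: each 'output' feeds the next 'input'."""
    return [[("output", "input")] for _ in range(num_models - 1)]


def check_io_maps(num_models: int, io_maps: List[IoMap]) -> None:
    """Raise ValueError if io_maps cannot describe a chain of num_models."""
    if len(io_maps) != num_models - 1:
        raise ValueError(
            f"expected {num_models - 1} maps for {num_models} models, "
            f"got {len(io_maps)}"
        )
    for index, pairs in enumerate(io_maps):
        for pair in pairs:
            if len(pair) != 2 or not all(isinstance(name, str) for name in pair):
                raise ValueError(f"io_maps[{index}] has a malformed pair: {pair!r}")


# Three models: default names between 1 and 2, custom names between 2 and 3.
io_maps = [
    [("output", "input")],
    [("boxes", "detections")],  # hypothetical node names
]
check_io_maps(3, io_maps)
```

A check like this only validates the structure of the mapping. Name and shape mismatches still surface when the graphs are merged or run.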
Step 4: Create the ONNXSequential Pipeline
Instantiate the ONNXSequential with the selected models, execution provider preferences, and IO mapping. The constructor combines model graphs, creates an optimized ONNXRuntime inference session, and prepares the pipeline for execution.
Key considerations:
- Pass providers list to select execution backend: CUDAExecutionProvider, CPUExecutionProvider, etc.
- Use auto_ir_version_conversion=True if models have incompatible IR/OPSET versions
- Session options can be customized for optimization level and memory management
- The combined graph is validated during construction
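Constructing the pipeline might look like the sketch below. It assumes `kornia`, `onnx`, and `onnxruntime` are installed, that the two hub identifiers are available, and that both models use the default 'input'/'output' naming so no io_maps are needed. `choose_providers` and `build_pipeline` are hypothetical helper names. Kornia is imported inside the function so the snippet loads even where it is not installed.

```python
def choose_providers(preferred, available):
    """Keep preferred providers that are available, in order; fall back to CPU."""
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]


def build_pipeline(providers):
    """Chain a Kornia flip operator and a detection model into one session.

    Assumption: downloads the models from the hub on first use, so this
    needs network access and the optional ONNX dependencies.
    """
    from kornia.onnx import ONNXSequential

    return ONNXSequential(
        "hf://operators/kornia.geometry.transform.flips.Hflip",
        "hf://models/kornia.models.detection.rtdetr_r18vd_640x640",
        providers=providers,
        auto_ir_version_conversion=True,  # reconcile differing IR/opset versions
    )


# Prefer CUDA when present, otherwise run on CPU.
providers = choose_providers(
    ["CUDAExecutionProvider", "CPUExecutionProvider"],
    available=["CPUExecutionProvider"],  # e.g. onnxruntime.get_available_providers()
)
print(providers)
```

Calling `build_pipeline(providers)` then returns the ready-to-run pipeline. Listing `CPUExecutionProvider` last gives ONNXRuntime a fallback when a GPU provider cannot be initialized.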
Step 5: Run Inference
Execute the pipeline by calling the ONNXSequential instance with NumPy array input data. The input should match the expected shape and dtype of the first model in the chain. Results are returned as a list of NumPy arrays, one per output of the combined graph.
Key considerations:
- Input data must be NumPy arrays (not PyTorch tensors)
- Use as_cuda() / as_cpu() to switch execution providers dynamically
- First inference call may include warmup overhead
- Multiple inputs can be processed concurrently using the async API
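Preparing input for the call can be sketched as below. The helpers are hypothetical, and the float32 dtype and leading batch dimension are assumptions about what the first model expects; check the actual model's input signature. A PyTorch tensor is detected by duck typing and converted, since the pipeline takes NumPy arrays.

```python
def to_numpy_batch(image):
    """Convert an image to a contiguous float32 NumPy array with a batch axis.

    Assumptions: the first model takes float32 input with a leading batch
    dimension. NumPy is imported here so the snippet loads without it.
    """
    import numpy as np

    if hasattr(image, "detach"):  # looks like a torch.Tensor
        image = image.detach().cpu().numpy()
    batch = np.ascontiguousarray(image, dtype=np.float32)
    if batch.ndim == 3:  # (C, H, W) -> (1, C, H, W)
        batch = batch[None, ...]
    return batch


def run_pipeline(pipeline, image):
    """Call the pipeline (any callable taking NumPy arrays) on one input."""
    return pipeline(to_numpy_batch(image))
```

With a real pipeline, `outputs = run_pipeline(onnx_seq, image)` gives the model outputs. Timing the second call rather than the first avoids counting warmup overhead.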
Step 6: Export Combined Model
Export the merged pipeline as a single ONNX file for deployment. This produces a self-contained ONNX model that encapsulates the entire chain of operations, suitable for serving with any ONNX-compatible runtime.
Key considerations:
- Export with onnx_seq.export('combined_model.onnx')
- The exported model can be loaded by any ONNXRuntime instance
- Metadata can be added before export using add_metadata()
- The exported model is independent of Kornia at inference time
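Export and reload can be sketched as two small steps. `export_pipeline` and `load_exported` are hypothetical helpers; the first relies only on the `export(path)` method mentioned above, and the second uses plain ONNXRuntime to show that Kornia is not needed to load the exported file.

```python
from pathlib import Path


def export_pipeline(onnx_seq, path="combined_model.onnx"):
    """Write the merged graph to disk via the pipeline's export() method."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    onnx_seq.export(str(target))
    return target


def load_exported(path, providers=("CPUExecutionProvider",)):
    """Open the exported file with ONNXRuntime only (no Kornia import)."""
    import onnxruntime as ort

    return ort.InferenceSession(str(path), providers=list(providers))
```

A typical sequence would be `path = export_pipeline(onnx_seq)` followed by `session = load_exported(path)`, after which `session.run(None, {input_name: batch})` performs inference on the deployed side.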