Principle: Kornia ONNX Inference
| Knowledge Sources | |
|---|---|
| Domains | ONNX, Deployment, Inference |
| Last Updated | 2026-02-09 15:00 GMT |
Overview
Technique for executing ONNX model inference through an optimized runtime session, with numpy arrays as inputs and outputs.
Description
ONNX inference runs the computation graph defined by an ONNX model using an optimized runtime (ONNX Runtime). The runtime selects execution providers (CPU, CUDA, TensorRT) for optimal hardware utilization. Inputs are provided as numpy arrays matching the model's expected shapes and dtypes. The runtime executes the graph, applying provider-specific optimizations (operator fusion, memory planning), and returns output numpy arrays.
This separates model training (PyTorch) from inference (ONNX Runtime) for production deployment.
Usage
Use after constructing an ONNXSequential pipeline to run inference on input data. Convert PyTorch tensors to numpy arrays before passing to the pipeline.
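The tensor-to-numpy conversion mentioned above is a one-liner; a minimal sketch, assuming a CPU tensor in NCHW layout (the `pipeline` name in the comment is a hypothetical ONNXSequential instance, not shown here):

```python
import torch

# A PyTorch tensor, e.g. a batch of images in NCHW layout
t = torch.rand(1, 3, 8, 8)

# ONNX Runtime expects numpy inputs: detach from autograd, move to CPU,
# then view as a numpy array (zero-copy for tensors already on CPU)
arr = t.detach().cpu().numpy()

# `arr` can now be passed to the pipeline, e.g. outputs = pipeline(arr)
print(arr.shape, arr.dtype)
```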
Theoretical Basis
ONNX Runtime inference:
session.run(output_names, {input_name: input_data})
The runtime builds an execution plan mapping each graph node to the best available execution provider. Provider priority:
CUDAExecutionProvider > CPUExecutionProvider