Principle: Microsoft ONNX Runtime Result Processing
Metadata
| Field | Value |
|---|---|
| Principle Name | Result_Processing |
| Repository | Microsoft_Onnxruntime |
| Source Repository | https://github.com/microsoft/onnxruntime |
| Domain | ML_Inference, Model_Optimization |
| Last Updated | 2026-02-10 |
| Workflow | Python_Inference_Pipeline |
| Pair | 6 of 6 |
Overview
Extraction and interpretation of inference output tensors for downstream use.
Description
After inference execution, results are returned as a list of numpy arrays. Each array corresponds to one of the requested outputs and can be indexed, sliced, or transformed using standard numpy operations for downstream tasks like classification, regression, or visualization. This is a Pattern Doc describing common patterns for handling ONNX Runtime output.
The return value of `session.run()` is a Python list where:
- The list index corresponds to the position of the output name in the `output_names` argument.
- Each element is a `numpy.ndarray` with shape and dtype matching the model's declared output.
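The positional mapping between requested output names and returned arrays can be sketched with plain numpy; the output names and values below are hypothetical stand-ins for what a real `session.run()` call would return:

```python
import numpy as np

# Hypothetical call: results = sess.run(["probs", "labels"], feeds)
# The returned list is aligned with the requested output names.
output_names = ["probs", "labels"]
results = [np.array([[0.1, 0.9], [0.8, 0.2]]), np.array([1, 0])]

# A name-to-array mapping built by position is often convenient downstream.
by_name = dict(zip(output_names, results))
probs = by_name["probs"]  # identical object to results[0]
```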
Common post-processing patterns include:
- Classification -- Using `numpy.argmax()` to extract predicted class labels from probability distributions.
- Regression -- Direct use of the output array as predicted values.
- Probability extraction -- Accessing probability vectors for confidence scoring and ROC analysis.
- Batch processing -- Slicing output arrays along the batch dimension to process individual samples.
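Several of these patterns can be shown together on a simulated output tensor; the probabilities below are made up for illustration and stand in for `results[0]` from a real inference call:

```python
import numpy as np

# Simulated classifier output: batch of 3 samples over 4 classes.
probs = np.array([
    [0.10, 0.70, 0.10, 0.10],
    [0.05, 0.05, 0.80, 0.10],
    [0.25, 0.25, 0.25, 0.25],
])

# Classification: argmax over the class axis (ties resolve to the first index).
labels = np.argmax(probs, axis=-1)

# Probability extraction: confidence of the predicted class, per sample.
confidence = probs[np.arange(len(probs)), labels]

# Batch processing: slice along the batch dimension to isolate one sample.
first_sample = probs[0]
```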
The pattern is demonstrated at docs/python/examples/plot_load_and_predict.py:L55.
Theoretical Basis
ONNX Runtime returns outputs as numpy arrays to maintain compatibility with the broader Python scientific computing ecosystem. This design choice enables seamless integration with downstream libraries such as scikit-learn (for metrics), matplotlib (for visualization), and pandas (for tabular analysis).
The output list ordering is deterministic and matches the order of requested output names. When None is passed for output names, outputs are returned in the order they appear in the model's graph definition.
Post-processing operations are performed entirely in Python/numpy space and do not involve the ONNX Runtime execution engine. This clean separation between inference and post-processing allows for flexible output handling without runtime overhead.
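As an example of post-processing that happens entirely in numpy space, a stable softmax can turn raw logits into probabilities without touching the ONNX Runtime engine; the logits here are a hypothetical stand-in for a model output:

```python
import numpy as np

# Hypothetical raw logits, as a model might return via session.run().
logits = np.array([[2.0, 1.0, 0.1]])

# Numerically stable softmax, computed purely in numpy.
shifted = logits - logits.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
```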
Usage
Results are accessed by list indexing and processed with numpy operations:

```python
import numpy

results = sess.run([output_name], {input_name: x})
predictions = results[0]  # First output tensor

# For classification:
predicted_class = numpy.argmax(predictions, axis=-1)
```
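Downstream, the predicted class indices are typically mapped to human-readable labels. A minimal numpy-only sketch, where both the class names and the prediction values are hypothetical stand-ins for real model output:

```python
import numpy as np

# Hypothetical class names and batched predictions.
class_names = ["cat", "dog"]
predictions = np.array([[0.2, 0.8], [0.9, 0.1]])

# Map each sample's argmax index to its label.
indices = np.argmax(predictions, axis=-1)
labels = [class_names[i] for i in indices]
```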