Workflow:Onnx Onnx Reference Evaluation

Knowledge Sources	ONNX Reference Evaluator Docs, ONNX Operators Reference
Domains	ML_Infrastructure, Model_Testing
Last Updated	2026-02-10 02:30 GMT

Overview

End-to-end process for executing an ONNX model using the pure Python reference evaluator to produce inference outputs for testing and validation.

Description

This workflow covers the procedure for running ONNX models through the ReferenceEvaluator, a pure Python implementation of the ONNX operator specifications. The reference evaluator faithfully implements each ONNX operator according to its mathematical specification, making it the ground-truth reference for operator behavior. It supports standard ONNX operators, ML operators, custom operator extensions, and optimized operator variants. The evaluator is primarily used for testing, debugging, and validating operator implementations rather than production inference.

Usage

Execute this workflow when you need to:

Validate that a model produces correct outputs without depending on a hardware-specific runtime
Debug operator-level behavior by enabling verbose mode to inspect intermediate tensor values
Test custom operator implementations against the ONNX specification
Generate reference outputs for backend conformance testing
Verify model behavior after transformations (version conversion, composition, optimization)

Execution Steps

Step 1: Load or Construct the Model

Obtain the ONNX model to evaluate. The reference evaluator accepts a ModelProto object, a file path string, raw bytes, or individual proto objects (GraphProto, FunctionProto, NodeProto) for targeted testing.

Key considerations:

The model should be a valid, checker-compliant ONNX model
For single-operator testing, a NodeProto can be passed directly without constructing a full model
File paths are resolved and loaded automatically by the evaluator

Step 2: Initialize the Reference Evaluator

Create a ReferenceEvaluator instance, configuring verbosity level, custom operator implementations, and optimization preferences. The evaluator parses the model graph and maps each node to its corresponding Python operator implementation.

Key considerations:

The verbose parameter (0-10) controls the level of intermediate output displayed during execution
Custom operators can be provided via the new_ops parameter as classes inheriting from OpRun
The optimized parameter (enabled by default) selects faster implementations for certain operators in place of the naive ones (e.g., Conv via im2col + Gemm decomposition)
The evaluator resolves operator implementations by domain, op_type, and opset version

Step 3: Prepare Input Data

Construct the input feed dictionary mapping input tensor names to NumPy arrays. The arrays must match the types and shapes declared in the model's input specifications.

Key considerations:

Input names must exactly match the model's graph input names
Array data types must be compatible with the declared ONNX tensor element types
Dynamic dimensions can take any valid size, but all inputs must be mutually consistent in shared dimensions
Use the numpy_helper module to convert between ONNX TensorProto and NumPy array formats if needed

Step 4: Run Inference

Execute the model by calling the evaluator's run method with the output names (or None for all outputs) and the input feed dictionary. The evaluator traverses the graph in topological order, computing each node's outputs from its inputs.

Key considerations:

Passing None as the first argument returns all model outputs
Passing a list of specific output names returns only those outputs
The evaluator processes subgraphs (If, Loop, Scan operators) recursively
Results are returned as a list of NumPy arrays in the same order as the requested outputs

Step 5: Validate the Results

Compare the evaluator's outputs against expected reference values. This typically involves element-wise comparison with appropriate tolerance for floating-point arithmetic.

Key considerations:

Use appropriate numerical tolerances (absolute and relative) when comparing floating-point outputs
The reference evaluator's outputs serve as the ground truth for ONNX operator semantics
Differences from hardware-specific runtimes may arise from floating-point ordering, precision, or implementation-specific behavior within spec bounds
