Workflow:Onnx Onnx Reference Evaluation
| Knowledge Sources | |
|---|---|
| Domains | ML_Infrastructure, Model_Testing |
| Last Updated | 2026-02-10 02:30 GMT |
Overview
End-to-end process for executing an ONNX model using the pure Python reference evaluator to produce inference outputs for testing and validation.
Description
This workflow covers running ONNX models through the ReferenceEvaluator, a pure Python implementation of the ONNX operator specifications. The evaluator implements each operator directly from its specification and is intended as the reference for operator behavior. It supports standard ONNX operators, ML operators (the ai.onnx.ml domain), custom operator extensions, and optimized operator variants. It is primarily used for testing, debugging, and validating operator implementations rather than for production inference.
Usage
Execute this workflow when you need to:
- Validate that a model produces correct outputs without depending on a hardware-specific runtime
- Debug operator-level behavior by enabling verbose mode to inspect intermediate tensor values
- Test custom operator implementations against the ONNX specification
- Generate reference outputs for backend conformance testing
- Verify model behavior after transformations (version conversion, composition, optimization)
Execution Steps
Step 1: Load or Construct the Model
Obtain the ONNX model to evaluate. The reference evaluator accepts a ModelProto object, a file path string, raw bytes, or individual proto objects (GraphProto, FunctionProto, NodeProto) for targeted testing.
Key considerations:
- The model should be a valid, checker-compliant ONNX model
- For single-operator testing, a NodeProto can be passed directly without constructing a full model
- File paths are resolved and loaded automatically by the evaluator
Step 2: Initialize the Reference Evaluator
Create a ReferenceEvaluator instance, configuring verbosity level, custom operator implementations, and optimization preferences. The evaluator parses the model graph and maps each node to its corresponding Python operator implementation.
Key considerations:
- The verbose parameter (0-10) controls how much intermediate output is printed during execution; 0 prints nothing
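- Custom operators are passed through the new_ops parameter as a list of classes inheriting from OpRun; each class declares its domain in an op_domain attribute and is matched to nodes by its class name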
- The optimized parameter enables faster implementations for certain operators (e.g., Conv via im2col + Gemm decomposition)
- The evaluator resolves operator implementations by domain, op_type, and opset version
Step 3: Prepare Input Data
Construct the input feed dictionary mapping input tensor names to NumPy arrays. The arrays must match the types and shapes declared in the model's input specifications.
Key considerations:
- Input names must exactly match the model's graph input names
- Array data types must be compatible with the declared ONNX tensor element types
- Dynamic dimensions can take any valid size, but all inputs must be mutually consistent in shared dimensions
- Use the numpy_helper module to convert between ONNX TensorProto and NumPy array formats if needed
Step 4: Run Inference
Execute the model by calling the evaluator's run method with the output names (or None for all outputs) and the input feed dictionary. The evaluator executes the nodes in the order they appear in the graph, which ONNX requires to be topologically sorted, computing each node's outputs from its inputs.
Key considerations:
- Passing None as the first argument returns all model outputs
- Passing a list of specific output names returns only those outputs
- The evaluator processes subgraphs (If, Loop, Scan operators) recursively
- Results are returned as a list of NumPy arrays in the same order as the requested outputs
Step 5: Validate the Results
Compare the evaluator's outputs against expected reference values. This typically involves element-wise comparison with appropriate tolerance for floating-point arithmetic.
Key considerations:
- Use appropriate numerical tolerances (absolute and relative) when comparing floating-point outputs
- The reference evaluator's outputs serve as the ground truth for ONNX operator semantics
- Differences from hardware-specific runtimes may arise from floating-point ordering, precision, or implementation-specific behavior within spec bounds