
Principle:Microsoft Onnxruntime Nodejs Inference Execution

From Leeroopedia


Field              Value
Principle Name     Nodejs_Inference_Execution
Overview           Asynchronous execution of ONNX model inference with a feed dictionary of named input tensors in Node.js.
Category           API Doc
Domains            ML_Inference, JavaScript_Integration
Source Repository  microsoft/onnxruntime
Last Updated       2026-02-10

Overview

Asynchronous execution of ONNX model inference with a feed dictionary of named input tensors in Node.js. The session.run() method is the core inference operation, accepting named input tensors and returning named output tensors.

Description

The session.run() method accepts a feeds object that maps input names to ort.Tensor objects. It returns a Promise that resolves to a results object mapping output names to ort.Tensor objects.
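The following is a minimal sketch of the feeds/results contract described above. It assumes only that session is an object with an ort-style run(feeds) that returns a Promise of a name-to-tensor object, and that each tensor exposes type, dims, and data as ort.Tensor does. The helper describeOutputs is hypothetical; it walks the results object by output name.

```javascript
'use strict';

/**
 * Runs one inference and reports, for every output name in the results
 * object, the tensor's element type, shape, and element count.
 * Hypothetical helper: `session` is any object with an ort-style
 * `run(feeds)` method returning a Promise of { outputName: tensor }.
 */
async function describeOutputs(session, feeds) {
  const results = await session.run(feeds); // { outputName: tensor, ... }
  const summary = {};
  for (const [name, tensor] of Object.entries(results)) {
    summary[name] = {
      type: tensor.type,             // element type, e.g. 'float32'
      dims: Array.from(tensor.dims), // shape as a plain array
      size: tensor.data.length,      // number of elements
    };
  }
  return summary;
}
```

With onnxruntime-node, session would come from ort.InferenceSession.create(...) and each value in feeds would be an ort.Tensor.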

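The feeds object uses the model's input names as keys. These are the input names declared in the ONNX model graph, and they are also exposed at run time as session.inputNames. If an input is missing, or a tensor has an element type or shape the model does not accept, the Promise returned by run() rejects with an error instead of resolving to results.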
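Because name mismatches are an easy mistake, a caller can compare the feeds object against session.inputNames before running, and wrap a rejected run() in an error with more context. This is a minimal sketch: the helpers findFeedMismatches and runChecked are hypothetical, and type and shape errors are still left to the runtime to detect.

```javascript
'use strict';

/**
 * Compares the keys of `feeds` with the model's input names and returns
 * the names the model expects but the feeds lack, plus any feed keys the
 * model does not declare.
 */
function findFeedMismatches(inputNames, feeds) {
  const provided = Object.keys(feeds);
  return {
    missing: inputNames.filter((name) => !provided.includes(name)),
    unknown: provided.filter((name) => !inputNames.includes(name)),
  };
}

/**
 * Hypothetical wrapper: fails early on a name mismatch, otherwise runs
 * the session and adds context if the runtime rejects (for example
 * because of a wrong element type or shape).
 */
async function runChecked(session, feeds) {
  const { missing, unknown } = findFeedMismatches(session.inputNames, feeds);
  if (missing.length > 0 || unknown.length > 0) {
    throw new Error(
      `feeds do not match model inputs (missing: [${missing.join(', ')}], unknown: [${unknown.join(', ')}])`
    );
  }
  try {
    return await session.run(feeds);
  } catch (err) {
    throw new Error(`session.run failed: ${err.message}`);
  }
}
```

Treating extra feed keys as an error is a choice of this sketch, intended to catch typos early; it is not necessarily how ONNX Runtime itself treats them.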

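The method is asynchronous: it returns a Promise that resolves when the inference computation completes. The computation itself is performed by the native ONNX Runtime C++ engine, which can use its own thread pool to parallelize operators. The Promise alone does not guarantee that the Node.js event loop stays free, however: depending on the onnxruntime-node version, the native call may execute on the main JavaScript thread, so a long-running inference can still delay other callbacks.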
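In practice, the asynchronous contract means the call hands back a Promise straight away, and the outputs are available only after it settles. The sketch below records that ordering. It assumes session is any object with an ort-style run(feeds); the function runWithLog and its log array are hypothetical and exist only for illustration.

```javascript
'use strict';

/**
 * Shows that run() hands back a pending Promise: code after the call
 * runs first, and the outputs are only available once it is awaited.
 */
async function runWithLog(session, feeds, log) {
  const pending = session.run(feeds); // a Promise, returned immediately
  log.push(`run() returned a ${pending instanceof Promise ? 'Promise' : typeof pending}`);
  const results = await pending; // resumes once inference has completed
  log.push(`results ready: ${Object.keys(results).join(', ')}`);
  return results;
}
```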

Key characteristics of the run operation:

  • Named inputs: The feeds object keys must match the model's input names exactly.
  • Named outputs: The results object keys correspond to the model's output names.
  • Async execution: run() returns a Promise that resolves with the outputs or rejects with an error.
  • Type and shape checking: Input tensor element types and shapes must match the model's declared inputs; mismatches are reported when run() is called.
  • Single invocation: Each call to run() executes one complete forward pass through the model graph.
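Because each run() call is one forward pass, processing several inputs means either batching them into a single tensor (when the model has a batch dimension) or calling run() once per input. The second option is sketched below. The function runEach and its inputName and outputName parameters are hypothetical, and the calls are awaited one after another rather than issued concurrently.

```javascript
'use strict';

/**
 * Runs the model once per input tensor, in order, and collects the
 * named output of each forward pass.
 */
async function runEach(session, inputName, outputName, tensors) {
  const outputs = [];
  for (const tensor of tensors) {
    // One feeds object and one forward pass per input.
    const results = await session.run({ [inputName]: tensor });
    if (!(outputName in results)) {
      throw new Error(`output '${outputName}' not found in results`);
    }
    outputs.push(results[outputName]);
  }
  return outputs;
}
```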

Theoretical Basis

Feed-based execution follows the same pattern as TensorFlow 1.x's Session.run (with its feed_dict argument) and InferenceSession.run in ONNX Runtime's Python API: named tensor inputs and outputs drive graph execution. This pattern treats the model as a computational graph in which:

  • Inputs are "fed" into the graph via named entry points.
  • Outputs are "fetched" from named exit points of the graph.
  • The runtime determines the execution order based on the graph topology, executing only the subgraph necessary to compute the requested outputs from the provided inputs.
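The feed/fetch idea can be illustrated with a toy evaluator. This is a conceptual sketch of the general pattern, not how ONNX Runtime's executor works. It assumes an acyclic graph given as a plain object in which each node lists its input names and a JavaScript function computing its value; runGraph and exampleNodes are hypothetical names.

```javascript
'use strict';

/**
 * Toy feed/fetch evaluator. Fed values enter the graph by name; each
 * fetched name is computed by evaluating only the nodes it depends on.
 * Assumes the graph is acyclic (no cycle detection).
 */
function runGraph(nodes, feeds, fetches) {
  const values = new Map(Object.entries(feeds)); // named entry points
  const executed = []; // order in which nodes actually ran

  function evaluate(name) {
    if (values.has(name)) return values.get(name);
    const node = nodes[name];
    if (!node) throw new Error(`no feed or node named '${name}'`);
    // Evaluate dependencies first, which yields a topological order.
    const args = node.inputs.map(evaluate);
    const result = node.op(...args);
    values.set(name, result);
    executed.push(name);
    return result;
  }

  const results = {};
  for (const name of fetches) results[name] = evaluate(name); // named exit points
  return { results, executed };
}

// Example graph: 'prod' does not contribute to 'scaled'.
const exampleNodes = {
  sum: { inputs: ['a', 'b'], op: (a, b) => a + b },
  prod: { inputs: ['a', 'b'], op: (a, b) => a * b },
  scaled: { inputs: ['sum'], op: (s) => s * 10 },
};

// runGraph(exampleNodes, { a: 2, b: 3 }, ['scaled'])
// → results { scaled: 50 }, executed ['sum', 'scaled'] ('prod' never runs)
```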

This named feed/fetch pattern is fundamental to ONNX model execution because ONNX models can have multiple inputs and multiple outputs, and the names provide the mapping between user data and graph nodes. It also lets a caller request only a subset of the outputs, which in the general pattern allows partial graph execution.
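In the Node.js API, the fetch side is exposed through an optional second argument to run(): passing an array of output names (a subset of session.outputNames) requests only those outputs. This is a minimal sketch. The helper fetchOutputs is hypothetical, and whether unrequested nodes are skipped internally is left to the runtime.

```javascript
'use strict';

/**
 * Runs inference but asks only for the output names in `wanted`, using
 * the optional fetches argument of session.run(feeds, fetches).
 * Returns the requested tensors in the order they were asked for.
 */
async function fetchOutputs(session, feeds, wanted) {
  const unknown = wanted.filter((name) => !session.outputNames.includes(name));
  if (unknown.length > 0) {
    throw new Error(`not model outputs: ${unknown.join(', ')}`);
  }
  const results = await session.run(feeds, wanted); // fetches = output names
  return wanted.map((name) => results[name]);
}
```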

Usage

The inference execution step requires a previously created inference session and properly constructed input tensors:

  1. Create an InferenceSession from an ONNX model.
  2. Construct ort.Tensor objects for each model input.
  3. Build a feeds object mapping input names to tensors.
  4. Call session.run(feeds) and await the result.
  5. Extract output tensors from the results object by name.
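The five steps map onto a short async function. This is a sketch under several assumptions: the ort module is passed in as a parameter (in an application it would be require('onnxruntime-node')), the model has a single float32 input whose shape the caller supplies, and the first declared output is the one of interest. The function runModel and its parameters are hypothetical.

```javascript
'use strict';

/**
 * Follows the five usage steps for a model with one float32 input.
 * `ort` is the onnxruntime-node module, passed in so the function does
 * not hard-code the import (and can be exercised with a stub).
 */
async function runModel(ort, modelPath, values, dims) {
  // 1. Create an InferenceSession from an ONNX model file.
  const session = await ort.InferenceSession.create(modelPath);

  // 2. Construct an ort.Tensor: element type, flat data, and shape.
  const input = new ort.Tensor('float32', Float32Array.from(values), dims);

  // 3. Build the feeds object, keyed by the model's (first) input name.
  const feeds = { [session.inputNames[0]]: input };

  // 4. Run inference and wait for the Promise to resolve.
  const results = await session.run(feeds);

  // 5. Extract the output tensor by name.
  return results[session.outputNames[0]];
}
```

For example, runModel(require('onnxruntime-node'), 'model.onnx', [1, 2, 3, 4], [1, 4]), with 'model.onnx' as a placeholder path, would resolve to an ort.Tensor whose dims and data can then be read. In a long-running process the session would normally be created once and reused across run() calls, rather than recreated for every inference as in this sketch.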
