Principle:Microsoft Onnxruntime Nodejs Inference Execution

Field	Value
Principle Name	Nodejs_Inference_Execution
Overview	Asynchronous execution of ONNX model inference with a feed dictionary of named input tensors in Node.js.
Category	API Doc
Domains	ML_Inference, JavaScript_Integration
Source Repository	microsoft/onnxruntime
Last Updated	2026-02-10

Overview

Asynchronous execution of ONNX model inference with a feed dictionary of named input tensors in Node.js. The session.run() method is the core inference operation, accepting named input tensors and returning named output tensors.

Description

The session.run() method accepts a feeds object that maps input names to ort.Tensor objects. It returns a Promise that resolves to a results object mapping output names to ort.Tensor objects.

The feeds object uses the model's input names as keys. These names are the graph inputs declared in the ONNX model and are exposed at run time as session.inputNames. If a required input is missing, or a tensor has the wrong type or shape, the call fails and the returned Promise rejects with an error.

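The method is asynchronous: it returns a Promise that settles when the inference computation completes. The computation itself is performed by the native ONNX Runtime C++ engine, which uses its own thread pool for parallel operator execution. How much of that work happens off the Node.js main thread depends on the binding version, so latency-sensitive applications should measure event-loop responsiveness rather than assume it.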
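
The following is a minimal sketch of awaiting run() and handling a rejected Promise. The helper name runOrReport is hypothetical, and session is assumed to be an InferenceSession created elsewhere with onnxruntime-node.

```javascript
// Hypothetical helper (not part of onnxruntime-node): awaits session.run()
// and converts a rejection, such as a missing input or a shape mismatch,
// into a plain result object instead of letting the error propagate.
async function runOrReport(session, feeds) {
  try {
    const results = await session.run(feeds);
    return { ok: true, results };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, message };
  }
}
```

Returning a result object is only one possible design; letting the rejection propagate to the caller is equally valid when a failure should abort the request.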

Key characteristics of the run operation:

Named inputs: The feeds object keys must match the model's input names exactly.
Named outputs: The results object keys correspond to the model's output names.
Async execution: run() returns a Promise that settles once the native inference has finished.
Type safety: Input tensor types and shapes must match the model's expected inputs.
Single invocation: Each call to run() executes one complete forward pass through the model graph.
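
The exact-name requirement can be checked before calling run(). The sketch below compares the keys of a feeds object with session.inputNames. The helper name checkFeedNames is hypothetical and is not part of the library.

```javascript
// Hypothetical helper (not part of onnxruntime-node): reports which declared
// inputs are absent from feeds and which feed keys the model does not declare.
function checkFeedNames(session, feeds) {
  const expected = session.inputNames; // input names declared by the model
  const provided = Object.keys(feeds);
  return {
    missing: expected.filter((name) => !provided.includes(name)),
    unexpected: provided.filter((name) => !expected.includes(name)),
  };
}
```

A check like this only covers names. Type and shape mismatches are still reported by the runtime when run() is called.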

Theoretical Basis

Feed-based execution follows the same pattern as Session.run in TensorFlow 1.x and InferenceSession.run in ONNX Runtime's Python API, providing named tensor inputs and outputs for graph execution. This pattern treats the model as a computational graph where:

Inputs are "fed" into the graph via named entry points.
Outputs are "fetched" from named exit points of the graph.
The runtime determines the execution order from the graph topology. In the general form of this pattern, a runtime can limit execution to the subgraph needed to compute the requested outputs from the provided inputs, although a given runtime may not do so.

This named feed/fetch pattern is fundamental to ONNX model execution because ONNX models can have multiple inputs and multiple outputs, and the names provide the mapping between user data and graph inputs and outputs. It also lets the caller request only a subset of outputs: run() accepts an optional fetches argument that names the outputs to return.
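
As an illustration of fetching by name, the sketch below passes an array of output names as the second argument to run(). The helper name runForOutputs is hypothetical, and the up-front name check is an added convenience rather than something the API requires.

```javascript
// Hypothetical helper (not part of onnxruntime-node): requests only the
// listed outputs by passing their names as the fetches argument of run().
async function runForOutputs(session, feeds, wanted) {
  // Fail early with a readable message if a name is not a model output.
  const unknown = wanted.filter((name) => !session.outputNames.includes(name));
  if (unknown.length > 0) {
    throw new RangeError(`unknown output name(s): ${unknown.join(', ')}`);
  }
  return session.run(feeds, wanted);
}
```

Whether requesting fewer outputs reduces the work done inside the runtime is not something this sketch assumes. It only limits which tensors are returned to JavaScript.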

Usage

The inference execution step requires a previously created inference session and properly constructed input tensors:

Create an InferenceSession from an ONNX model.
Construct ort.Tensor objects for each model input.
Build a feeds object mapping input names to tensors.
Call session.run(feeds) and await the result.
Extract output tensors from the results object by name.
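
The five steps above can be sketched as a single function. This is an illustration under stated assumptions, not a definitive implementation: the function name runOnce is hypothetical, the model is assumed to take one float32 input, and only the first declared output is read. The ort parameter is the module object, for example the value returned by require('onnxruntime-node').

```javascript
// Sketch of the usage steps. `ort` is the ONNX Runtime module object.
// Assumptions: a single float32 input and interest in the first output only.
async function runOnce(ort, modelPath, values, dims) {
  // 1. Create the session from an ONNX model file.
  const session = await ort.InferenceSession.create(modelPath);

  // 2. Construct a tensor for the (assumed single) model input.
  const input = new ort.Tensor('float32', Float32Array.from(values), dims);

  // 3. Build the feeds object keyed by the model's input name.
  const feeds = { [session.inputNames[0]]: input };

  // 4. Run inference and await the named results.
  const results = await session.run(feeds);

  // 5. Extract an output tensor from the results object by name.
  const output = results[session.outputNames[0]];
  return { dims: output.dims, data: Array.from(output.data) };
}
```

With onnxruntime-node installed, this could be called as runOnce(require('onnxruntime-node'), './model.onnx', [1, 2, 3, 4], [1, 4]), where the path and shape are placeholders for a real model.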
