Workflow:Microsoft Onnxruntime Nodejs Inference

Knowledge Sources	ONNX Runtime, ONNX Runtime Node.js API, onnxruntime-node (npm)
Domains	ML_Inference, JavaScript, Server_Side_ML
Last Updated	2026-02-10 04:30 GMT

Overview

End-to-end process for loading an ONNX model and running inference in a Node.js application using the onnxruntime-node package.

Description

This workflow covers model inference in server-side JavaScript (Node.js) applications using ONNX Runtime's native Node.js bindings. The onnxruntime-node package wraps the same native C++ runtime, but it exposes only a subset of the core's execution providers: CPU on every supported platform, plus platform-specific providers (for example DirectML on Windows or CUDA on Linux x64) depending on the package version. The workflow includes creating inference sessions from model files or buffers, constructing typed tensors, executing inference, and processing output predictions. Session options control thread pool sizing and execution provider selection.
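A minimal end-to-end sketch of this flow follows. It assumes a single-input, single-output model that takes float32 data. `inferOnce` is a hypothetical helper name, and the `ort` module is passed in as a parameter (an assumption made so the flow can be exercised without the native package; in an application it is `require('onnxruntime-node')`).

```javascript
// End-to-end sketch: create a session, feed one float32 tensor, run inference,
// and return the first output as plain values.
// `ort` is the onnxruntime-node module, passed in as a parameter (assumption).
async function inferOnce(ort, modelSource, inputData, dims) {
  // 1. Load the model from a file path string or an in-memory buffer.
  const session = await ort.InferenceSession.create(modelSource, {
    executionProviders: ['cpu'],
    intraOpNumThreads: 1,
  });

  // 2. Assumes a single-input, single-output model; names come from the model itself.
  const [inputName] = session.inputNames;
  const [outputName] = session.outputNames;

  // 3. Wrap the typed array in a tensor with an explicit element type and shape.
  const feeds = { [inputName]: new ort.Tensor('float32', inputData, dims) };

  // 4. Run inference; the result object is keyed by output name.
  const results = await session.run(feeds);
  const output = results[outputName];

  // 5. Copy dims and data into plain arrays so callers do not depend on typed arrays.
  return { type: output.type, dims: Array.from(output.dims), values: Array.from(output.data) };
}
```

With the real package this would be called as `inferOnce(require('onnxruntime-node'), 'model.onnx', Float32Array.from([1, 2, 3, 4]), [1, 4])`, where `model.onnx` and the `[1, 4]` shape are placeholders for an actual model and its input shape.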

Usage

Use this workflow when building Node.js backend services that run ML model inference, such as REST API endpoints serving predictions, real-time data processing pipelines, or any server-side JavaScript application that needs embedded ML inference without Python dependencies.

Execution Steps

Step 1: Install Dependencies

Add the onnxruntime-node package to the Node.js project. This installs the JavaScript API bindings and the pre-built native ONNX Runtime libraries for the host platform. The package supports Windows, macOS, and Linux.

Key considerations:

Install with npm install onnxruntime-node for server-side use; onnxruntime-web is the separate package for browsers
Native binaries are included in the npm package
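Ensure a prebuilt binary exists for the host OS and CPU architecture (for example x64 vs. arm64); otherwise the native binding fails to load when the package is required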
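After installing, a quick sanity check can confirm that the package resolves from the project and that the host OS is one of the three platforms listed above. This is a sketch: `checkOnnxRuntimeInstall` is a hypothetical helper, and it only locates the package; actually requiring `onnxruntime-node` is what loads the native binary.

```javascript
// Post-install sanity check (sketch). Install first with: npm install onnxruntime-node
// Platforms named in this workflow: Windows, macOS, Linux (Node's process.platform values).
const SUPPORTED_PLATFORMS = ['win32', 'darwin', 'linux'];

function checkOnnxRuntimeInstall() {
  let installed = true;
  try {
    // require.resolve only locates the package entry point; it does not load the native binary.
    require.resolve('onnxruntime-node');
  } catch {
    installed = false;
  }
  return {
    installed,
    platform: process.platform, // e.g. 'linux'
    arch: process.arch, // e.g. 'x64' or 'arm64'
    platformSupported: SUPPORTED_PLATFORMS.includes(process.platform),
  };
}

console.log(checkOnnxRuntimeInstall());
```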

Step 2: Create Inference Session

Create an InferenceSession by loading an ONNX model from a file path, ArrayBuffer, or Uint8Array buffer. Optionally pass session options to configure thread pool sizes and execution provider preferences. The session creation validates the model and prepares the execution graph.

Key considerations:

InferenceSession.create is an async operation returning a Promise
Model can be loaded from file path (string) or in-memory buffer
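Session options include intraOpNumThreads (threads used to parallelize work inside a single operator), executionProviders (an ordered preference list such as ['cpu']), and graphOptimizationLevel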
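The sketch below builds session options and creates a session either from a path or from an in-memory buffer. The policy of capping intra-op threads at the logical CPU count and choosing `graphOptimizationLevel: 'all'` are illustrative assumptions, `buildSessionOptions` and `loadSession` are hypothetical helper names, and `ort` is passed in as a parameter so the helper can be exercised without the native package.

```javascript
const fs = require('node:fs');
const os = require('node:os');

// Session options sketch: CPU execution provider, intra-op threads capped at the
// number of logical CPUs, and full graph optimization.
function buildSessionOptions(requestedThreads = 1) {
  const cpuCount = os.cpus().length || 1; // some sandboxes report no CPUs
  return {
    executionProviders: ['cpu'],
    intraOpNumThreads: Math.max(1, Math.min(requestedThreads, cpuCount)),
    graphOptimizationLevel: 'all',
  };
}

// Create a session from a path (the runtime reads the file) or from an in-memory
// buffer (a Node Buffer is a Uint8Array). `ort` is the onnxruntime-node module.
async function loadSession(ort, modelPath, { fromBuffer = false, threads = 1 } = {}) {
  const source = fromBuffer ? fs.readFileSync(modelPath) : modelPath;
  return ort.InferenceSession.create(source, buildSessionOptions(threads));
}
```

Once the session exists, `session.inputNames` and `session.outputNames` list the names needed to build feeds and read results in the later steps.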

Step 3: Construct Input Tensors

Create Tensor objects from JavaScript typed arrays (Float32Array, Int32Array, BigInt64Array, etc.) with specified dimensions. Each tensor requires the data buffer and a dims array describing the shape. Tensor types are inferred from the typed array or can be specified explicitly.

Key considerations:

Data and dims must be consistent (product of dims equals data length)
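Supported types include float32, float64, int32, int64, bool, string, and uint8; int64 data must be a BigInt64Array (values such as 1n), and bool data is passed as a Uint8Array with an explicit 'bool' type, since a bare Uint8Array is inferred as uint8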
String tensors use regular JavaScript string arrays
Scalar tensors use an empty dims array
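The consistency rule above (product of dims equals data length, with an empty dims array meaning one element) can be enforced before a tensor is built. This is a sketch with hypothetical helper names (`shapeSize`, `flatten2d`, `makeTensor`); `ort` is passed in as a parameter, and `flatten2d` assumes a rectangular batch-by-features input.

```javascript
// Number of elements implied by a shape; an empty dims array (a scalar) yields 1.
function shapeSize(dims) {
  return dims.reduce((acc, d) => acc * d, 1);
}

// Flatten a rectangular 2-D array (batch x features) into row-major Float32Array data plus dims.
function flatten2d(rows) {
  const batch = rows.length;
  const features = batch > 0 ? rows[0].length : 0;
  const data = new Float32Array(batch * features);
  rows.forEach((row, i) => {
    if (row.length !== features) {
      throw new Error(`row ${i} has ${row.length} values, expected ${features}`);
    }
    data.set(row, i * features); // copy this row into its row-major slot
  });
  return { data, dims: [batch, features] };
}

// Build a tensor only after checking that the data length matches the shape.
// `ort` is the onnxruntime-node module, passed in as a parameter.
function makeTensor(ort, type, data, dims) {
  const expected = shapeSize(dims);
  if (data.length !== expected) {
    throw new Error(`data has ${data.length} elements but dims ${JSON.stringify(dims)} imply ${expected}`);
  }
  return new ort.Tensor(type, data, dims);
}
```

For example, token IDs for a hypothetical text model would be built as `makeTensor(ort, 'int64', BigInt64Array.from([101n, 2023n]), [1, 2])`, and a scalar as `makeTensor(ort, 'float32', new Float32Array([0.5]), [])`.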

Step 4: Execute Inference

Call session.run with a feeds object mapping input names to Tensor objects. The feeds keys must match the model's input names exactly. The run method returns a Promise resolving to a results object with output tensors keyed by output name.

Key considerations:

Input names must match the ONNX model's input node names
The run method is async and returns a Promise
Optionally pass an array of output names as the second argument (fetches) to retrieve only specific outputs
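A small wrapper can check the feed names against the model before calling run and optionally restrict the fetched outputs. This sketch assumes every name in `session.inputNames` is required (models with optional inputs would need a looser check); `checkFeeds` and `runInference` are hypothetical helper names.

```javascript
// Verify that the feeds object covers exactly the model's declared inputs.
// Assumes every name in session.inputNames is required.
function checkFeeds(session, feeds) {
  const provided = Object.keys(feeds);
  const missing = session.inputNames.filter((name) => !provided.includes(name));
  const unknown = provided.filter((name) => !session.inputNames.includes(name));
  if (missing.length > 0 || unknown.length > 0) {
    throw new Error(`feed mismatch: missing [${missing.join(', ')}] unknown [${unknown.join(', ')}]`);
  }
}

// Run inference; when outputNames is given, only those outputs are fetched.
async function runInference(session, feeds, outputNames) {
  checkFeeds(session, feeds);
  const results = outputNames ? await session.run(feeds, outputNames) : await session.run(feeds);
  return results; // object keyed by output name
}
```

Failing early with the list of missing and unexpected names is usually easier to debug in a service log than an error raised from inside the native runtime.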

Step 5: Process Output Tensors

Extract prediction results from the output tensors. Access tensor data via the .data property (typed array), shape via .dims, and type via .type. Post-process results as needed for the application response.

Key considerations:

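Output tensor data is a typed array matching the output type (BigInt64Array for int64), except string outputs, which are plain string arrays
Multiple outputs are accessed by their model-defined names
Results are not directly JSON-serializable: JSON.stringify turns a typed array into an index-keyed object and throws on BigInt values, so copy the data into a plain array (for example with Array.from) before building an API response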
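The sketch below converts an output tensor into JSON-safe values and computes a row-wise argmax, a common post-processing step for classifier logits. `tensorToJSON` and `argmaxRows` are hypothetical helpers; `argmaxRows` assumes a 2-D [batch, classes] output, and converting int64 values with Number loses precision above 2^53.

```javascript
// Convert an output tensor ({ type, dims, data }) into JSON-safe plain values.
// JSON.stringify turns typed arrays into {"0":..,"1":..} objects and throws on BigInt,
// so the data is copied into a regular array first.
function tensorToJSON(tensor) {
  const values = Array.from(tensor.data, (v) => {
    if (typeof v === 'bigint') return Number(v); // int64/uint64: precision lost beyond 2^53
    if (tensor.type === 'bool') return Boolean(v); // bool data arrives as 0/1 in a Uint8Array
    return v;
  });
  return { type: tensor.type, dims: Array.from(tensor.dims), values };
}

// Row-wise argmax for a [batch, classes] output such as classifier logits.
function argmaxRows(tensor) {
  const [batch, classes] = tensor.dims;
  const result = [];
  for (let b = 0; b < batch; b++) {
    let best = 0;
    for (let c = 1; c < classes; c++) {
      if (tensor.data[b * classes + c] > tensor.data[b * classes + best]) best = c;
    }
    result.push(best);
  }
  return result;
}
```

In an HTTP handler the response body could then be `JSON.stringify(tensorToJSON(results.output))`, where `output` stands for whatever output name the model defines.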
