Workflow:Microsoft Onnxruntime Nodejs Inference
| Knowledge Sources | |
|---|---|
| Domains | ML_Inference, JavaScript, Server_Side_ML |
| Last Updated | 2026-02-10 04:30 GMT |
Overview
End-to-end process for loading an ONNX model and running inference in a Node.js application using the onnxruntime-node package.
Description
This workflow covers model inference in server-side JavaScript/Node.js applications using ONNX Runtime's native Node.js bindings. The onnxruntime-node package wraps the native ONNX Runtime engine; which execution providers are available depends on the platform and the package build, so parity with the C++ core should not be assumed. The workflow includes creating inference sessions from model files or in-memory buffers, constructing typed tensors, executing inference, and processing output predictions. Session options control thread pool sizing and execution provider selection.
Usage
Execute this workflow when building Node.js backend services that need to run ML model inference. This is ideal for REST API endpoints serving predictions, real-time data processing pipelines, or any server-side JavaScript application requiring embedded ML inference without Python dependencies.
Execution Steps
Step 1: Install Dependencies
Add the onnxruntime-node package to the Node.js project. This installs the JavaScript API bindings and the pre-built native ONNX Runtime libraries for the host platform. The package supports Windows, macOS, and Linux.
Key considerations:
- Install via npm: onnxruntime-node for server-side Node.js, or onnxruntime-web for the browser
- Native binaries are included in the npm package
- Ensure the platform-specific binary matches the host architecture
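A quick post-install check can confirm that the native binding actually loads on the host. The following is a minimal sketch, not part of the package: `checkOnnxRuntime` is a hypothetical helper name, and the `load` parameter exists only so the check can be exercised without the package installed.

```javascript
// Hypothetical helper: try to load onnxruntime-node and report the host
// platform/architecture so a binary mismatch is easier to diagnose.
function checkOnnxRuntime(load = () => require('onnxruntime-node')) {
  const host = `${process.platform}-${process.arch}`;
  try {
    load(); // throws if the package or its native binding cannot be loaded
    return { ok: true, host };
  } catch (err) {
    return { ok: false, host, error: err.message };
  }
}

console.log(checkOnnxRuntime());
```

If `ok` is false, the `host` string (for example a platform/architecture pair reported by Node.js) is a reasonable first thing to compare against the binaries the installed package provides.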
Step 2: Create Inference Session
Create an InferenceSession by loading an ONNX model from a file path, ArrayBuffer, or Uint8Array buffer. Optionally pass session options to configure thread pool sizes and execution provider preferences. The session creation validates the model and prepares the execution graph.
Key considerations:
- InferenceSession.create is an async operation returning a Promise
- Model can be loaded from file path (string) or in-memory buffer
- Session options include intraOpNumThreads for CPU parallelism
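The session setup described above might be sketched as follows. `buildSessionOptions` and `createSession` are hypothetical helper names, the thread count heuristic is an assumption rather than a recommendation, and the optional `ort` parameter is there only so the sketch can be run against a stub.

```javascript
// Sketch: assemble session options and create an InferenceSession.
// buildSessionOptions and createSession are hypothetical helper names.
const os = require('os');

function buildSessionOptions({ threads = Math.max(1, os.cpus().length - 1) } = {}) {
  return {
    executionProviders: ['cpu'],   // CPU provider; others depend on platform/build
    intraOpNumThreads: threads,    // threads used inside a single operator
    interOpNumThreads: 1,          // threads used across independent operators
    graphOptimizationLevel: 'all', // apply all graph optimizations at load time
  };
}

// modelSource may be a file path (string), an ArrayBuffer, or a Uint8Array.
// The ort parameter defaults to the real package and is loaded lazily.
async function createSession(modelSource, options = buildSessionOptions(),
                             ort = require('onnxruntime-node')) {
  return ort.InferenceSession.create(modelSource, options);
}
```

For example, `await createSession('./model.onnx')` would load a model from disk, where `./model.onnx` is a placeholder path. Creating the session once at service startup and reusing it across requests avoids repeating model validation and graph preparation.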
Step 3: Construct Input Tensors
Create Tensor objects from JavaScript typed arrays (Float32Array, Int32Array, BigInt64Array, etc.) with specified dimensions. Each tensor requires the data buffer and a dims array describing the shape. Tensor types are inferred from the typed array or can be specified explicitly.
Key considerations:
- Data and dims must be consistent (product of dims equals data length)
- Supported types include float32, float64, int32, int64, bool, string, uint8
- String tensors use regular JavaScript string arrays
- Scalar tensors use an empty dims array
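The consistency rule between data and dims can be checked before a tensor is constructed. This is a minimal sketch under stated assumptions: `elementCount` and `makeTensor` are hypothetical helper names, and the optional `ort` parameter exists only so the sketch can be run against a stub.

```javascript
// Sketch: validate shape/data consistency before constructing a tensor.
// elementCount and makeTensor are hypothetical helper names.
function elementCount(dims) {
  // The product over an empty dims array is 1, which matches a scalar tensor.
  return dims.reduce((acc, d) => acc * d, 1);
}

function makeTensor(type, data, dims, ort = require('onnxruntime-node')) {
  const expected = elementCount(dims);
  if (data.length !== expected) {
    throw new RangeError(
      `dims [${dims}] imply ${expected} elements, but data has ${data.length}`);
  }
  return new ort.Tensor(type, data, dims);
}

// Example calls (not executed here):
//   makeTensor('float32', Float32Array.from([1, 2, 3, 4, 5, 6]), [2, 3]);
//   makeTensor('int64', BigInt64Array.from([1n, 2n]), [2]);
//   makeTensor('string', ['a', 'b'], [2]);
//   makeTensor('float32', Float32Array.of(0.5), []); // scalar
```

Failing early with a descriptive message tends to be easier to debug than a shape error raised later during inference.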
Step 4: Execute Inference
Call session.run with a feeds object mapping input names to Tensor objects. The feeds keys must match the model's input names exactly. The run method returns a Promise resolving to a results object with output tensors keyed by output name.
Key considerations:
- Feed keys must match the model's graph input names, which the session exposes as session.inputNames
- The run method is async and returns a Promise
- Optionally pass fetches (for example an array of output names) as the second argument to retrieve only specific outputs
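Because a mismatched input name is a common failure, it can help to compare the feeds against the session's declared inputs before calling run. The sketch below assumes every name in `session.inputNames` is required; `runChecked` is a hypothetical helper name.

```javascript
// Sketch: check feed names against the session before running.
// runChecked is a hypothetical helper name.
async function runChecked(session, feeds, fetches) {
  const missing = session.inputNames.filter((name) => !(name in feeds));
  if (missing.length > 0) {
    throw new Error(`missing feeds for model inputs: ${missing.join(', ')}`);
  }
  // fetches, when given, is an array of output names to compute.
  return fetches ? session.run(feeds, fetches) : session.run(feeds);
}
```

A call such as `await runChecked(session, { input: tensor }, ['output'])` would request a single output, where `input` and `output` are placeholder names that must be replaced with the names defined by the actual model.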
Step 5: Process Output Tensors
Extract prediction results from the output tensors. Access tensor data via the .data property (typed array), shape via .dims, and type via .type. Post-process results as needed for the application response.
Key considerations:
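- Output tensor data is a typed array matching the output type (for example Float32Array for float32, BigInt64Array for int64); string tensors expose a plain string array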
- Multiple outputs are accessed by their model-defined names
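- Typed arrays do not serialize to JSON arrays as-is: convert with Array.from before calling JSON.stringify, and map int64 (BigInt) values to numbers or strings, because JSON.stringify throws on BigInt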
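Preparing outputs for an API response usually needs a small conversion step, since `JSON.stringify` turns a typed array into an index-keyed object and throws on BigInt values. The following sketch shows one way to handle this; `tensorToPlain` and `resultsToJSON` are hypothetical helper names, and encoding 64-bit integers as strings is a design assumption chosen to preserve precision.

```javascript
// Sketch: convert output tensors into a JSON-safe structure.
// tensorToPlain and resultsToJSON are hypothetical helper names.
function tensorToPlain(tensor) {
  const values = Array.from(tensor.data, (v) =>
    // int64/uint64 data arrives as BigInt, which JSON.stringify rejects;
    // strings keep full precision, at the cost of a different JSON type.
    typeof v === 'bigint' ? v.toString() : v);
  return { type: tensor.type, dims: Array.from(tensor.dims), data: values };
}

function resultsToJSON(results) {
  const plain = {};
  for (const [name, tensor] of Object.entries(results)) {
    plain[name] = tensorToPlain(tensor);
  }
  return JSON.stringify(plain);
}
```

If the consumer expects numeric labels and the values are known to fit in the safe integer range, `Number(v)` could replace `v.toString()` in the mapping function.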