Principle:Tensorflow Tfjs Graph Model Inference
| Knowledge Sources | |
|---|---|
| Domains | Inference, Deep_Learning |
| Implementation | Implementation:Tensorflow_Tfjs_GraphModel_Predict |
| Type | API Doc |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Running inference on pre-converted graph models, including synchronous prediction and asynchronous execution for models with dynamic control flow. Executing a frozen computation graph on input data is the final step in the model deployment pipeline: preprocessed input tensors flow through the graph's operations to yield output predictions.
Theory
Graph Model vs. Layers Model Inference
Graph model inference differs fundamentally from layers model inference:
- Graph models operate on a frozen, optimized TensorFlow computation graph — the exact graph that was exported from Python TensorFlow. All operations, control flow, and data paths are preserved as they were in the original TensorFlow runtime. The graph is immutable; weights cannot be updated.
- Layers models operate on a Keras-style layer graph — a higher-level abstraction where the model is composed of named layers. Layers models support both inference and training (fine-tuning).
This distinction is important because graph models can represent computations that go beyond the Keras layer abstraction, including custom TensorFlow operations, complex control flow, and multi-input/multi-output architectures that were defined imperatively in Python.
Two Execution Modes
TensorFlow.js provides two execution modes for graph models, each serving a different class of computation graphs:
Synchronous Prediction: predict()
The predict() method is the standard inference entry point for graph models that have a static computation graph:
- Executes the model's default serving signature
- Input is matched to the signature's input tensors
- Output corresponds to the signature's output tensors
- All operations in the graph are executed synchronously (within a single JavaScript tick for CPU, or dispatched as a batch to the GPU)
- Suitable for the vast majority of standard deep learning models (CNNs, RNNs, transformers, etc.)
Source reference: tfjs-converter/src/executor/graph_model.ts:L357-361
Asynchronous Execution: executeAsync()
The executeAsync() method is required for models that contain dynamic control flow operations:
- Dynamic control flow includes TensorFlow tf.while_loop, tf.cond, and related operations that make branching decisions based on tensor values at runtime
- These operations cannot be resolved statically because the number of iterations or the branch taken depends on the actual input data
- executeAsync() returns a Promise that resolves when all dynamic operations complete
- It also allows specifying output node names to extract intermediate results from the graph
Source reference: tfjs-converter/src/executor/graph_model.ts:L534-545
When to Use Each Mode
| Execution Mode | Method | Use When | Returns |
|---|---|---|---|
| Synchronous | predict() | Model has no dynamic control flow ops; standard CNN, RNN, transformer, etc. | Tensor, Tensor[], or NamedTensorMap |
| Asynchronous | executeAsync() | Model contains tf.while_loop, tf.cond, or other dynamic ops | Promise<Tensor or Tensor[]> |
| Synchronous (named outputs) | execute() | Need to extract specific named output nodes from the graph | Tensor or Tensor[] |
If you are unsure whether a model requires async execution, you can:
- Try predict() first — if the model contains unsupported dynamic ops, it will throw an error with a descriptive message
- Use executeAsync() as a fallback, which handles both static and dynamic graphs
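The fallback strategy above can be sketched as a small helper. The helper name and structure are illustrative, not part of the tfjs API:

```javascript
// Illustrative helper (not a tfjs API): try the fast synchronous path
// first, and fall back to executeAsync() when the graph contains
// dynamic control-flow ops that predict() cannot execute.
async function runInference(model, input) {
  try {
    return model.predict(input);
  } catch (err) {
    // predict() throws for graphs with tf.while_loop / tf.cond ops;
    // executeAsync() handles both static and dynamic graphs.
    return await model.executeAsync(input);
  }
}
```

Note that the fallback pays the cost of one failed predict() call, so if a model is known to contain dynamic ops, calling executeAsync() directly is preferable.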
Input Preparation
Graph model inference requires careful input tensor preparation:
- Shape matching: Input tensors must match the exact shape expected by the model's input signature. For image models, this typically means [batch, height, width, channels] (NHWC format)
- Dtype matching: Input data types must match (commonly float32)
- Normalization: Input values must be preprocessed to match the training pipeline (e.g., pixel values scaled to [0, 1] or [-1, 1])
- Batching: The first dimension is typically the batch dimension; single inputs should be wrapped in a batch of size 1
Named Tensor Inputs and Outputs
Graph models may have named input and output tensors derived from the TensorFlow serving signature:
- Named inputs: Pass a NamedTensorMap (a plain object mapping tensor names to Tensor values) when the model has multiple inputs
- Named outputs: Use execute() or executeAsync() with an array of output node names to select specific outputs from the graph
- Tensor names follow the pattern "node_name:output_index" (e.g., "dense_1/Softmax:0")
Memory Management
Proper memory management is critical during inference to prevent memory leaks:
- tf.tidy(): Wraps synchronous inference code; automatically disposes all intermediate tensors created within the callback
- tensor.dispose(): Manually disposes individual tensors (required for async operations outside of tidy)
- tensor.data() / tensor.array(): Asynchronously extracts tensor values into JavaScript arrays; dispose the tensor after its data has been read
- Output tensors returned by predict()/execute()/executeAsync() must be disposed after extracting their values
Signature
// Synchronous prediction (standard models)
predict(
inputs: Tensor | Tensor[] | NamedTensorMap,
config?: ModelPredictConfig
): Tensor | Tensor[] | NamedTensorMap
// Synchronous execution with named outputs
execute(
inputs: Tensor | Tensor[] | NamedTensorMap,
outputs?: string | string[]
): Tensor | Tensor[]
// Asynchronous execution (models with dynamic control flow)
async executeAsync(
inputs: Tensor | Tensor[] | NamedTensorMap,
outputs?: string | string[]
): Promise<Tensor | Tensor[]>
Key Parameters
| Parameter | Type | Description |
|---|---|---|
| inputs | Tensor, Tensor[], or NamedTensorMap | Input tensor(s) matching the model's input signature. Use NamedTensorMap for models with multiple named inputs. |
| config (predict only) | ModelPredictConfig | Configuration object with optional batchSize for splitting large inputs. |
| outputs (execute/executeAsync) | string or string[] | Name(s) of specific output node(s) to extract from the graph. If omitted, uses the model's default output nodes. |
Inputs and Outputs
Inputs
- A loaded GraphModel instance (from tf.loadGraphModel())
- Preprocessed input tensors matching the model's expected input signature (shape, dtype, value range)
Outputs
- Tensor: A single output tensor (most common for classification models)
- Tensor[]: An array of output tensors (for multi-output models)
- NamedTensorMap: A named map of output tensors (when predict() is used with certain model configurations)
- For executeAsync(): A Promise wrapping any of the above
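Because the return type varies by model, downstream code often normalizes all three forms into a flat array of tensors. A minimal sketch (the helper name is illustrative, not a tfjs API):

```javascript
// Illustrative helper: flatten Tensor | Tensor[] | NamedTensorMap into
// Tensor[] so postprocessing code can treat every model uniformly.
function outputsToArray(out) {
  if (Array.isArray(out)) return out;                    // Tensor[]
  if (typeof out.dataSync === 'function') return [out];  // single Tensor
  return Object.values(out);                             // NamedTensorMap
}
```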
Example
// Load the model (see Pretrained_Model_Loading principle)
const model = await tf.loadGraphModel('https://example.com/model/model.json');
// === Standard prediction with tf.tidy for memory management ===
const output = tf.tidy(() => {
// Prepare input: batch of 1 image, 224x224, 3 channels
const input = tf.tensor4d([...pixelData], [1, 224, 224, 3]);
// Normalize to [0, 1]
const normalized = input.div(255.0);
// Run prediction
return model.predict(normalized);
});
// Extract results
const predictions = await output.data();
console.log('Top prediction:', predictions.indexOf(Math.max(...predictions)));
output.dispose();
// === Async execution for models with control flow ===
const input = tf.tensor4d([...pixelData], [1, 224, 224, 3]);
const result = await model.executeAsync(input, ['output_node:0']);
const resultData = await result.data();
input.dispose();
result.dispose();
// === Named tensor inputs for multi-input models ===
const imageInput = tf.tensor4d([...imagePixels], [1, 224, 224, 3]);
const metadataInput = tf.tensor2d([[0.5, 1.2, 3.0]], [1, 3]);
const multiResult = model.predict({
  'image_input:0': imageInput,
  'metadata_input:0': metadataInput
});
// Dispose the inputs (and multiResult) once their values have been read
imageInput.dispose();
metadataInput.dispose();
// === Extract specific output nodes with execute() ===
// inputTensor: any prepared input matching the model's input signature
const intermediateOutputs = model.execute(
  inputTensor,
  ['conv2d_1/Relu:0', 'dense_1/Softmax:0']
);
// intermediateOutputs is [Tensor, Tensor]
See Also
- Implementation:Tensorflow_Tfjs_GraphModel_Predict — Concrete implementation of this principle
- Principle:Tensorflow_Tfjs_Pretrained_Model_Loading — Previous step: loading the model