Principle:Tensorflow Tfjs Graph Model Inference
| Knowledge Sources | |
|---|---|
| Domains | Inference, Deep_Learning |
| Implementation | Implementation:Tensorflow_Tfjs_GraphModel_Predict |
| Type | API Doc |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Running inference on pre-converted graph models, including synchronous prediction and asynchronous execution for models with dynamic control flow. Executing a frozen computation graph on input data is the final step in the model deployment pipeline: preprocessed input tensors flow through the graph's operations to yield output predictions.
Theory
Graph Model vs. Layers Model Inference
Graph model inference differs fundamentally from layers model inference:
- Graph models operate on a frozen, optimized TensorFlow computation graph — the exact graph that was exported from Python TensorFlow. All operations, control flow, and data paths are preserved as they were in the original TensorFlow runtime. The graph is immutable; weights cannot be updated.
- Layers models operate on a Keras-style layer graph — a higher-level abstraction where the model is composed of named layers. Layers models support both inference and training (fine-tuning).
This distinction is important because graph models can represent computations that go beyond the Keras layer abstraction, including custom TensorFlow operations, complex control flow, and multi-input/multi-output architectures that were defined imperatively in Python.
Two Execution Modes
TensorFlow.js provides two execution modes for graph models, each serving a different class of computation graphs:
Synchronous Prediction: predict()
The predict() method is the standard inference entry point for graph models that have a static computation graph:
- Executes the model's default serving signature
- Input is matched to the signature's input tensors
- Output corresponds to the signature's output tensors
- All operations in the graph are executed synchronously (within a single JavaScript tick for CPU, or dispatched as a batch to the GPU)
- Suitable for the vast majority of standard deep learning models (CNNs, RNNs, transformers, etc.)
Source reference: tfjs-converter/src/executor/graph_model.ts:L357-361
Asynchronous Execution: executeAsync()
The executeAsync() method is required for models that contain dynamic control flow operations:
- Dynamic control flow includes TensorFlow tf.while_loop, tf.cond, and related operations that make branching decisions based on tensor values at runtime
- These operations cannot be resolved statically because the number of iterations or the branch taken depends on the actual input data
- executeAsync() returns a Promise that resolves when all dynamic operations complete
- It also allows specifying output node names to extract intermediate results from the graph
Source reference: tfjs-converter/src/executor/graph_model.ts:L534-545
When to Use Each Mode
| Execution Mode | Method | Use When | Returns |
|---|---|---|---|
| Synchronous | predict() | Model has no dynamic control flow ops; standard CNN, RNN, transformer, etc. | Tensor, Tensor[], or NamedTensorMap |
| Asynchronous | executeAsync() | Model contains tf.while_loop, tf.cond, or other dynamic ops | Promise<Tensor or Tensor[]> |
| Synchronous (named outputs) | execute() | Need to extract specific named output nodes from the graph | Tensor or Tensor[] |
If you are unsure whether a model requires async execution, you can:
- Try predict() first — if the model contains unsupported dynamic ops, it will throw an error with a descriptive message
- Use executeAsync() as a fallback, which handles both static and dynamic graphs
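The fallback strategy above can be sketched as a small helper. The helper name and structure are illustrative, not part of the tfjs API:

```javascript
// Illustrative helper (not a tfjs API): try the fast synchronous path
// first, and fall back to executeAsync() when the graph contains
// dynamic control-flow ops that predict() cannot execute.
async function runInference(model, input) {
  try {
    return model.predict(input);
  } catch (err) {
    // predict() throws for graphs with tf.while_loop / tf.cond ops;
    // executeAsync() handles both static and dynamic graphs.
    return await model.executeAsync(input);
  }
}
```

Note that the fallback pays the cost of one failed predict() call, so if a model is known to contain dynamic ops, calling executeAsync() directly is preferable.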
Input Preparation
Graph model inference requires careful input tensor preparation:
- Shape matching: Input tensors must match the exact shape expected by the model's input signature. For image models, this typically means [batch, height, width, channels] (NHWC format)
- Dtype matching: Input data types must match (commonly float32)
- Normalization: Input values must be preprocessed to match the training pipeline (e.g., pixel values scaled to [0, 1] or [-1, 1])
- Batching: The first dimension is typically the batch dimension; single inputs should be wrapped in a batch of size 1
Named Tensor Inputs and Outputs
Graph models may have named input and output tensors derived from the TensorFlow serving signature:
- Named inputs: Pass a NamedTensorMap (a plain object mapping tensor names to Tensor values) when the model has multiple inputs
- Named outputs: Use execute() or executeAsync() with an array of output node names to select specific outputs from the graph
- Tensor names follow the pattern "node_name:output_index" (e.g., "dense_1/Softmax:0")
Memory Management
Proper memory management is critical during inference to prevent memory leaks:
- tf.tidy(): Wraps synchronous inference code; automatically disposes all intermediate tensors created within the callback
- tensor.dispose(): Manually disposes individual tensors (required for async operations outside of tidy)
- tensor.data() / tensor.array(): Asynchronously extracts tensor values into JavaScript arrays; dispose the tensor after its data has been read
- Output tensors returned by predict()/execute()/executeAsync() must be disposed after extracting their values
Signature
// Synchronous prediction (standard models)
predict(
inputs: Tensor | Tensor[] | NamedTensorMap,
config?: ModelPredictConfig
): Tensor | Tensor[] | NamedTensorMap
// Synchronous execution with named outputs
execute(
inputs: Tensor | Tensor[] | NamedTensorMap,
outputs?: string | string[]
): Tensor | Tensor[]
// Asynchronous execution (models with dynamic control flow)
async executeAsync(
inputs: Tensor | Tensor[] | NamedTensorMap,
outputs?: string | string[]
): Promise<Tensor | Tensor[]>
Key Parameters
| Parameter | Type | Description |
|---|---|---|
| inputs | Tensor, Tensor[], or NamedTensorMap | Input tensor(s) matching the model's input signature. Use NamedTensorMap for models with multiple named inputs. |
| config (predict only) | ModelPredictConfig | Configuration object with optional batchSize for splitting large inputs. |
| outputs (execute/executeAsync) | string or string[] | Name(s) of specific output node(s) to extract from the graph. If omitted, uses the model's default output nodes. |
Inputs and Outputs
Inputs
- A loaded GraphModel instance (from tf.loadGraphModel())
- Preprocessed input tensors matching the model's expected input signature (shape, dtype, value range)
Outputs
- Tensor: A single output tensor (most common for classification models)
- Tensor[]: An array of output tensors (for multi-output models)
- NamedTensorMap: A named map of output tensors (when predict() is used with certain model configurations)
- For executeAsync(): A Promise wrapping any of the above
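Because the return type varies by model, downstream code often normalizes all three forms into a flat array of tensors. A minimal sketch (the helper name is illustrative, not a tfjs API):

```javascript
// Illustrative helper: flatten Tensor | Tensor[] | NamedTensorMap into
// Tensor[] so postprocessing code can treat every model uniformly.
function outputsToArray(out) {
  if (Array.isArray(out)) return out;                    // Tensor[]
  if (typeof out.dataSync === 'function') return [out];  // single Tensor
  return Object.values(out);                             // NamedTensorMap
}
```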
Example
// Load the model (see Pretrained_Model_Loading principle)
const model = await tf.loadGraphModel('https://example.com/model/model.json');
// === Standard prediction with tf.tidy for memory management ===
const output = tf.tidy(() => {
// Prepare input: batch of 1 image, 224x224, 3 channels
const input = tf.tensor4d([...pixelData], [1, 224, 224, 3]);
// Normalize to [0, 1]
const normalized = input.div(255.0);
// Run prediction
return model.predict(normalized);
});
// Extract results
const predictions = await output.data();
console.log('Top prediction:', predictions.indexOf(Math.max(...predictions)));
output.dispose();
// === Async execution for models with control flow ===
const input = tf.tensor4d([...pixelData], [1, 224, 224, 3]);
const result = await model.executeAsync(input, ['output_node:0']);
const resultData = await result.data();
input.dispose();
result.dispose();
// === Named tensor inputs for multi-input models ===
const imageInput = tf.tensor4d([...imagePixels], [1, 224, 224, 3]);
const metadataInput = tf.tensor2d([[0.5, 1.2, 3.0]], [1, 3]);
const multiResult = model.predict({
  'image_input:0': imageInput,
  'metadata_input:0': metadataInput
});
// Dispose the inputs (and multiResult) once their values have been read
imageInput.dispose();
metadataInput.dispose();
// === Extract specific output nodes with execute() ===
// inputTensor: any prepared input matching the model's input signature
const intermediateOutputs = model.execute(
  inputTensor,
  ['conv2d_1/Relu:0', 'dense_1/Softmax:0']
);
// intermediateOutputs is [Tensor, Tensor]
See Also
- Implementation:Tensorflow_Tfjs_GraphModel_Predict — Concrete implementation of this principle
- Principle:Tensorflow_Tfjs_Pretrained_Model_Loading — Previous step: loading the model