Implementation:Tensorflow Tfjs GraphModel Predict
| Knowledge Sources | |
|---|---|
| Domains | Inference, Deep_Learning |
| Principle | Principle:Tensorflow_Tfjs_Graph_Model_Inference |
| Type | API Doc |
| Last Updated | 2026-02-10 00:00 GMT |
Environment:Tensorflow_Tfjs_Browser_Runtime Environment:Tensorflow_Tfjs_Node_Native_Runtime Heuristic:Tensorflow_Tfjs_Memory_Management_With_Tidy Heuristic:Tensorflow_Tfjs_WebGL_Shader_Warmup Heuristic:Tensorflow_Tfjs_GPU_Pipeline_Data_Residency Heuristic:Tensorflow_Tfjs_WASM_Cross_Origin_Isolation
Overview
This implementation documents the inference APIs on the GraphModel class: predict() for synchronous prediction on standard models, execute() for synchronous execution with named output selection, and executeAsync() for asynchronous execution on models containing dynamic control flow operations. These are the primary methods for running predictions with loaded TensorFlow.js graph models.
Source References
- predict(): tfjs-converter/src/executor/graph_model.ts:L357-361
- executeAsync(): tfjs-converter/src/executor/graph_model.ts:L534-545
API: predict()
Signature
// From tfjs-converter/src/executor/graph_model.ts:L357-361
predict(
  inputs: Tensor | Tensor[] | NamedTensorMap,
  config?: ModelPredictConfig
): Tensor | Tensor[] | NamedTensorMap
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| inputs | Tensor, Tensor[], or NamedTensorMap | Yes | Input tensor(s) matching the model's input signature. Use a single Tensor for single-input models, Tensor[] for ordered multi-input, or NamedTensorMap ({ 'input_name:0': tensor }) for named inputs. |
| config | ModelPredictConfig | No | Configuration with optional batchSize (number) for splitting large inputs into smaller batches during execution. |
Return Value
Returns Tensor, Tensor[], or NamedTensorMap depending on the model's output signature:
- Single output: Returns a single Tensor
- Multiple outputs: Returns Tensor[] or NamedTensorMap
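Because the return type varies with the model's output signature, code that handles arbitrary models typically branches on the shape of the result. A minimal sketch, not from the source; inputTensor stands in for a prepared input:
// Branch on the three possible return shapes of predict()
const out = model.predict(inputTensor);
if (out instanceof tf.Tensor) {
  console.log('Single output, shape:', out.shape);
  out.dispose();
} else if (Array.isArray(out)) {
  out.forEach(t => t.dispose());                 // ordered multiple outputs
} else {
  Object.values(out).forEach(t => t.dispose());  // NamedTensorMap outputs
}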
When to Use
Use predict() for standard models that do not contain dynamic control flow operations (tf.while_loop, tf.cond with tensor-dependent conditions). This covers the vast majority of models:
- Classification models (MobileNet, ResNet, EfficientNet, etc.)
- Object detection models (SSD, YOLO, etc.)
- Segmentation models (DeepLab, U-Net, etc.)
- Text models (BERT embeddings, sentiment analysis, etc.)
- Generative models without dynamic loops
Example
// Load the model
const model = await tf.loadGraphModel('https://example.com/model/model.json');
// === Single input, single output ===
const result = tf.tidy(() => {
  const input = tf.tensor4d([...pixelData], [1, 224, 224, 3]);
  const normalized = input.div(255.0);
  return model.predict(normalized);
});
// Extract prediction values
const predictions = await result.data();
const topClass = predictions.indexOf(Math.max(...predictions));
console.log('Predicted class:', topClass);
result.dispose();
// === Named tensor inputs ===
const output = tf.tidy(() => {
  return model.predict({
    'input_image:0': tf.tensor4d([...pixels], [1, 224, 224, 3]),
    'input_metadata:0': tf.tensor2d([[0.5, 1.2]], [1, 2])
  });
});
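The optional config argument from the Parameters table can also be supplied as a second argument. A brief, hedged sketch (largeInput is a placeholder batch; how the executor applies batchSize can vary by model and version):
// === Optional config (see Parameters table above) ===
const largeInput = tf.zeros([64, 224, 224, 3]);       // placeholder 64-image batch
const batched = model.predict(largeInput, { batchSize: 16 });
largeInput.dispose();
tf.dispose(batched); // works whether the output is a single tensor or a collection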
API: execute()
Signature
execute(
  inputs: Tensor | Tensor[] | NamedTensorMap,
  outputs?: string | string[]
): Tensor | Tensor[]
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| inputs | Tensor, Tensor[], or NamedTensorMap | Yes | Input tensor(s), same as predict() |
| outputs | string or string[] | No | Name(s) of specific output node(s) to extract. If omitted, uses the model's default output nodes from the serving signature. |
When to Use
Use execute() when you need to:
- Extract outputs from specific intermediate nodes in the graph (not just the final output)
- Access named outputs by their TensorFlow node names
- Get multiple outputs from different parts of the graph
Example
// Extract specific named outputs from the graph
const [features, predictions] = model.execute(
  inputTensor,
  ['conv2d_5/Relu:0', 'dense_2/Softmax:0']
);
console.log('Feature map shape:', features.shape); // e.g., [1, 7, 7, 512]
console.log('Prediction shape:', predictions.shape); // e.g., [1, 1000]
features.dispose();
predictions.dispose();
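The node names accepted by execute() come from the converted graph, not from the original Python layer names. Recent GraphModel versions expose the names directly, which is the usual way to discover valid values; a short sketch (the names printed depend on the model):
// List the graph's named inputs and outputs to find node names for execute()
console.log('Input nodes:', model.inputNodes);   // e.g. ['input_image']
console.log('Output nodes:', model.outputNodes); // e.g. ['Identity', 'Identity_1']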
API: executeAsync()
Signature
// From tfjs-converter/src/executor/graph_model.ts:L534-545
async executeAsync(
  inputs: Tensor | Tensor[] | NamedTensorMap,
  outputs?: string | string[]
): Promise<Tensor | Tensor[]>
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| inputs | Tensor, Tensor[], or NamedTensorMap | Yes | Input tensor(s), same as predict() and execute() |
| outputs | string or string[] | No | Name(s) of specific output node(s) to extract. If omitted, uses the model's default output nodes. |
Return Value
Returns a Promise<Tensor | Tensor[]> that resolves when all dynamic control flow operations in the graph have completed.
When to Use
Use executeAsync() when the model contains dynamic control flow operations that cannot be resolved statically:
- tf.while_loop — Loops where the number of iterations depends on tensor values (e.g., beam search in sequence-to-sequence models)
- tf.cond — Conditional branches where the branch taken depends on a tensor value at runtime
- TensorArray operations — Dynamic-length tensor collections used in RNNs and attention mechanisms
- Models exported from TF 2.x with tf.function that contain Python control flow translated to TF control flow ops
If you call predict() or execute() on a model that requires async execution, TF.js will throw an error with a message indicating that executeAsync() is required.
Example
// Load a model with dynamic control flow (e.g., a beam search model)
const model = await tf.loadGraphModel('https://example.com/seq2seq/model.json');
// executeAsync is required for models with dynamic ops
const input = tf.tensor2d([[1, 2, 3, 4, 5]], [1, 5]);
const result = await model.executeAsync(input);
// Handle single or multiple outputs
if (Array.isArray(result)) {
  console.log('Multiple outputs:');
  result.forEach((t, i) => {
    console.log(` Output ${i}: shape=${t.shape}, dtype=${t.dtype}`);
    t.dispose();
  });
} else {
  console.log('Single output: shape=', result.shape);
  const data = await result.data();
  console.log('Values:', data);
  result.dispose();
}
input.dispose();
// Async execution with specific output nodes
const [decoderOutput, attentionWeights] = await model.executeAsync(
  { 'encoder_input:0': encoderInput },
  ['decoder_output:0', 'attention_weights:0']
);
const decodedTokens = await decoderOutput.data();
console.log('Decoded tokens:', decodedTokens);
decoderOutput.dispose();
attentionWeights.dispose();
Input Preparation Patterns
Image Classification
// From an HTML image element
function preprocessImage(imgElement, targetSize = [224, 224]) {
  return tf.tidy(() => {
    // Convert image to tensor
    let tensor = tf.browser.fromPixels(imgElement);
    // Resize to model's expected input size
    tensor = tf.image.resizeBilinear(tensor, targetSize);
    // Normalize to [0, 1]
    tensor = tensor.div(255.0);
    // Add batch dimension: [height, width, channels] -> [1, height, width, channels]
    tensor = tensor.expandDims(0);
    return tensor;
  });
}
const input = preprocessImage(document.getElementById('my-image'));
const output = model.predict(input);
const probabilities = await output.data();
input.dispose();
output.dispose();
From Canvas or Video
// Real-time inference from a video element
async function classifyFrame(videoElement, model) {
  const output = tf.tidy(() => {
    const frame = tf.browser.fromPixels(videoElement);
    const resized = tf.image.resizeBilinear(frame, [224, 224]);
    const normalized = resized.div(255.0);
    const batched = normalized.expandDims(0);
    return model.predict(batched);
  });
  const predictions = await output.data();
  output.dispose();
  return predictions;
}
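A sketch, not from the source, of driving classifyFrame in a continuous loop; tf.nextFrame() yields to the browser between inferences so rendering stays responsive:
// Continuous classification loop (stop condition omitted for brevity)
async function runVideoLoop(videoElement, model) {
  while (true) {
    const predictions = await classifyFrame(videoElement, model);
    const topClass = predictions.indexOf(Math.max(...predictions));
    console.log('Current frame top class:', topClass);
    await tf.nextFrame(); // yield to the browser before processing the next frame
  }
}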
Batch Inference
// Process multiple images in a single batch
const batchSize = 8;
const images = [...imageDataArray]; // Array of pixel data
const batchTensor = tf.tidy(() => {
  const tensors = images.map(pixels =>
    tf.tensor3d(pixels, [224, 224, 3]).div(255.0)
  );
  return tf.stack(tensors); // [batchSize, 224, 224, 3]
});
const batchOutput = model.predict(batchTensor);
const allPredictions = await batchOutput.data();
// Split predictions by batch item
for (let i = 0; i < batchSize; i++) {
  const start = i * 1000; // 1000 classes per image
  const itemPredictions = allPredictions.slice(start, start + 1000);
  console.log(`Image ${i}: top class =`, itemPredictions.indexOf(Math.max(...itemPredictions)));
}
batchTensor.dispose();
batchOutput.dispose();
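As an alternative to slicing the full probability array on the CPU, the per-image top class can be computed on the backend with argMax and only one integer per image downloaded. A hedged sketch, assuming the classifier has a single [batchSize, numClasses] output:
// Compute the top class per image on the backend, then download batchSize integers
const topClasses = tf.tidy(() => {
  const tensors = images.map(pixels => tf.tensor3d(pixels, [224, 224, 3]).div(255.0));
  return model.predict(tf.stack(tensors)).argMax(-1); // shape [batchSize]
});
const classIds = await topClasses.data();
classIds.forEach((classId, i) => console.log(`Image ${i}: top class =`, classId));
topClasses.dispose();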
Memory Management
Using tf.tidy() for Synchronous Inference
// tf.tidy automatically disposes intermediate tensors
const output = tf.tidy(() => {
  const raw = tf.tensor4d(pixelData, [1, 224, 224, 3]);
  const normalized = raw.div(255.0);
  const shifted = normalized.sub(0.5);
  const scaled = shifted.mul(2.0);
  // raw, normalized, shifted, and scaled are all disposed automatically
  // Only the returned tensor (the output of predict) survives
  return model.predict(scaled);
});
// Extract data and dispose the output
const result = await output.data();
output.dispose();
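If an intermediate tensor created inside tf.tidy() needs to outlive the scope, tf.keep() exempts it from automatic disposal; a brief sketch:
// tf.keep() marks a tensor created inside tidy so it is NOT disposed at scope exit
let keptInput;
const output2 = tf.tidy(() => {
  const normalized = tf.tensor4d(pixelData, [1, 224, 224, 3]).div(255.0);
  keptInput = tf.keep(normalized); // survives the tidy scope
  return model.predict(normalized);
});
// ... reuse keptInput (e.g. for another prediction), then dispose both manually
keptInput.dispose();
output2.dispose();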
Manual Disposal for Async Inference
// tf.tidy does NOT work with async operations (executeAsync returns a Promise)
const input = tf.tensor4d(pixelData, [1, 224, 224, 3]);
const output = await model.executeAsync(input);
// Extract data
const resultData = await output.data();
// Manually dispose all tensors
input.dispose();
if (Array.isArray(output)) {
  output.forEach(t => t.dispose());
} else {
  output.dispose();
}
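tf.dispose() accepts a single tensor, an array, or an object containing tensors, so the branching above can be collapsed; a short sketch with fresh tensor names:
// tf.dispose() walks arrays/objects and disposes every tensor it finds
const asyncInput = tf.tensor4d(pixelData, [1, 224, 224, 3]);
const asyncOutput = await model.executeAsync(asyncInput);
// ... read data from asyncOutput as needed ...
tf.dispose([asyncInput, asyncOutput]); // works whether asyncOutput is a Tensor or Tensor[]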
Monitoring Memory
// Check for memory leaks during development
console.log('Before inference:', tf.memory());
// { numTensors: 150, numDataBuffers: 150, numBytes: 25000000, ... }
const output = model.predict(input);
const data = await output.data();
output.dispose();
input.dispose();
console.log('After inference:', tf.memory());
// numTensors should return to approximately the same count
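During development it can help to wrap an operation in a small assertion on tf.memory().numTensors; the helper below is illustrative and not part of the TF.js API:
// Illustrative helper: warn if a function leaves extra tensors allocated
async function assertNoTensorLeak(label, fn) {
  const before = tf.memory().numTensors;
  const result = await fn();
  const after = tf.memory().numTensors;
  if (after > before) {
    console.warn(`${label}: leaked ${after - before} tensor(s)`);
  }
  return result;
}
// Usage (classifyFrame returns plain data, so no extra tensors should remain):
// const predictions = await assertNoTensorLeak('frame', () => classifyFrame(video, model));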
Error Handling
// Robust inference with error handling
async function runInference(model, inputData) {
  let input;
  let output;
  try {
    input = tf.tensor4d(inputData, [1, 224, 224, 3]);
    // Try synchronous prediction first
    try {
      output = model.predict(input);
    } catch (syncError) {
      // Fall back to async execution if predict() fails
      if (syncError.message.includes('dynamic ops') ||
          syncError.message.includes('executeAsync')) {
        console.warn('Model requires async execution, using executeAsync()');
        output = await model.executeAsync(input);
      } else {
        throw syncError;
      }
    }
    // Extract and return results
    const resultData = await (Array.isArray(output) ? output[0] : output).data();
    return Array.from(resultData);
  } catch (error) {
    console.error('Inference failed:', error.message);
    throw error;
  } finally {
    // Always clean up tensors
    if (input) input.dispose();
    if (output) {
      if (Array.isArray(output)) {
        output.forEach(t => t.dispose());
      } else {
        output.dispose();
      }
    }
  }
}
Performance Optimization
| Technique | Description | Impact |
|---|---|---|
| tf.tidy() | Wrap synchronous inference to auto-dispose intermediates | Prevents memory leaks |
| Warm-up run | Run a single inference with dummy data after loading | Triggers backend shader/kernel compilation and weight upload; subsequent runs are faster |
| Batch inference | Process multiple inputs in a single predict() call | Better GPU utilization |
| WebGL backend | Use tf.setBackend('webgl') for GPU acceleration | 10-100x faster than CPU for large models |
| WASM backend | Use tf.setBackend('wasm') for CPU fallback | Faster than default JS CPU backend |
| Input reuse | Reuse input tensors when shape is constant (e.g., video frames) | Reduces allocation overhead |
// Warm-up run to trigger shader compilation and weight upload on GPU backends
const warmupInput = tf.zeros([1, 224, 224, 3]);
const warmupOutput = model.predict(warmupInput);
warmupOutput.dispose();
warmupInput.dispose();
console.log('Model warmed up, ready for fast inference');
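The WebGL and WASM rows in the table above assume the backend has been selected before the first prediction. A minimal sketch with fallback (the WASM backend also requires loading the @tensorflow/tfjs-backend-wasm package, not shown here):
// Try backends in order of preference and fall back if one fails to initialize
async function setupBackend() {
  for (const name of ['webgl', 'wasm', 'cpu']) {
    try {
      if (await tf.setBackend(name)) {
        await tf.ready();
        console.log('Using backend:', tf.getBackend());
        return;
      }
    } catch (e) {
      console.warn(`Backend ${name} unavailable:`, e.message);
    }
  }
}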
Complete End-to-End Example
// Full example: load model, preprocess image, run inference, display results
async function classifyImage(imageUrl) {
  // 1. Load the model (in a real app, load once and cache for reuse)
  const model = await tf.loadGraphModel(
    'https://storage.googleapis.com/my-models/mobilenet/v2/model.json',
    {
      onProgress: (p) => console.log(`Loading: ${(p * 100).toFixed(0)}%`)
    }
  );
  // 2. Load and preprocess the image
  const img = new Image();
  img.crossOrigin = 'anonymous';
  await new Promise((resolve, reject) => {
    img.onload = resolve;
    img.onerror = reject;
    img.src = imageUrl;
  });
  // 3. Run inference with memory management
  const predictions = tf.tidy(() => {
    const tensor = tf.browser.fromPixels(img)
      .resizeBilinear([224, 224])
      .div(255.0)
      .expandDims(0);
    return model.predict(tensor);
  });
  // 4. Extract and process results
  const probabilities = await predictions.data();
  predictions.dispose();
  // 5. Find top-5 predictions
  const top5 = Array.from(probabilities)
    .map((prob, idx) => ({ probability: prob, classIndex: idx }))
    .sort((a, b) => b.probability - a.probability)
    .slice(0, 5);
  console.log('Top 5 predictions:');
  top5.forEach(({ probability, classIndex }) => {
    console.log(` Class ${classIndex}: ${(probability * 100).toFixed(2)}%`);
  });
  return top5;
}
See Also
- Principle:Tensorflow_Tfjs_Graph_Model_Inference — The principle this implementation fulfills
- Implementation:Tensorflow_Tfjs_Tf_LoadGraphModel — Previous step: loading the model
- Principle:Tensorflow_Tfjs_Pretrained_Model_Loading — How models are loaded before inference
Environments
- Environment:Tensorflow_Tfjs_Browser_Runtime -- Browser runtime (WebGL / WebGPU / WASM / CPU backends)
- Environment:Tensorflow_Tfjs_Node_Native_Runtime -- Node.js native runtime (TensorFlow C binding)
Heuristics
- Heuristic:Tensorflow_Tfjs_Memory_Management_With_Tidy -- Wrap predictions in tf.tidy() to prevent memory leaks
- Heuristic:Tensorflow_Tfjs_WebGL_Shader_Warmup -- Warm up WebGL shaders with a dummy predict call to avoid first-inference latency
- Heuristic:Tensorflow_Tfjs_GPU_Pipeline_Data_Residency -- Keep tensor data on GPU to avoid CPU round-trips
- Heuristic:Tensorflow_Tfjs_WASM_Cross_Origin_Isolation -- Enable Cross-Origin Isolation headers for WASM multi-threading