Implementation:Tensorflow Tfjs GraphModel Predict
| Knowledge Sources | |
|---|---|
| Domains | Inference, Deep_Learning |
| Principle | Principle:Tensorflow_Tfjs_Graph_Model_Inference |
| Type | API Doc |
| Last Updated | 2026-02-10 00:00 GMT |
Environment:Tensorflow_Tfjs_Browser_Runtime Environment:Tensorflow_Tfjs_Node_Native_Runtime Heuristic:Tensorflow_Tfjs_Memory_Management_With_Tidy Heuristic:Tensorflow_Tfjs_WebGL_Shader_Warmup Heuristic:Tensorflow_Tfjs_GPU_Pipeline_Data_Residency Heuristic:Tensorflow_Tfjs_WASM_Cross_Origin_Isolation
Overview
This implementation documents the inference APIs on the GraphModel class: predict() for synchronous prediction on standard models, execute() for synchronous execution with named output selection, and executeAsync() for asynchronous execution on models containing dynamic control flow operations. These are the primary methods for running predictions with loaded TensorFlow.js graph models.
Source References
- predict(): tfjs-converter/src/executor/graph_model.ts:L357-361
- executeAsync(): tfjs-converter/src/executor/graph_model.ts:L534-545
API: predict()
Signature
// From tfjs-converter/src/executor/graph_model.ts:L357-361
predict(
  inputs: Tensor | Tensor[] | NamedTensorMap,
  config?: ModelPredictConfig
): Tensor | Tensor[] | NamedTensorMap
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| inputs | Tensor, Tensor[], or NamedTensorMap | Yes | Input tensor(s) matching the model's input signature. Use a single Tensor for single-input models, Tensor[] for ordered multi-input, or NamedTensorMap ({ 'input_name:0': tensor }) for named inputs. |
| config | ModelPredictConfig | No | Configuration with optional batchSize (number) for splitting large inputs into smaller batches during execution. |
Return Value
Returns Tensor, Tensor[], or NamedTensorMap depending on the model's output signature:
- Single output: Returns a single Tensor
- Multiple outputs: Returns Tensor[] or NamedTensorMap
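Because the return type varies with the model's output signature, code that handles arbitrary models typically branches on the shape of the result. A minimal sketch, not from the source; inputTensor stands in for a prepared input:
// Branch on the three possible return shapes of predict()
const out = model.predict(inputTensor);
if (out instanceof tf.Tensor) {
  console.log('Single output, shape:', out.shape);
  out.dispose();
} else if (Array.isArray(out)) {
  out.forEach(t => t.dispose());                 // ordered multiple outputs
} else {
  Object.values(out).forEach(t => t.dispose());  // NamedTensorMap outputs
}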
When to Use
Use predict() for standard models that do not contain dynamic control flow operations (tf.while_loop, tf.cond with tensor-dependent conditions). This covers the vast majority of models:
- Classification models (MobileNet, ResNet, EfficientNet, etc.)
- Object detection models (SSD, YOLO, etc.)
- Segmentation models (DeepLab, U-Net, etc.)
- Text models (BERT embeddings, sentiment analysis, etc.)
- Generative models without dynamic loops
Example
// Load the model
const model = await tf.loadGraphModel('https://example.com/model/model.json');
// === Single input, single output ===
const result = tf.tidy(() => {
  const input = tf.tensor4d([...pixelData], [1, 224, 224, 3]);
  const normalized = input.div(255.0);
  return model.predict(normalized);
});
// Extract prediction values
const predictions = await result.data();
const topClass = predictions.indexOf(Math.max(...predictions));
console.log('Predicted class:', topClass);
result.dispose();
// === Named tensor inputs ===
const output = tf.tidy(() => {
  return model.predict({
    'input_image:0': tf.tensor4d([...pixels], [1, 224, 224, 3]),
    'input_metadata:0': tf.tensor2d([[0.5, 1.2]], [1, 2])
  });
});
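The optional config argument from the Parameters table can also be supplied as a second argument. A brief, hedged sketch (largeInput is a placeholder batch; how the executor applies batchSize can vary by model and version):
// === Optional config (see Parameters table above) ===
const largeInput = tf.zeros([64, 224, 224, 3]);       // placeholder 64-image batch
const batched = model.predict(largeInput, { batchSize: 16 });
largeInput.dispose();
tf.dispose(batched); // works whether the output is a single tensor or a collection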
API: execute()
Signature
execute(
  inputs: Tensor | Tensor[] | NamedTensorMap,
  outputs?: string | string[]
): Tensor | Tensor[]
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| inputs | Tensor, Tensor[], or NamedTensorMap | Yes | Input tensor(s), same as predict() |
| outputs | string or string[] | No | Name(s) of specific output node(s) to extract. If omitted, uses the model's default output nodes from the serving signature. |
When to Use
Use execute() when you need to:
- Extract outputs from specific intermediate nodes in the graph (not just the final output)
- Access named outputs by their TensorFlow node names
- Get multiple outputs from different parts of the graph
Example
// Extract specific named outputs from the graph
const [features, predictions] = model.execute(
  inputTensor,
  ['conv2d_5/Relu:0', 'dense_2/Softmax:0']
);
console.log('Feature map shape:', features.shape); // e.g., [1, 7, 7, 512]
console.log('Prediction shape:', predictions.shape); // e.g., [1, 1000]
features.dispose();
predictions.dispose();
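The node names accepted by execute() come from the converted graph, not from the original Python layer names. Recent GraphModel versions expose the names directly, which is the usual way to discover valid values; a short sketch (the names printed depend on the model):
// List the graph's named inputs and outputs to find node names for execute()
console.log('Input nodes:', model.inputNodes);   // e.g. ['input_image']
console.log('Output nodes:', model.outputNodes); // e.g. ['Identity', 'Identity_1']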
API: executeAsync()
Signature
// From tfjs-converter/src/executor/graph_model.ts:L534-545
async executeAsync(
  inputs: Tensor | Tensor[] | NamedTensorMap,
  outputs?: string | string[]
): Promise<Tensor | Tensor[]>
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| inputs | Tensor, Tensor[], or NamedTensorMap | Yes | Input tensor(s), same as predict() and execute() |
| outputs | string or string[] | No | Name(s) of specific output node(s) to extract. If omitted, uses the model's default output nodes. |
Return Value
Returns a Promise<Tensor | Tensor[]> that resolves when all dynamic control flow operations in the graph have completed.
When to Use
Use executeAsync() when the model contains dynamic control flow operations that cannot be resolved statically:
- tf.while_loop — Loops where the number of iterations depends on tensor values (e.g., beam search in sequence-to-sequence models)
- tf.cond — Conditional branches where the branch taken depends on a tensor value at runtime
- TensorArray operations — Dynamic-length tensor collections used in RNNs and attention mechanisms
- Models exported from TF 2.x with tf.function that contain Python control flow translated to TF control flow ops
If you call predict() or execute() on a model that requires async execution, TF.js will throw an error with a message indicating that executeAsync() is required.
Example
// Load a model with dynamic control flow (e.g., a beam search model)
const model = await tf.loadGraphModel('https://example.com/seq2seq/model.json');
// executeAsync is required for models with dynamic ops
const input = tf.tensor2d([[1, 2, 3, 4, 5]], [1, 5]);
const result = await model.executeAsync(input);
// Handle single or multiple outputs
if (Array.isArray(result)) {
  console.log('Multiple outputs:');
  result.forEach((t, i) => {
    console.log(` Output ${i}: shape=${t.shape}, dtype=${t.dtype}`);
    t.dispose();
  });
} else {
  console.log('Single output: shape=', result.shape);
  const data = await result.data();
  console.log('Values:', data);
  result.dispose();
}
input.dispose();
// Async execution with specific output nodes
const [decoderOutput, attentionWeights] = await model.executeAsync(
  { 'encoder_input:0': encoderInput },
  ['decoder_output:0', 'attention_weights:0']
);
const decodedTokens = await decoderOutput.data();
console.log('Decoded tokens:', decodedTokens);
decoderOutput.dispose();
attentionWeights.dispose();
Input Preparation Patterns
Image Classification
// From an HTML image element
function preprocessImage(imgElement, targetSize = [224, 224]) {
  return tf.tidy(() => {
    // Convert image to tensor
    let tensor = tf.browser.fromPixels(imgElement);
    // Resize to model's expected input size
    tensor = tf.image.resizeBilinear(tensor, targetSize);
    // Normalize to [0, 1]
    tensor = tensor.div(255.0);
    // Add batch dimension: [height, width, channels] -> [1, height, width, channels]
    tensor = tensor.expandDims(0);
    return tensor;
  });
}
const input = preprocessImage(document.getElementById('my-image'));
const output = model.predict(input);
const probabilities = await output.data();
input.dispose();
output.dispose();
From Canvas or Video
// Real-time inference from a video element
async function classifyFrame(videoElement, model) {
  const output = tf.tidy(() => {
    const frame = tf.browser.fromPixels(videoElement);
    const resized = tf.image.resizeBilinear(frame, [224, 224]);
    const normalized = resized.div(255.0);
    const batched = normalized.expandDims(0);
    return model.predict(batched);
  });
  const predictions = await output.data();
  output.dispose();
  return predictions;
}
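A sketch, not from the source, of driving classifyFrame in a continuous loop; tf.nextFrame() yields to the browser between inferences so rendering stays responsive:
// Continuous classification loop (stop condition omitted for brevity)
async function runVideoLoop(videoElement, model) {
  while (true) {
    const predictions = await classifyFrame(videoElement, model);
    const topClass = predictions.indexOf(Math.max(...predictions));
    console.log('Current frame top class:', topClass);
    await tf.nextFrame(); // yield to the browser before processing the next frame
  }
}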
Batch Inference
// Process multiple images in a single batch
const batchSize = 8;
const images = [...imageDataArray]; // Array of pixel data
const batchTensor = tf.tidy(() => {
  const tensors = images.map(pixels =>
    tf.tensor3d(pixels, [224, 224, 3]).div(255.0)
  );
  return tf.stack(tensors); // [batchSize, 224, 224, 3]
});
const batchOutput = model.predict(batchTensor);
const allPredictions = await batchOutput.data();
// Split predictions by batch item
for (let i = 0; i < batchSize; i++) {
  const start = i * 1000; // 1000 classes per image
  const itemPredictions = allPredictions.slice(start, start + 1000);
  console.log(`Image ${i}: top class =`, itemPredictions.indexOf(Math.max(...itemPredictions)));
}
batchTensor.dispose();
batchOutput.dispose();
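As an alternative to slicing the full probability array on the CPU, the per-image top class can be computed on the backend with argMax and only one integer per image downloaded. A hedged sketch, assuming the classifier has a single [batchSize, numClasses] output:
// Compute the top class per image on the backend, then download batchSize integers
const topClasses = tf.tidy(() => {
  const tensors = images.map(pixels => tf.tensor3d(pixels, [224, 224, 3]).div(255.0));
  return model.predict(tf.stack(tensors)).argMax(-1); // shape [batchSize]
});
const classIds = await topClasses.data();
classIds.forEach((classId, i) => console.log(`Image ${i}: top class =`, classId));
topClasses.dispose();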
Memory Management
Using tf.tidy() for Synchronous Inference
// tf.tidy automatically disposes intermediate tensors
const output = tf.tidy(() => {
  const raw = tf.tensor4d(pixelData, [1, 224, 224, 3]);
  const normalized = raw.div(255.0);
  const shifted = normalized.sub(0.5);
  const scaled = shifted.mul(2.0);
  // raw, normalized, shifted, and scaled are all disposed automatically
  // Only the returned tensor (the output of predict) survives
  return model.predict(scaled);
});
// Extract data and dispose the output
const result = await output.data();
output.dispose();
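If an intermediate tensor created inside tf.tidy() needs to outlive the scope, tf.keep() exempts it from automatic disposal; a brief sketch:
// tf.keep() marks a tensor created inside tidy so it is NOT disposed at scope exit
let keptInput;
const output2 = tf.tidy(() => {
  const normalized = tf.tensor4d(pixelData, [1, 224, 224, 3]).div(255.0);
  keptInput = tf.keep(normalized); // survives the tidy scope
  return model.predict(normalized);
});
// ... reuse keptInput (e.g. for another prediction), then dispose both manually
keptInput.dispose();
output2.dispose();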
Manual Disposal for Async Inference
// tf.tidy does NOT work with async operations (executeAsync returns a Promise)
const input = tf.tensor4d(pixelData, [1, 224, 224, 3]);
const output = await model.executeAsync(input);
// Extract data
const resultData = await output.data();
// Manually dispose all tensors
input.dispose();
if (Array.isArray(output)) {
  output.forEach(t => t.dispose());
} else {
  output.dispose();
}
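tf.dispose() accepts a single tensor, an array, or an object containing tensors, so the branching above can be collapsed; a short sketch with fresh tensor names:
// tf.dispose() walks arrays/objects and disposes every tensor it finds
const asyncInput = tf.tensor4d(pixelData, [1, 224, 224, 3]);
const asyncOutput = await model.executeAsync(asyncInput);
// ... read data from asyncOutput as needed ...
tf.dispose([asyncInput, asyncOutput]); // works whether asyncOutput is a Tensor or Tensor[]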
Monitoring Memory
// Check for memory leaks during development
console.log('Before inference:', tf.memory());
// { numTensors: 150, numDataBuffers: 150, numBytes: 25000000, ... }
const output = model.predict(input);
const data = await output.data();
output.dispose();
input.dispose();
console.log('After inference:', tf.memory());
// numTensors should return to approximately the same count
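During development it can help to wrap an operation in a small assertion on tf.memory().numTensors; the helper below is illustrative and not part of the TF.js API:
// Illustrative helper: warn if a function leaves extra tensors allocated
async function assertNoTensorLeak(label, fn) {
  const before = tf.memory().numTensors;
  const result = await fn();
  const after = tf.memory().numTensors;
  if (after > before) {
    console.warn(`${label}: leaked ${after - before} tensor(s)`);
  }
  return result;
}
// Usage (classifyFrame returns plain data, so no extra tensors should remain):
// const predictions = await assertNoTensorLeak('frame', () => classifyFrame(video, model));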
Error Handling
// Robust inference with error handling
async function runInference(model, inputData) {
  let input;
  let output;
  try {
    input = tf.tensor4d(inputData, [1, 224, 224, 3]);
    // Try synchronous prediction first
    try {
      output = model.predict(input);
    } catch (syncError) {
      // Fall back to async execution if predict() fails
      if (syncError.message.includes('dynamic ops') ||
          syncError.message.includes('executeAsync')) {
        console.warn('Model requires async execution, using executeAsync()');
        output = await model.executeAsync(input);
      } else {
        throw syncError;
      }
    }
    // Extract and return results
    const resultData = await (Array.isArray(output) ? output[0] : output).data();
    return Array.from(resultData);
  } catch (error) {
    console.error('Inference failed:', error.message);
    throw error;
  } finally {
    // Always clean up tensors
    if (input) input.dispose();
    if (output) {
      if (Array.isArray(output)) {
        output.forEach(t => t.dispose());
      } else {
        output.dispose();
      }
    }
  }
}
Performance Optimization
| Technique | Description | Impact |
|---|---|---|
| tf.tidy() | Wrap synchronous inference to auto-dispose intermediates | Prevents memory leaks |
| Warm-up run | Run a single inference with dummy data after loading | Triggers backend shader/kernel compilation and weight upload; subsequent runs are faster |
| Batch inference | Process multiple inputs in a single predict() call | Better GPU utilization |
| WebGL backend | Use tf.setBackend('webgl') for GPU acceleration | 10-100x faster than CPU for large models |
| WASM backend | Use tf.setBackend('wasm') for CPU fallback | Faster than default JS CPU backend |
| Input reuse | Reuse input tensors when shape is constant (e.g., video frames) | Reduces allocation overhead |
// Warm-up run to trigger shader compilation and weight upload on GPU backends
const warmupInput = tf.zeros([1, 224, 224, 3]);
const warmupOutput = model.predict(warmupInput);
warmupOutput.dispose();
warmupInput.dispose();
console.log('Model warmed up, ready for fast inference');
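The WebGL and WASM rows in the table above assume the backend has been selected before the first prediction. A minimal sketch with fallback (the WASM backend also requires loading the @tensorflow/tfjs-backend-wasm package, not shown here):
// Try backends in order of preference and fall back if one fails to initialize
async function setupBackend() {
  for (const name of ['webgl', 'wasm', 'cpu']) {
    try {
      if (await tf.setBackend(name)) {
        await tf.ready();
        console.log('Using backend:', tf.getBackend());
        return;
      }
    } catch (e) {
      console.warn(`Backend ${name} unavailable:`, e.message);
    }
  }
}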
Complete End-to-End Example
// Full example: load model, preprocess image, run inference, display results
async function classifyImage(imageUrl) {
  // 1. Load the model (in a real app, load once and cache for reuse)
  const model = await tf.loadGraphModel(
    'https://storage.googleapis.com/my-models/mobilenet/v2/model.json',
    {
      onProgress: (p) => console.log(`Loading: ${(p * 100).toFixed(0)}%`)
    }
  );
  // 2. Load and preprocess the image
  const img = new Image();
  img.crossOrigin = 'anonymous';
  await new Promise((resolve, reject) => {
    img.onload = resolve;
    img.onerror = reject;
    img.src = imageUrl;
  });
  // 3. Run inference with memory management
  const predictions = tf.tidy(() => {
    const tensor = tf.browser.fromPixels(img)
      .resizeBilinear([224, 224])
      .div(255.0)
      .expandDims(0);
    return model.predict(tensor);
  });
  // 4. Extract and process results
  const probabilities = await predictions.data();
  predictions.dispose();
  // 5. Find top-5 predictions
  const top5 = Array.from(probabilities)
    .map((prob, idx) => ({ probability: prob, classIndex: idx }))
    .sort((a, b) => b.probability - a.probability)
    .slice(0, 5);
  console.log('Top 5 predictions:');
  top5.forEach(({ probability, classIndex }) => {
    console.log(` Class ${classIndex}: ${(probability * 100).toFixed(2)}%`);
  });
  return top5;
}
See Also
- Principle:Tensorflow_Tfjs_Graph_Model_Inference — The principle this implementation fulfills
- Implementation:Tensorflow_Tfjs_Tf_LoadGraphModel — Previous step: loading the model
- Principle:Tensorflow_Tfjs_Pretrained_Model_Loading — How models are loaded before inference
Environments
- Environment:Tensorflow_Tfjs_Browser_Runtime -- Browser runtime (WebGL / WebGPU / WASM / CPU backends)
- Environment:Tensorflow_Tfjs_Node_Native_Runtime -- Node.js native runtime (TensorFlow C binding)
Heuristics
- Heuristic:Tensorflow_Tfjs_Memory_Management_With_Tidy -- Wrap predictions in tf.tidy() to prevent memory leaks
- Heuristic:Tensorflow_Tfjs_WebGL_Shader_Warmup -- Warm up WebGL shaders with a dummy predict call to avoid first-inference latency
- Heuristic:Tensorflow_Tfjs_GPU_Pipeline_Data_Residency -- Keep tensor data on GPU to avoid CPU round-trips
- Heuristic:Tensorflow_Tfjs_WASM_Cross_Origin_Isolation -- Enable Cross-Origin Isolation headers for WASM multi-threading