
Implementation:Tensorflow Tfjs GraphModel Predict

From Leeroopedia


Knowledge Sources
Domains Inference, Deep_Learning
Principle Principle:Tensorflow_Tfjs_Graph_Model_Inference
Type API Doc
Last Updated 2026-02-10 00:00 GMT

Environment:Tensorflow_Tfjs_Browser_Runtime Environment:Tensorflow_Tfjs_Node_Native_Runtime Heuristic:Tensorflow_Tfjs_Memory_Management_With_Tidy Heuristic:Tensorflow_Tfjs_WebGL_Shader_Warmup Heuristic:Tensorflow_Tfjs_GPU_Pipeline_Data_Residency Heuristic:Tensorflow_Tfjs_WASM_Cross_Origin_Isolation

Overview

This implementation documents the inference APIs on the GraphModel class: predict() for synchronous prediction on standard models, execute() for synchronous execution with named output selection, and executeAsync() for asynchronous execution on models containing dynamic control flow operations. These are the primary methods for running predictions with loaded TensorFlow.js graph models.

Source References

  • predict(): tfjs-converter/src/executor/graph_model.ts:L357-361
  • executeAsync(): tfjs-converter/src/executor/graph_model.ts:L534-545

API: predict()

Signature

// From tfjs-converter/src/executor/graph_model.ts:L357-361
predict(
  inputs: Tensor | Tensor[] | NamedTensorMap,
  config?: ModelPredictConfig
): Tensor | Tensor[] | NamedTensorMap

Parameters

Parameter Type Required Description
inputs Tensor, Tensor[], or NamedTensorMap Yes Input tensor(s) matching the model's input signature. Use a single Tensor for single-input models, Tensor[] for ordered multi-input, or NamedTensorMap ({ 'input_name:0': tensor }) for named inputs.
config ModelPredictConfig No Configuration with optional batchSize (number) for splitting large inputs into smaller batches during execution.

Return Value

Returns Tensor, Tensor[], or NamedTensorMap depending on the model's output signature:

  • Single output: Returns a single Tensor
  • Multiple outputs: Returns Tensor[] or NamedTensorMap
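Because the return shape varies with the model, downstream code sometimes normalizes it before iterating. A minimal sketch of such a helper (the name toTensorList is ours, not part of the TF.js API; it treats anything exposing a .shape array as a tensor, which holds for tf.Tensor instances):

```javascript
// Hypothetical helper: coerce the value returned by predict() into a
// flat array of tensors, whatever the model's output signature.
// A "tensor" here is any object exposing a .shape array, which is
// true of tf.Tensor instances.
function toTensorList(output) {
  if (Array.isArray(output)) {
    return output;                // Tensor[]: already a list
  }
  if (output && Array.isArray(output.shape)) {
    return [output];              // single Tensor: wrap it
  }
  return Object.values(output);   // NamedTensorMap: take the values
}
```

With this in place, cleanup code can always call `toTensorList(output).forEach(t => t.dispose())` without branching on the output form.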

When to Use

Use predict() for standard models that do not contain dynamic control flow operations (tf.while_loop, tf.cond with tensor-dependent conditions). This covers the vast majority of models:

  • Classification models (MobileNet, ResNet, EfficientNet, etc.)
  • Object detection models (SSD, YOLO, etc.)
  • Segmentation models (DeepLab, U-Net, etc.)
  • Text models (BERT embeddings, sentiment analysis, etc.)
  • Generative models without dynamic loops

Example

// Load the model
const model = await tf.loadGraphModel('https://example.com/model/model.json');

// === Single input, single output ===
const result = tf.tidy(() => {
  const input = tf.tensor4d([...pixelData], [1, 224, 224, 3]);
  const normalized = input.div(255.0);
  return model.predict(normalized);
});

// Extract prediction values
const predictions = await result.data();
const topClass = predictions.indexOf(Math.max(...predictions));
console.log('Predicted class:', topClass);
result.dispose();

// === Named tensor inputs ===
const output = tf.tidy(() => {
  return model.predict({
    'input_image:0': tf.tensor4d([...pixels], [1, 224, 224, 3]),
    'input_metadata:0': tf.tensor2d([[0.5, 1.2]], [1, 2])
  });
});
// Dispose `output` (or each tensor in it) once the results are consumed.

API: execute()

Signature

execute(
  inputs: Tensor | Tensor[] | NamedTensorMap,
  outputs?: string | string[]
): Tensor | Tensor[]

Parameters

Parameter Type Required Description
inputs Tensor, Tensor[], or NamedTensorMap Yes Input tensor(s), same as predict()
outputs string or string[] No Name(s) of specific output node(s) to extract. If omitted, uses the model's default output nodes from the serving signature.

When to Use

Use execute() when you need to:

  • Extract outputs from specific intermediate nodes in the graph (not just the final output)
  • Access named outputs by their TensorFlow node names
  • Get multiple outputs from different parts of the graph

Example

// Extract specific named outputs from the graph
const [features, predictions] = model.execute(
  inputTensor,
  ['conv2d_5/Relu:0', 'dense_2/Softmax:0']
);

console.log('Feature map shape:', features.shape);   // e.g., [1, 7, 7, 512]
console.log('Prediction shape:', predictions.shape);  // e.g., [1, 1000]

features.dispose();
predictions.dispose();

API: executeAsync()

Signature

// From tfjs-converter/src/executor/graph_model.ts:L534-545
async executeAsync(
  inputs: Tensor | Tensor[] | NamedTensorMap,
  outputs?: string | string[]
): Promise<Tensor | Tensor[]>

Parameters

Parameter Type Required Description
inputs Tensor, Tensor[], or NamedTensorMap Yes Input tensor(s), same as predict() and execute()
outputs string or string[] No Name(s) of specific output node(s) to extract. If omitted, uses the model's default output nodes.

Return Value

Returns a Promise<Tensor | Tensor[]> that resolves when all dynamic control flow operations in the graph have completed.

When to Use

Use executeAsync() when the model contains dynamic control flow operations that cannot be resolved statically:

  • tf.while_loop — Loops where the number of iterations depends on tensor values (e.g., beam search in sequence-to-sequence models)
  • tf.cond — Conditional branches where the branch taken depends on a tensor value at runtime
  • TensorArray operations — Dynamic-length tensor collections used in RNNs and attention mechanisms
  • Models exported from TF 2.x with tf.function that contain Python control flow translated to TF control flow ops

If you call predict() or execute() on a model that requires async execution, TF.js will throw an error with a message indicating that executeAsync() is required.

Example

// Load a model with dynamic control flow (e.g., a beam search model)
const model = await tf.loadGraphModel('https://example.com/seq2seq/model.json');

// executeAsync is required for models with dynamic ops
const input = tf.tensor2d([[1, 2, 3, 4, 5]], [1, 5]);
const result = await model.executeAsync(input);

// Handle single or multiple outputs
if (Array.isArray(result)) {
  console.log('Multiple outputs:');
  result.forEach((t, i) => {
    console.log(`  Output ${i}: shape=${t.shape}, dtype=${t.dtype}`);
    t.dispose();
  });
} else {
  console.log('Single output: shape=', result.shape);
  const data = await result.data();
  console.log('Values:', data);
  result.dispose();
}

input.dispose();
// Async execution with specific output nodes
const [decoderOutput, attentionWeights] = await model.executeAsync(
  { 'encoder_input:0': encoderInput },
  ['decoder_output:0', 'attention_weights:0']
);

const decodedTokens = await decoderOutput.data();
console.log('Decoded tokens:', decodedTokens);

decoderOutput.dispose();
attentionWeights.dispose();

Input Preparation Patterns

Image Classification

// From an HTML image element
function preprocessImage(imgElement, targetSize = [224, 224]) {
  return tf.tidy(() => {
    // Convert image to tensor
    let tensor = tf.browser.fromPixels(imgElement);

    // Resize to model's expected input size
    tensor = tf.image.resizeBilinear(tensor, targetSize);

    // Normalize to [0, 1]
    tensor = tensor.div(255.0);

    // Add batch dimension: [height, width, channels] -> [1, height, width, channels]
    tensor = tensor.expandDims(0);

    return tensor;
  });
}

const input = preprocessImage(document.getElementById('my-image'));
const output = model.predict(input);
const probabilities = await output.data();
input.dispose();
output.dispose();

From Canvas or Video

// Real-time inference from a video element
async function classifyFrame(videoElement, model) {
  const output = tf.tidy(() => {
    const frame = tf.browser.fromPixels(videoElement);
    const resized = tf.image.resizeBilinear(frame, [224, 224]);
    const normalized = resized.div(255.0);
    const batched = normalized.expandDims(0);
    return model.predict(batched);
  });

  const predictions = await output.data();
  output.dispose();
  return predictions;
}
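To run classifyFrame repeatedly, the call needs to sit inside a loop that the caller can stop. A hedged sketch of one such loop (the function name and callback shape are ours; in a real browser app you would more likely schedule each iteration with requestAnimationFrame):

```javascript
// Hypothetical sketch of a stoppable inference loop around an async
// classifier like classifyFrame above. shouldContinue is checked before
// every iteration so the caller can stop the loop externally.
async function runInferenceLoop(classifyFn, shouldContinue, onResult) {
  while (shouldContinue()) {
    const predictions = await classifyFn();  // one frame of inference
    onResult(predictions);                   // hand results to the UI
  }
}
```

Awaiting each iteration keeps at most one inference in flight, which avoids queueing GPU work faster than the backend can drain it.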

Batch Inference

// Process multiple images in a single batch
const batchSize = 8;
const images = [...imageDataArray];  // Array of pixel data

const batchTensor = tf.tidy(() => {
  const tensors = images.map(pixels =>
    tf.tensor3d(pixels, [224, 224, 3]).div(255.0)
  );
  return tf.stack(tensors);  // [batchSize, 224, 224, 3]
});

const batchOutput = model.predict(batchTensor);
const allPredictions = await batchOutput.data();

// Split predictions by batch item
for (let i = 0; i < batchSize; i++) {
  const start = i * 1000;  // 1000 classes per image
  const itemPredictions = allPredictions.slice(start, start + 1000);
  console.log(`Image ${i}: top class =`, itemPredictions.indexOf(Math.max(...itemPredictions)));
}

batchTensor.dispose();
batchOutput.dispose();
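The manual slice-and-indexOf loop above can be factored into a small pure-JS helper that is independent of TensorFlow.js (the helper name topClassPerItem is ours; it accepts the flat TypedArray returned by batchOutput.data() plus the per-item class count):

```javascript
// Hypothetical helper: given the flattened prediction buffer returned by
// batchOutput.data() and the number of classes per batch item, return
// the top class index for each item. Ties resolve to the earlier index.
function topClassPerItem(flat, numClasses) {
  const result = [];
  for (let start = 0; start < flat.length; start += numClasses) {
    let best = 0;
    for (let j = 1; j < numClasses; j++) {
      if (flat[start + j] > flat[start + best]) best = j;
    }
    result.push(best);
  }
  return result;
}
```

A linear scan also avoids spreading a large TypedArray into Math.max, which can overflow the argument limit for wide output layers.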

Memory Management

Using tf.tidy() for Synchronous Inference

// tf.tidy automatically disposes intermediate tensors
const output = tf.tidy(() => {
  const raw = tf.tensor4d(pixelData, [1, 224, 224, 3]);
  const normalized = raw.div(255.0);
  const shifted = normalized.sub(0.5);
  const scaled = shifted.mul(2.0);
  // raw, normalized, shifted are automatically disposed
  // Only the returned tensor (output of predict) survives
  return model.predict(scaled);
});

// Extract data and dispose the output
const result = await output.data();
output.dispose();

Manual Disposal for Async Inference

// tf.tidy does NOT work with async operations (executeAsync returns a Promise)
const input = tf.tensor4d(pixelData, [1, 224, 224, 3]);
const output = await model.executeAsync(input);

// Extract data
const resultData = await output.data();

// Manually dispose all tensors
input.dispose();
if (Array.isArray(output)) {
  output.forEach(t => t.dispose());
} else {
  output.dispose();
}

Monitoring Memory

// Check for memory leaks during development
console.log('Before inference:', tf.memory());
// { numTensors: 150, numDataBuffers: 150, numBytes: 25000000, ... }

const output = model.predict(input);
const data = await output.data();
output.dispose();
input.dispose();

console.log('After inference:', tf.memory());
// numTensors should return to approximately the same count
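The before/after comparison can be turned into an assertion for development builds. A minimal sketch (the helper name assertNoTensorLeak is ours; it only assumes the numTensors field of the object returned by tf.memory()):

```javascript
// Hypothetical helper: compare two tf.memory() snapshots taken before
// and after an inference pass. A positive delta in numTensors after all
// outputs have been disposed usually indicates a leak.
function assertNoTensorLeak(before, after, tolerance = 0) {
  const delta = after.numTensors - before.numTensors;
  if (delta > tolerance) {
    throw new Error(`Possible tensor leak: ${delta} tensor(s) not disposed`);
  }
  return delta;
}
```

A small nonzero tolerance is sometimes useful because some backends cache a few tensors (e.g. compiled-kernel scratch space) on first use.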

Error Handling

// Robust inference with error handling
async function runInference(model, inputData) {
  let input;
  let output;

  try {
    input = tf.tensor4d(inputData, [1, 224, 224, 3]);

    // Try synchronous prediction first
    try {
      output = model.predict(input);
    } catch (syncError) {
      // Fall back to async execution if predict() fails
      if (syncError.message.includes('dynamic ops') ||
          syncError.message.includes('executeAsync')) {
        console.warn('Model requires async execution, using executeAsync()');
        output = await model.executeAsync(input);
      } else {
        throw syncError;
      }
    }

    // Extract and return results
    const resultData = await (Array.isArray(output) ? output[0] : output).data();
    return Array.from(resultData);

  } catch (error) {
    console.error('Inference failed:', error.message);
    throw error;

  } finally {
    // Always clean up tensors
    if (input) input.dispose();
    if (output) {
      if (Array.isArray(output)) {
        output.forEach(t => t.dispose());
      } else {
        output.dispose();
      }
    }
  }
}

Performance Optimization

Technique Description Impact
tf.tidy() Wrap synchronous inference to auto-dispose intermediates Prevents memory leaks
Warm-up run Run a single inference with dummy data after loading Triggers JIT compilation; subsequent runs are faster
Batch inference Process multiple inputs in a single predict() call Better GPU utilization
WebGL backend Use tf.setBackend('webgl') for GPU acceleration 10-100x faster than CPU for large models
WASM backend Use tf.setBackend('wasm') for CPU fallback Faster than default JS CPU backend
Input reuse Reuse input tensors when shape is constant (e.g., video frames) Reduces allocation overhead

// Warm-up run to trigger JIT compilation
const warmupInput = tf.zeros([1, 224, 224, 3]);
const warmupOutput = model.predict(warmupInput);
warmupOutput.dispose();
warmupInput.dispose();
console.log('Model warmed up, ready for fast inference');

Complete End-to-End Example

// Full example: load model, preprocess image, run inference, display results

async function classifyImage(imageUrl) {
  // 1. Load the model (cache for reuse)
  const model = await tf.loadGraphModel(
    'https://storage.googleapis.com/my-models/mobilenet/v2/model.json',
    {
      onProgress: (p) => console.log(`Loading: ${(p * 100).toFixed(0)}%`)
    }
  );

  // 2. Load and preprocess the image
  const img = new Image();
  img.crossOrigin = 'anonymous';
  await new Promise((resolve, reject) => {
    img.onload = resolve;
    img.onerror = reject;
    img.src = imageUrl;
  });

  // 3. Run inference with memory management
  const predictions = tf.tidy(() => {
    const tensor = tf.browser.fromPixels(img)
      .resizeBilinear([224, 224])
      .div(255.0)
      .expandDims(0);
    return model.predict(tensor);
  });

  // 4. Extract and process results
  const probabilities = await predictions.data();
  predictions.dispose();

  // 5. Find top-5 predictions
  const top5 = Array.from(probabilities)
    .map((prob, idx) => ({ probability: prob, classIndex: idx }))
    .sort((a, b) => b.probability - a.probability)
    .slice(0, 5);

  console.log('Top 5 predictions:');
  top5.forEach(({ probability, classIndex }) => {
    console.log(`  Class ${classIndex}: ${(probability * 100).toFixed(2)}%`);
  });

  return top5;
}
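The top-5 selection in step 5 is plain JavaScript and can be factored out for reuse and testing independently of TensorFlow.js (the helper name topK is ours; it accepts any array-like of probabilities, such as the TypedArray from predictions.data()):

```javascript
// Pure-JS top-k selection, factored out of step 5 above. Returns the k
// entries with the highest probability, sorted descending.
function topK(probabilities, k) {
  return Array.from(probabilities)
    .map((probability, classIndex) => ({ probability, classIndex }))
    .sort((a, b) => b.probability - a.probability)
    .slice(0, k);
}
```

For the ~1000-class outputs typical of ImageNet models, the full sort is cheap; for much wider outputs a partial-selection approach would save work.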
