Heuristic:Tensorflow Tfjs WebGL Shader Warmup

Metadata
Source	Doc
Domains	Optimization, WebGL
Date	2026-02-10

Overview

The first inference call on the WebGL backend is significantly slower than later calls because it includes shader compilation; warm up the shader cache with a dummy prediction right after the model loads.

Description

When using the WebGL backend, the first call to predict() compiles WebGL shader programs for each operation in the model. Subsequent calls reuse cached shaders and are much faster. This "cold start" can make the first inference 5-10x slower than steady state.
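
One way to observe the cold start is to time a few consecutive predictions and compare the first with the rest. The sketch below is illustrative only: timePredictions is a hypothetical helper, not a TensorFlow.js API, and it assumes predict() returns a tensor or an array of tensors whose data() method returns a promise.

```javascript
// Hypothetical helper (not part of TensorFlow.js): times `runs` consecutive
// predictions so the first, cold call can be compared with later, warm calls.
// `model` needs a predict() method; `makeInput` returns a fresh input tensor.
async function timePredictions(model, makeInput, runs = 3) {
  const timings = [];
  for (let i = 0; i < runs; i++) {
    const input = makeInput();
    const start = performance.now();
    const output = model.predict(input);
    // Assumes predict() returns a tensor or an array of tensors.
    const tensors = Array.isArray(output) ? output : [output];
    // Reading the values back waits for pending GPU work to finish.
    await Promise.all(tensors.map((t) => t.data()));
    timings.push(performance.now() - start);
    input.dispose();
    tensors.forEach((t) => t.dispose());
  }
  return timings; // milliseconds per call, in call order
}
```

With a loaded model, a call such as timePredictions(model, () => tf.zeros([1, 224, 224, 3])) should show the first entry well above the others on the WebGL backend; the actual ratio depends on the model and the device.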

Usage

Apply this heuristic when the latency of the first inference matters (e.g., interactive applications, demos). It is not needed for batch processing, where the first-call overhead is amortized over many predictions.

The Insight

Action: Call model.predict(tf.zeros([1, ...inputShape])) immediately after loading the model to warm up the shader cache
Value: Eliminates shader compilation latency from the first real inference
Trade-off: Small additional startup time (one dummy inference), but makes first real prediction fast
Also applies to: the WebGPU backend (pipeline compilation). The ENGINE_COMPILE_ONLY flag offers an advanced alternative that pre-compiles without executing operations.
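
The action above can be sketched as a reusable helper that derives the dummy input shape from the model instead of hard-coding it. This is a minimal sketch under stated assumptions: concreteShape and warmUp are hypothetical names, tf and model are passed in by the caller, and the model is assumed to expose an inputs array whose entries carry shape and dtype, with unknown dimensions reported as null or -1.

```javascript
// Hypothetical helper: replace unknown dimensions (null, undefined, or
// negative) with 1 so the shape can be used to allocate a dummy tensor.
function concreteShape(shape) {
  return shape.map((d) => (d == null || d < 0 ? 1 : d));
}

// Hypothetical helper: run one dummy prediction with all-zero inputs.
// Assumes every entry of model.inputs has a known rank and a dtype
// that tf.zeros accepts.
function warmUp(tf, model) {
  const inputs = model.inputs.map((info) =>
    tf.zeros(concreteShape(info.shape), info.dtype));
  const output = model.predict(inputs.length === 1 ? inputs[0] : inputs);
  // Release the dummy inputs and outputs once the call has been issued.
  tf.dispose([inputs, output]);
}
```

Calling warmUp(tf, model) right after loading keeps the warm-up in one place. Models with inputs of unknown rank or unusual dtypes would need a hand-written dummy input instead.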

Reasoning

From tfjs-converter/README.md FAQ #5:

"The time of first call also includes the compilation time of WebGL shader programs for the model. After the first call the shader programs are cached, which makes the subsequent calls much faster. You can warm up the cache by calling the predict method with an all zero inputs, right after the completion of the model loading."

Code Evidence

Shader Warmup Pattern

// Load the model
const model = await tf.loadGraphModel('model/model.json');

// Warm up the shader cache with a dummy prediction.
// The shape here is an example; it must match the model's expected input.
const warmupInput = tf.zeros([1, 224, 224, 3]);
const warmupResult = model.predict(warmupInput);

// Dispose the warmup input and result to avoid leaking GPU memory
// (tf.dispose accepts a single tensor or an array of tensors)
tf.dispose([warmupInput, warmupResult]);

// Now the first real prediction will be fast
const realResult = model.predict(realInput);

ENGINE_COMPILE_ONLY Flag (Advanced)

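// Pre-compile shaders without executing operations; the outputs hold no real data
tf.env().set('ENGINE_COMPILE_ONLY', true);
const compileResult = tf.tidy(() => model.predict(tf.zeros([1, 224, 224, 3])));
tf.env().set('ENGINE_COMPILE_ONLY', false);
// WebGL follow-up steps as used in the tfjs benchmark tooling; verify for your tfjs version
await tf.backend().checkCompileCompletionAsync();
tf.backend().getUniformLocations();
tf.dispose(compileResult);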
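
Because the flag is global, an exception thrown while it is enabled would leave the engine in compile-only mode. A possible safeguard is to restore the flag in a finally block. The wrapper below is a hypothetical sketch, not a TensorFlow.js API; it assumes tf is passed in and that tf.env() exposes get and set for the flag.

```javascript
// Hypothetical wrapper (not part of TensorFlow.js): runs `fn` with
// ENGINE_COMPILE_ONLY enabled and restores the previous value afterwards,
// even if `fn` throws. `fn` is assumed to be synchronous.
function withCompileOnly(tf, fn) {
  const env = tf.env();
  const previous = env.get('ENGINE_COMPILE_ONLY');
  env.set('ENGINE_COMPILE_ONLY', true);
  try {
    return fn();
  } finally {
    env.set('ENGINE_COMPILE_ONLY', previous);
  }
}
```

A call such as withCompileOnly(tf, () => model.predict(dummyInput)) would replace the manual set(true) and set(false) pair; the returned tensors still need to be disposed by the caller.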
