
Heuristic:Tensorflow Tfjs WebGL Shader Warmup

From Leeroopedia




Metadata
Source Doc
Domains Optimization, WebGL
Date 2026-02-10

Overview

The first inference call on the WebGL backend is significantly slower due to shader compilation; warm up the cache with a dummy prediction right after the model loads.

Description

When using the WebGL backend, the first call to predict() compiles WebGL shader programs for each operation in the model. Subsequent calls reuse cached shaders and are much faster. This "cold start" can make the first inference 5-10x slower than steady state.
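The caching behavior can be illustrated with a minimal in-memory cache. This is a sketch, not tfjs internals; it only shows why the compilation cost is paid once per operation, on the first call.

```javascript
// Illustrative sketch of per-op shader caching (NOT tfjs internals).
const shaderCache = new Map();
let compileCount = 0;

function getShader(opKey) {
  if (!shaderCache.has(opKey)) {
    compileCount += 1;            // the expensive compile runs once per op
    shaderCache.set(opKey, { program: `compiled:${opKey}` });
  }
  return shaderCache.get(opKey);  // later calls are cheap cache hits
}

// First "inference": every op compiles its shader
['conv1', 'relu1', 'dense1'].forEach(getShader);
// Second "inference": all cache hits, no new compilations
['conv1', 'relu1', 'dense1'].forEach(getShader);
```

After both passes, `compileCount` is still 3: the second pass touched only the cache, which is the steady-state behavior a warmup call buys you.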

Usage

Apply this heuristic when latency of the first inference matters (e.g., interactive applications, demos). Not needed for batch processing where first-call overhead is amortized.

The Insight

  • Action: Call model.predict(tf.zeros([1, ...inputShape])) immediately after loading the model to warm up the shader cache
  • Value: Eliminates shader compilation latency from the first real inference
  • Trade-off: Small additional startup time (one dummy inference), but makes first real prediction fast
  • Also applies to: WebGPU backend (pipeline compilation) and the ENGINE_COMPILE_ONLY flag for pre-compilation
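The action above can be wrapped in a small reusable helper. This is a sketch assuming a tf.js-style model whose predict() returns either a single tensor or an array of tensors, each exposing dispose(); the makeDummy callback (a name introduced here for illustration) builds the all-zero input, e.g. () => tf.zeros([1, 224, 224, 3]), whose shape must match the model's input.

```javascript
// Sketch of a warmup helper for a tf.js-style model (assumptions noted above).
function warmupModel(model, makeDummy) {
  const dummyInput = makeDummy();
  const result = model.predict(dummyInput);
  // predict() may return one tensor or several; dispose every output
  const outputs = Array.isArray(result) ? result : [result];
  outputs.forEach((t) => t.dispose());
  dummyInput.dispose(); // free the dummy input as well
}
```

With a real model this would be called once, right after tf.loadGraphModel() resolves and before the first user-facing prediction.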

Reasoning

From tfjs-converter/README.md FAQ #5:

"The time of first call also includes the compilation time of WebGL shader programs for the model. After the first call the shader programs are cached, which makes the subsequent calls much faster. You can warm up the cache by calling the predict method with an all zero inputs, right after the completion of the model loading."

Code Evidence

Shader Warmup Pattern

// Load the model
const model = await tf.loadGraphModel('model/model.json');

// Warm up the shader cache with an all-zero dummy prediction
// (the input shape must match the model's expected input)
const dummyInput = tf.zeros([1, 224, 224, 3]);
const warmupResult = model.predict(dummyInput);
dummyInput.dispose(); // free the dummy input to avoid a memory leak

// Dispose the warmup result (predict may return one tensor or an array)
if (Array.isArray(warmupResult)) {
  warmupResult.forEach(t => t.dispose());
} else {
  warmupResult.dispose();
}

// Now the first real prediction will be fast
const realResult = model.predict(realInput);

ENGINE_COMPILE_ONLY Flag (Advanced)

// Pre-compile shader programs without executing them
tf.env().set('ENGINE_COMPILE_ONLY', true);
const compileResult = model.predict(tf.zeros([1, 224, 224, 3]));
tf.env().set('ENGINE_COMPILE_ONLY', false);
// Then wait for compilation and link uniforms (per the WebGL backend docs)
await tf.backend().checkCompileCompletionAsync();
tf.backend().getUniformLocations();
tf.dispose(compileResult);
