Heuristic:Tensorflow Tfjs WebGL Shader Warmup
| Metadata | |
|---|---|
| Source | Doc |
| Domains | Optimization, WebGL |
| Date | 2026-02-10 |
Overview
The first inference call on the WebGL backend is significantly slower than later calls because it includes shader compilation; warm up the cache with a dummy prediction right after the model loads.
Description
When using the WebGL backend, the first call to predict() compiles the WebGL shader programs for the operations in the model. Subsequent calls reuse the cached shaders and are much faster. This "cold start" can make the first inference several times slower than steady-state inference (roughly 5-10x, depending on the model and device).
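One way to observe the cold start is to time several consecutive predict() calls and compare the first against the rest. The sketch below is illustrative only: `timePredictCalls` and its parameters are hypothetical names, and it assumes the model returns a single tensor or an array of tensors (a named tensor map is not handled). It calls dataSync() on each output so that the measurement waits for the GPU work instead of stopping when the call is merely queued.

```javascript
// Hypothetical helper: time `runs` consecutive predict() calls, in milliseconds.
// `makeInput` is a function returning a fresh input tensor for each run.
// `now` is injectable so the helper can be exercised without a real clock.
function timePredictCalls(model, makeInput, runs = 3, now = () => performance.now()) {
  const timings = [];
  for (let i = 0; i < runs; i++) {
    const input = makeInput();
    const start = now();
    const output = model.predict(input);
    const outputs = Array.isArray(output) ? output : [output];
    // dataSync() blocks until the values are available, so GPU time is included.
    outputs.forEach(t => t.dataSync());
    timings.push(now() - start);
    // Free the tensors created for this run.
    input.dispose();
    outputs.forEach(t => t.dispose());
  }
  return timings;
}
```

With TensorFlow.js loaded, a call such as `timePredictCalls(model, () => tf.zeros([1, 224, 224, 3]))` should return an array whose first entry is noticeably larger than the others when shader compilation dominates the first call.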
Usage
Apply this heuristic when the latency of the first inference matters (e.g., interactive applications, demos). It is not needed for batch processing, where the first-call overhead is amortized over many inferences.
The Insight
- Action: Call model.predict(tf.zeros([1, ...inputShape])) immediately after loading the model to warm up the shader cache
- Value: Moves shader compilation out of the first real inference, so that inference runs at steady-state speed
- Trade-off: Slightly longer startup (one dummy inference) in exchange for a fast first real prediction
- Also applies to: the WebGPU backend, where the first call pays for pipeline compilation. The ENGINE_COMPILE_ONLY flag is a related option that pre-compiles shaders without executing operations
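The action in the list above can be wrapped in a small reusable helper. The following is a minimal sketch, not part of the TensorFlow.js API: `warmUp` and `concreteShape` are hypothetical names, the `tf` namespace is passed in explicitly, and the code assumes that `model.inputs` lists a shape and dtype for every input, with unknown dimensions reported as `null` or `-1`.

```javascript
// Hypothetical helper: replace unknown dimensions (null or -1) with 1,
// so a dynamic batch dimension becomes a batch of one.
function concreteShape(shape) {
  return shape.map(d => (d == null || d < 0 ? 1 : d));
}

// Hypothetical helper: run one all-zero prediction and free every tensor it creates.
// `tf` is the TensorFlow.js namespace; `model` is a loaded model exposing `inputs`.
function warmUp(tf, model) {
  // Build one zero tensor per declared model input.
  const inputs = model.inputs.map(info => tf.zeros(concreteShape(info.shape), info.dtype));
  // Pass a single tensor for single-input models, an array otherwise.
  const output = model.predict(inputs.length === 1 ? inputs[0] : inputs);
  // tf.dispose() walks arrays and objects, so it covers every output form.
  tf.dispose([inputs, output]);
}
```

Calling `warmUp(tf, model)` once, right after the model finishes loading, keeps the warm-up out of the request path. Deriving the shapes from the model avoids hard-coding a shape such as `[1, 224, 224, 3]`, at the cost of relying on the model's declared input metadata being complete.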
Reasoning
From tfjs-converter/README.md FAQ #5:
"The time of first call also includes the compilation time of WebGL shader programs for the model. After the first call the shader programs are cached, which makes the subsequent calls much faster. You can warm up the cache by calling the predict method with an all zero inputs, right after the completion of the model loading."
Code Evidence
Shader Warmup Pattern
// Load the model
const model = await tf.loadGraphModel('model/model.json');
// Warm up the shader cache with a dummy prediction
const warmupInput = tf.zeros([1, 224, 224, 3]);
const warmupResult = model.predict(warmupInput);
// Dispose the warmup input and result so they do not leak memory
warmupInput.dispose();
if (Array.isArray(warmupResult)) {
  warmupResult.forEach(t => t.dispose());
} else {
  warmupResult.dispose();
}
// Now the first real prediction will be fast
const realResult = model.predict(realInput);
ENGINE_COMPILE_ONLY Flag (Advanced)
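// Pre-compile shaders without executing operations
tf.env().set('ENGINE_COMPILE_ONLY', true);
const compileInput = tf.zeros([1, 224, 224, 3]);
const compileResult = model.predict(compileInput);
tf.env().set('ENGINE_COMPILE_ONLY', false);
await tf.backend().checkCompileCompletionAsync(); // WebGL backend: wait for compilation to finish
tf.backend().getUniformLocations(); // WebGL backend: resolve uniforms for the compiled programs
tf.dispose([compileInput, compileResult]);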
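If predict() throws while the flag is set, ENGINE_COMPILE_ONLY stays enabled and later inferences would compile without executing. A hedged sketch of one way to guard against that is shown below. `withFlag` is a hypothetical helper, not a TensorFlow.js function; it only assumes an environment object with `get(name)` and `set(name, value)` methods, as returned by `tf.env()`.

```javascript
// Hypothetical helper: set a flag for the duration of `fn`, then restore the
// previous value even if `fn` throws.
function withFlag(env, flag, value, fn) {
  const previous = env.get(flag);
  env.set(flag, value);
  try {
    return fn();
  } finally {
    // Runs on both normal return and exception.
    env.set(flag, previous);
  }
}
```

Usage would look like `withFlag(tf.env(), 'ENGINE_COMPILE_ONLY', true, () => model.predict(input))`. Restoring the previous value, instead of hard-coding `false`, keeps the helper safe to nest.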