# Heuristic: TensorFlow.js (tfjs) GPU Pipeline Data Residency
| Metadata | |
|---|---|
| Source | Doc |
| Domains | Optimization, GPU_Pipeline |
| Date | 2026-02-10 |
## Overview
Keep tensor data on the GPU throughout the pipeline by using `dataToGPU()`, avoiding costly CPU-GPU transfers that add significant latency.
## Description
In a typical ML pipeline, data transfers between CPU and GPU are a major bottleneck. TensorFlow.js provides `tensor.dataToGPU()` to access GPU-resident data directly (WebGL textures or WebGPU buffers) without downloading it to the CPU. This enables zero-copy integration with downstream WebGL/WebGPU rendering or processing steps.
## Usage
Use this heuristic when building real-time pipelines (video processing, AR, live inference) where output tensors feed into custom GPU rendering code. It is critical for sustaining 60 fps.
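To see why residency matters at 60 fps, consider the per-frame time budget: roughly 16.7 ms. A minimal arithmetic sketch; the ~4 ms sync cost below is an illustrative assumption, not a measurement:

```javascript
// Per-frame time budget at a target of 60 frames per second.
const frameBudgetMs = 1000 / 60; // ~16.67 ms

// Hypothetical figure: if one CPU-GPU sync costs ~4 ms, then a
// download + re-upload roundtrip (2 syncs) eats nearly half the budget.
const syncCostMs = 4;
const budgetLeftMs = frameBudgetMs - 2 * syncCostMs;
```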
## The Insight
- Action: Use `tensor.dataToGPU()` instead of `tensor.data()` or `tensor.dataSync()` when downstream processing is GPU-based.
- Value: Eliminates CPU-GPU synchronization overhead. For WebGL it returns `{texture, texShape, tensorRef}`; for WebGPU it returns `{buffer, bufSize, tensorRef}`.
- Trade-off: The data format is backend-specific: WebGL textures are densely packed (RGBA channels), and `tensorRef` must be disposed manually to prevent memory leaks.
- Note: For image-shaped tensors `[height, width, 4]`, the WebGL texture layout matches the image's storage layout, enabling zero-cost downstream use.
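The dense-packing note above is simple arithmetic: one RGBA texel holds four channel values, so an image-shaped tensor maps one-to-one onto texture texels. A minimal sketch (the helper `denseTexShape` is ours, not a TF.js API):

```javascript
// Sketch: the texture shape implied by dense RGBA packing for an
// image-shaped tensor [height, width, 4]. One texel stores all four
// channels of one pixel, so the texture is simply [height, width].
function denseTexShape([height, width, channels]) {
  if (channels !== 4) {
    throw new Error('this sketch assumes 4 channels packed per texel');
  }
  return [height, width];
}

// A 480x640 RGBA tensor fits a 480x640 texture with no repacking.
const shape = denseTexShape([480, 640, 4]);
```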
## Reasoning
From `docs/OPTIMIZATION_PURE_GPU_PIPELINE.md`:
"General rule of thumb is that if the whole pipeline can run on GPU without downloading any data to CPU, it is usually much faster than a fragmented pipeline that requires data transfer between CPU and GPU. This additional time for GPU to CPU and CPU to GPU sync adds to the pipeline latency."
## Pipeline Comparison
| Pipeline Type | Description | Latency Impact |
|---|---|---|
| Pure GPU | All operations remain on GPU | Minimal (no sync overhead) |
| Fragmented | Data moves between CPU and GPU | High (sync + transfer overhead) |
| CPU-only | All operations on CPU | Predictable but slower throughput |
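The table's latency claims can be made concrete with a toy model: total latency is compute time plus one sync cost per CPU-GPU boundary crossing. The numbers below are illustrative assumptions, not measurements:

```javascript
// Toy model: latency = compute + (boundary crossings x sync cost).
function pipelineLatencyMs(computeMs, crossings, syncMsPerCrossing) {
  return computeMs + crossings * syncMsPerCrossing;
}

// Pure GPU: no crossings. Fragmented: download + re-upload = 2 crossings.
const pureMs = pipelineLatencyMs(5, 0, 4);       // 5 ms
const fragmentedMs = pipelineLatencyMs(5, 2, 4); // 13 ms
```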
## Code Evidence

From `docs/OPTIMIZATION_PURE_GPU_PIPELINE.md`:

```js
// Get GPU texture directly from the tensor (WebGL backend)
const data = tensor.dataToGPU({customTexShape: [videoHeight, videoWidth]});

// Use the texture in a custom WebGL rendering pipeline
gl.bindTexture(gl.TEXTURE_2D, data.texture);
// ... custom WebGL processing ...

// CRITICAL: dispose the tensor reference to prevent memory leaks
data.tensorRef.dispose();
```
### WebGPU Buffer Access

```js
// Get GPU buffer directly from the tensor (WebGPU backend)
const gpuData = tensor.dataToGPU();

// Use the buffer in a custom WebGPU compute pipeline
const {buffer, bufSize, tensorRef} = gpuData;
// ... bind buffer to WebGPU pipeline ...

// CRITICAL: dispose when done
tensorRef.dispose();
```
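Since `tensorRef.dispose()` must run even when the downstream GPU work throws, a small guard keeps the cleanup in one place. A sketch; `withGPUData` is our helper, not a TF.js API:

```javascript
// Sketch: run a processing callback on GPU-resident data, and always
// dispose the TF.js-owned tensorRef, even if the callback throws.
function withGPUData(gpuData, process) {
  try {
    return process(gpuData);
  } finally {
    gpuData.tensorRef.dispose();
  }
}
```

Usage would look like `withGPUData(tensor.dataToGPU(), ({texture}) => render(texture))`, so no code path can skip the disposal.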
### Anti-Pattern: Unnecessary CPU Roundtrip

```js
// BAD: downloads to CPU, then re-uploads to GPU
const cpuData = await tensor.data();             // GPU -> CPU transfer
const texture = createTextureFromArray(cpuData); // CPU -> GPU transfer

// GOOD: stays on GPU the entire time
const gpuData = tensor.dataToGPU();
gl.bindTexture(gl.TEXTURE_2D, gpuData.texture);  // zero-copy GPU access
```