# Heuristic: TensorFlow.js (tfjs) GPU Pipeline Data Residency
| Metadata | |
|---|---|
| Source | Doc |
| Domains | Optimization, GPU_Pipeline |
| Date | 2026-02-10 |
## Overview
Keep tensor data on the GPU throughout the pipeline by using `dataToGPU()`, avoiding costly CPU-GPU transfers that add significant latency.
## Description
In a typical ML pipeline, data transfers between CPU and GPU are a major bottleneck. TensorFlow.js provides `tensor.dataToGPU()` to access GPU-resident data directly (WebGL textures or WebGPU buffers) without downloading it to the CPU. This enables zero-copy integration with downstream WebGL/WebGPU rendering or processing steps.
## Usage
Use this heuristic when building real-time pipelines (video processing, AR, live inference) where output tensors feed into custom GPU rendering code. It is critical for sustaining 60 fps.
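To see why residency matters at 60 fps, consider the per-frame time budget: roughly 16.7 ms. A minimal arithmetic sketch; the ~4 ms sync cost below is an illustrative assumption, not a measurement:

```javascript
// Per-frame time budget at a target of 60 frames per second.
const frameBudgetMs = 1000 / 60; // ~16.67 ms

// Hypothetical figure: if one CPU-GPU sync costs ~4 ms, then a
// download + re-upload roundtrip (2 syncs) eats nearly half the budget.
const syncCostMs = 4;
const budgetLeftMs = frameBudgetMs - 2 * syncCostMs;
```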
## The Insight
- Action: Use `tensor.dataToGPU()` instead of `tensor.data()` or `tensor.dataSync()` when downstream processing is GPU-based.
- Value: Eliminates CPU-GPU synchronization overhead. For WebGL it returns `{texture, texShape, tensorRef}`; for WebGPU it returns `{buffer, bufSize, tensorRef}`.
- Trade-off: The data format is backend-specific: WebGL textures are densely packed (RGBA channels), and `tensorRef` must be disposed manually to prevent memory leaks.
- Note: For image-shaped tensors `[height, width, 4]`, the WebGL texture layout matches the image's storage layout, enabling zero-cost downstream use.
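The dense-packing note above is simple arithmetic: one RGBA texel holds four channel values, so an image-shaped tensor maps one-to-one onto texture texels. A minimal sketch (the helper `denseTexShape` is ours, not a TF.js API):

```javascript
// Sketch: the texture shape implied by dense RGBA packing for an
// image-shaped tensor [height, width, 4]. One texel stores all four
// channels of one pixel, so the texture is simply [height, width].
function denseTexShape([height, width, channels]) {
  if (channels !== 4) {
    throw new Error('this sketch assumes 4 channels packed per texel');
  }
  return [height, width];
}

// A 480x640 RGBA tensor fits a 480x640 texture with no repacking.
const shape = denseTexShape([480, 640, 4]);
```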
## Reasoning
From `docs/OPTIMIZATION_PURE_GPU_PIPELINE.md`:
"General rule of thumb is that if the whole pipeline can run on GPU without downloading any data to CPU, it is usually much faster than a fragmented pipeline that requires data transfer between CPU and GPU. This additional time for GPU to CPU and CPU to GPU sync adds to the pipeline latency."
## Pipeline Comparison
| Pipeline Type | Description | Latency Impact |
|---|---|---|
| Pure GPU | All operations remain on GPU | Minimal (no sync overhead) |
| Fragmented | Data moves between CPU and GPU | High (sync + transfer overhead) |
| CPU-only | All operations on CPU | Predictable but slower throughput |
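The table's latency claims can be made concrete with a toy model: total latency is compute time plus one sync cost per CPU-GPU boundary crossing. The numbers below are illustrative assumptions, not measurements:

```javascript
// Toy model: latency = compute + (boundary crossings x sync cost).
function pipelineLatencyMs(computeMs, crossings, syncMsPerCrossing) {
  return computeMs + crossings * syncMsPerCrossing;
}

// Pure GPU: no crossings. Fragmented: download + re-upload = 2 crossings.
const pureMs = pipelineLatencyMs(5, 0, 4);       // 5 ms
const fragmentedMs = pipelineLatencyMs(5, 2, 4); // 13 ms
```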
## Code Evidence

From `docs/OPTIMIZATION_PURE_GPU_PIPELINE.md`:

```js
// Get GPU texture directly from the tensor (WebGL backend)
const data = tensor.dataToGPU({customTexShape: [videoHeight, videoWidth]});

// Use the texture in a custom WebGL rendering pipeline
gl.bindTexture(gl.TEXTURE_2D, data.texture);
// ... custom WebGL processing ...

// CRITICAL: dispose the tensor reference to prevent memory leaks
data.tensorRef.dispose();
```
### WebGPU Buffer Access

```js
// Get GPU buffer directly from the tensor (WebGPU backend)
const gpuData = tensor.dataToGPU();

// Use the buffer in a custom WebGPU compute pipeline
const {buffer, bufSize, tensorRef} = gpuData;
// ... bind buffer to WebGPU pipeline ...

// CRITICAL: dispose when done
tensorRef.dispose();
```
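Since `tensorRef.dispose()` must run even when the downstream GPU work throws, a small guard keeps the cleanup in one place. A sketch; `withGPUData` is our helper, not a TF.js API:

```javascript
// Sketch: run a processing callback on GPU-resident data, and always
// dispose the TF.js-owned tensorRef, even if the callback throws.
function withGPUData(gpuData, process) {
  try {
    return process(gpuData);
  } finally {
    gpuData.tensorRef.dispose();
  }
}
```

Usage would look like `withGPUData(tensor.dataToGPU(), ({texture}) => render(texture))`, so no code path can skip the disposal.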
### Anti-Pattern: Unnecessary CPU Roundtrip

```js
// BAD: downloads to CPU, then re-uploads to GPU
const cpuData = await tensor.data();             // GPU -> CPU transfer
const texture = createTextureFromArray(cpuData); // CPU -> GPU transfer

// GOOD: stays on GPU the entire time
const gpuData = tensor.dataToGPU();
gl.bindTexture(gl.TEXTURE_2D, gpuData.texture);  // zero-copy GPU access
```