
Heuristic:Tensorflow Tfjs GPU Pipeline Data Residency

From Leeroopedia



Metadata

Source: Doc
Domains: Optimization, GPU_Pipeline
Date: 2026-02-10

Overview

Keep tensor data on the GPU throughout the pipeline by using dataToGPU(), avoiding the costly CPU-GPU transfers that add significant latency.

Description

In a typical ML pipeline, data transfers between CPU and GPU are a major bottleneck. TensorFlow.js provides tensor.dataToGPU() to access GPU-resident data directly (WebGL textures or WebGPU buffers) without downloading to CPU. This enables zero-copy integration with downstream WebGL/WebGPU rendering or processing steps.

Usage

Use this heuristic when building real-time pipelines (video processing, AR, live inference) where output tensors feed into custom GPU rendering code. It is critical for sustaining 60 fps.
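A per-frame loop of this kind can be sketched as follows. This is a minimal sketch, not verbatim from the source: `model`, `drawTexture`, and the `video` element are hypothetical names, and it assumes the TensorFlow.js WebGL backend with `tf` available as a global (e.g. via a script tag).

```javascript
// Sketch: pure-GPU render loop. Model output stays GPU-resident via
// dataToGPU() and is drawn by a custom WebGL routine without ever
// touching the CPU. `model`, `video`, `gl`, and `drawTexture` are
// hypothetical and supplied by the caller.
function runGpuPipeline(model, video, gl, drawTexture) {
  function frame() {
    const input = tf.browser.fromPixels(video);   // upload frame once
    const output = model.predict(input);          // inference on GPU
    const gpuData = output.dataToGPU({
      customTexShape: [video.videoHeight, video.videoWidth],
    });

    drawTexture(gl, gpuData.texture);             // zero-copy GPU use

    // Dispose everything, including tensorRef, to avoid memory leaks.
    gpuData.tensorRef.dispose();
    input.dispose();
    output.dispose();

    requestAnimationFrame(frame);                 // next frame
  }
  requestAnimationFrame(frame);
}
```

Note that disposal happens every frame; leaking even one tensorRef per frame exhausts GPU memory quickly at 60 fps.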

The Insight

  • Action: Use tensor.dataToGPU() instead of tensor.data() or tensor.dataSync() when downstream processing is GPU-based
  • Value: Eliminates CPU-GPU synchronization overhead. For WebGL: returns {texture, texShape, tensorRef}. For WebGPU: returns {buffer, bufSize, tensorRef}.
  • Trade-off: Data format is backend-specific. WebGL textures are densely packed (RGBA channels). Must dispose tensorRef manually to prevent memory leaks.
  • Note: For image-shaped tensors [height, width, 4], the WebGL texture aligns with image storage for zero-cost downstream use.
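The dense-packing point above can be made concrete with a little arithmetic: each RGBA texel holds 4 values, so a tensor with n elements needs ceil(n / 4) texels, and a [height, width, 4] tensor packs to exactly height * width texels, one per pixel. The helper below is purely illustrative and not part of the TensorFlow.js API.

```javascript
// Hypothetical helper (not part of TensorFlow.js): how many RGBA texels
// a densely packed WebGL texture needs for a tensor of the given shape.
function packedTexelCount(shape) {
  const elements = shape.reduce((a, b) => a * b, 1);
  return Math.ceil(elements / 4); // 4 values (R, G, B, A) per texel
}

// A [height, width, 4] tensor packs to exactly height * width texels,
// the same layout as an image texture, hence zero-cost downstream use.
packedTexelCount([224, 224, 4]); // 50176 texels = 224 * 224 pixels
```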

Reasoning

From docs/OPTIMIZATION_PURE_GPU_PIPELINE.md:

"General rule of thumb is that if the whole pipeline can run on GPU without downloading any data to CPU, it is usually much faster than a fragmented pipeline that requires data transfer between CPU and GPU. This additional time for GPU to CPU and CPU to GPU sync adds to the pipeline latency."

Pipeline Comparison

  • Pure GPU: all operations remain on GPU. Latency impact: minimal (no sync overhead).
  • Fragmented: data moves between CPU and GPU. Latency impact: high (sync + transfer overhead).
  • CPU-only: all operations on CPU. Latency impact: predictable but slower throughput.

Code Evidence

From docs/OPTIMIZATION_PURE_GPU_PIPELINE.md

// Get GPU texture directly from tensor (WebGL backend)
const data = tensor.dataToGPU({customTexShape: [videoHeight, videoWidth]});

// Use the texture in a custom WebGL rendering pipeline
gl.bindTexture(gl.TEXTURE_2D, data.texture);
// ... custom WebGL processing ...

// CRITICAL: Dispose the tensor reference to prevent memory leaks
data.tensorRef.dispose();

WebGPU Buffer Access

// Get GPU buffer directly from tensor (WebGPU backend)
const gpuData = tensor.dataToGPU();

// Use the buffer in a custom WebGPU compute pipeline
const {buffer, bufSize, tensorRef} = gpuData;
// ... bind buffer to WebGPU pipeline ...

// CRITICAL: Dispose when done
tensorRef.dispose();

Anti-Pattern: Unnecessary CPU Roundtrip

// BAD: Downloads to CPU then re-uploads to GPU
const cpuData = await tensor.data();  // GPU -> CPU transfer
const texture = createTextureFromArray(cpuData);  // CPU -> GPU transfer

// GOOD: Stays on GPU the entire time
const gpuData = tensor.dataToGPU();
gl.bindTexture(gl.TEXTURE_2D, gpuData.texture);  // Zero-copy GPU access
