Principle:Bitsandbytes foundation Bitsandbytes SwitchBack Quantized Linear

From Leeroopedia
Revision as of 18:02, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Bitsandbytes_foundation_Bitsandbytes_SwitchBack_Quantized_Linear.md)


Knowledge Sources
Domains: Quantization, INT8, Training
Last Updated: 2026-02-07 13:31 GMT

Overview

SwitchBack is an INT8 quantized linear layer technique: the forward pass runs as an INT8 matrix multiplication, using either global or vector-wise quantization granularity, and the weight gradient computation switches back to standard precision.

Description

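The SwitchBack approach performs the linear transformation Y = X @ W^T as a quantized INT8 matrix multiplication in the forward pass, and does the same for the input gradient dX = G @ W in the backward pass, where G is the gradient of the loss with respect to Y. For the weight gradient dW = G^T @ X it "switches back" to standard-precision computation. This hybrid strategy is motivated by the observation that the weight gradient is more sensitive to quantization noise than the other two matrix multiplications: its inner dimension runs over every row of X (typically batch size times sequence length), so quantization error accumulates over many more terms. Two quantization strategies are supported: global (per-row scaling factors for activations, a single scaling factor per tensor for weights) and vector-wise (per-row scaling factors for both activations and weights). A memory-efficient variant saves the quantized activations instead of the full-precision ones during the forward pass and dequantizes them in the backward pass, trading extra backward compute for memory savings.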
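To make the difference between the two granularities concrete, the following is a minimal NumPy sketch, not the library's kernels. The helper names `quantize_global`, `quantize_rowwise`, and `dequantize` are illustrative stand-ins, and the symmetric "absmax / 127" scaling is an assumption about the quantization scheme.

```python
import numpy as np

def quantize_global(t):
    """Symmetric INT8 quantization with ONE scale for the whole tensor."""
    scale = max(float(np.abs(t).max()), 1e-12) / 127.0
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale

def quantize_rowwise(t):
    """Symmetric INT8 quantization with one scale PER ROW (shape: rows x 1)."""
    scale = np.maximum(np.abs(t).max(axis=1, keepdims=True), 1e-12) / 127.0
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map INT8 codes back to floats; broadcasting handles both granularities."""
    return q.astype(np.float32) * scale

# Two rows with very different magnitudes.
t = np.array([[0.01, -0.02, 0.03],
              [10.0, -20.0, 30.0]], dtype=np.float32)

q_g, s_g = quantize_global(t)
q_r, s_r = quantize_rowwise(t)

# Worst-case reconstruction error per row for each granularity.
err_global = np.abs(dequantize(q_g, s_g) - t).max(axis=1)
err_rowwise = np.abs(dequantize(q_r, s_r) - t).max(axis=1)

# Under the global scale the small row collapses to all-zero codes,
# while row-wise scaling keeps it resolvable.
print(q_g[0], q_r[0])
print(err_global, err_rowwise)
```

A single global scale is cheaper to store and apply, but it is set by the largest entry in the tensor, so rows with small magnitudes lose resolution. Per-row scales avoid this at the cost of one extra scale per row.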

Usage

Apply this principle when training models where the efficiency gains of INT8 forward passes (and, with the memory-efficient variant, smaller saved activations) are desired but weight gradient quality must be preserved. It is a middle ground between full-precision training and fully quantized training.
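The memory trade-off of the memory-efficient variant can be sketched in NumPy as follows. This is an illustration under assumptions rather than the library's implementation: the helpers are hypothetical, float32 stands in for whatever standard precision training uses, and the shapes are arbitrary.

```python
import numpy as np

def quantize_rowwise(t):
    """Symmetric INT8 quantization with one scale per row (assumed scheme)."""
    scale = np.maximum(np.abs(t).max(axis=1, keepdims=True), 1e-12) / 127.0
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_rowwise(q, scale):
    """Rebuild an approximate float tensor from INT8 codes and row scales."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 64)).astype(np.float32)   # activations
dY = rng.standard_normal((32, 16)).astype(np.float32)  # upstream gradient

# Forward: save only the INT8 activations and their per-row scales.
X_int8, scale_X = quantize_rowwise(X)
saved_bytes = X_int8.nbytes + scale_X.nbytes
full_bytes = X.nbytes

# Backward: dequantize (extra compute), then do the weight-gradient
# matmul in floating point, as in the standard SwitchBack backward pass.
X_approx = dequantize_rowwise(X_int8, scale_X)
dW_approx = dY.T @ X_approx
dW_exact = dY.T @ X

rel_err = np.linalg.norm(dW_approx - dW_exact) / np.linalg.norm(dW_exact)
print(saved_bytes, full_bytes, rel_err)
```

The weight-gradient matmul itself still runs in floating point. Only its input X is an approximation reconstructed from the saved INT8 codes, which is the price paid for the smaller activation footprint.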

Theoretical Basis

Forward pass (quantized):

X_int8, scale_X = quantize_rowwise(X)
W_int8, scale_W = quantize_global(W)  # or quantize_rowwise
Y = int8_matmul_dequantize(X_int8, W_int8.T, scale_X, scale_W)
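
The forward pseudocode above can be run as a NumPy simulation. This is a sketch under assumptions: the three helpers reuse the pseudocode's names but are stand-ins written for illustration, the quantization is assumed to be symmetric absmax scaling, and the INT8 matmul is emulated with int32 accumulation rather than executed on INT8 hardware.

```python
import numpy as np

def quantize_rowwise(t):
    """One scale per row; returns INT8 codes and scales of shape (rows, 1)."""
    scale = np.maximum(np.abs(t).max(axis=1, keepdims=True), 1e-12) / 127.0
    return np.clip(np.round(t / scale), -127, 127).astype(np.int8), scale

def quantize_global(t):
    """One scale for the whole tensor; returns INT8 codes and a float."""
    scale = max(float(np.abs(t).max()), 1e-12) / 127.0
    return np.clip(np.round(t / scale), -127, 127).astype(np.int8), scale

def int8_matmul_dequantize(a_int8, b_int8, scale_a, scale_b):
    """Integer matmul with int32 accumulation, then rescale to float."""
    acc = a_int8.astype(np.int32) @ b_int8.astype(np.int32)
    return acc.astype(np.float32) * scale_a * scale_b

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 16)).astype(np.float32)  # (batch, in_features)
W = rng.standard_normal((4, 16)).astype(np.float32)  # (out_features, in_features)

X_int8, scale_X = quantize_rowwise(X)
W_int8, scale_W = quantize_global(W)
Y = int8_matmul_dequantize(X_int8, W_int8.T, scale_X, scale_W)

# Compare against the unquantized linear layer.
Y_ref = X @ W.T
rel_err = np.linalg.norm(Y - Y_ref) / np.linalg.norm(Y_ref)
print(Y.shape, rel_err)
```

Because the scales factor out of each dot product, dequantization is a single elementwise rescale of the int32 result: row i of the output is multiplied by the scale of row i of X and by the weight scale.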

Backward pass (mixed):

# Gradient w.r.t. input: quantized
dY_int8, scale_dY = quantize_rowwise(dY)
dX = int8_matmul_dequantize(dY_int8, W_int8, ...)

# Gradient w.r.t. weight: STANDARD precision ("switch back")
dW = matmul(dY.T, X)  # standard-precision matmul, no quantization
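
A runnable NumPy sketch of the mixed backward pass follows. It rests on several assumptions: `switchback_backward` is a hypothetical function name, the quantizers are illustrative stand-ins using symmetric absmax scaling, the weights are re-quantized with a global scale inside the backward pass for simplicity, and float32 stands in for standard precision.

```python
import numpy as np

def quantize_rowwise(t):
    """One scale per row; returns INT8 codes and scales of shape (rows, 1)."""
    scale = np.maximum(np.abs(t).max(axis=1, keepdims=True), 1e-12) / 127.0
    return np.clip(np.round(t / scale), -127, 127).astype(np.int8), scale

def quantize_global(t):
    """One scale for the whole tensor; returns INT8 codes and a float."""
    scale = max(float(np.abs(t).max()), 1e-12) / 127.0
    return np.clip(np.round(t / scale), -127, 127).astype(np.int8), scale

def switchback_backward(dY, X, W):
    """Return (dX, dW) for Y = X @ W.T with a quantized dX and an exact dW."""
    # Input gradient dX = dY @ W: emulated INT8 matmul, int32 accumulation.
    dY_int8, scale_dY = quantize_rowwise(dY)
    W_int8, scale_W = quantize_global(W)
    acc = dY_int8.astype(np.int32) @ W_int8.astype(np.int32)
    dX = acc.astype(np.float32) * scale_dY * scale_W

    # Weight gradient dW = dY.T @ X: the "switch back" to floating point.
    # Its inner dimension is the batch dimension, which is usually the largest.
    dW = dY.T @ X
    return dX, dW

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 16)).astype(np.float32)   # (batch, in_features)
W = rng.standard_normal((4, 16)).astype(np.float32)   # (out_features, in_features)
dY = rng.standard_normal((8, 4)).astype(np.float32)   # (batch, out_features)

dX, dW = switchback_backward(dY, X, W)

dX_ref = dY @ W
rel_err_dX = np.linalg.norm(dX - dX_ref) / np.linalg.norm(dX_ref)
print(dX.shape, dW.shape, rel_err_dX)
```

In this sketch only dX carries quantization error. dW is computed from the unquantized dY and X, so it matches the reference gradient of an unquantized linear layer.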
