Implementation:Bitsandbytes foundation Bitsandbytes Int8 Vectorwise Quant

From Leeroopedia


Metadata

Field Value
Sources Repo: bitsandbytes, Paper: LLM.int8()
Domains Quantization
Type API Doc
Last updated 2026-02-07 14:00 GMT

Overview

A concrete tool, provided by the bitsandbytes library, for quantizing tensors to INT8 with per-row scaling.

Description

int8_vectorwise_quant quantizes a torch.float16 tensor to torch.int8 with per-row scaling factors, implementing the vectorwise quantization step of the LLM.int8() algorithm. The function dispatches to a native CUDA kernel via torch.ops.bitsandbytes.int8_vectorwise_quant.default.
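The per-row scaling idea can be sketched in plain PyTorch on the CPU. This is a minimal illustration, not the bitsandbytes kernel: the helper name rowwise_absmax_quant_reference is hypothetical, and the choices to round to nearest and to return the per-row absolute maximum as the scale are assumptions made for the sketch.

```python
import torch


def rowwise_absmax_quant_reference(A: torch.Tensor):
    """Hypothetical pure-PyTorch reference; NOT the bitsandbytes CUDA kernel."""
    A32 = A.float()
    # Largest magnitude in each row, kept in float32.
    row_absmax = A32.abs().amax(dim=1)
    # Guard against all-zero rows so the division below stays defined.
    safe_absmax = row_absmax.clamp(min=1e-8)
    # Map each row onto [-127, 127] and round to the nearest integer.
    q = torch.round(A32 * (127.0 / safe_absmax.unsqueeze(1))).to(torch.int8)
    return q, row_absmax


A = torch.tensor(
    [[0.5, -4.0, 3.0],
     [0.25, 0.125, -1.0]],
    dtype=torch.float16,
)
q, row_absmax = rowwise_absmax_quant_reference(A)

# Approximate reconstruction using the per-row scale from this sketch.
A_hat = q.float() * (row_absmax.unsqueeze(1) / 127.0)

print(q)           # int8 values; each row's largest magnitude maps to +/-127
print(row_absmax)  # one float32 scale per row
print((A_hat - A.float()).abs().max())  # small rounding error
```

Because each row is scaled by its own maximum, a single large value only costs precision in its own row rather than across the whole tensor.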

When the threshold parameter is set to a value greater than 0, the function also performs outlier decomposition: it identifies columns where any element exceeds the threshold in absolute value, suppresses those columns in the quantized output (sets them to zero), and returns the column indices separately so the caller can handle them in FP16.

When threshold is 0 (the default), no outlier detection is performed and the third element of the return tuple is None.
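The outlier decomposition described above can be illustrated with a small CPU-only sketch. It shows only the column split, not the quantization step. The helper split_outlier_columns is hypothetical and not part of bitsandbytes, and the strict ">" comparison is an assumption taken from the word "exceeds" in the description.

```python
import torch


def split_outlier_columns(A: torch.Tensor, threshold: float):
    """Hypothetical helper illustrating the decomposition; not a bitsandbytes API."""
    # A column counts as an outlier column if any element's magnitude
    # exceeds the threshold (assumed strict comparison).
    col_is_outlier = (A.float().abs() > threshold).any(dim=0)
    outlier_cols = torch.nonzero(col_is_outlier).view(-1)
    # Keep the outlier columns in their original precision.
    A_outliers = A[:, outlier_cols].clone()
    # Zero those columns in the part that would go on to INT8 quantization.
    A_inliers = A.clone()
    A_inliers[:, outlier_cols] = 0
    return A_inliers, A_outliers, outlier_cols


A = torch.tensor(
    [[0.5, 10.0, -1.0, 2.0],
     [1.5, 0.25, 3.0, -8.0]],
    dtype=torch.float16,
)
A_inliers, A_outliers, outlier_cols = split_outlier_columns(A, threshold=6.0)

print(outlier_cols.tolist())  # [1, 3]

# The two parts together recover the original tensor exactly.
restored = A_inliers.clone()
restored[:, outlier_cols] = A_outliers
print(torch.equal(restored, A))  # True
```

Note that whole columns are separated, so ordinary values that share a column with an outlier (2.0 and 0.25 here) travel with the FP16 part.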

Code Reference

  • Source: bitsandbytes repo
  • File: bitsandbytes/functional.py, Lines L1944-1962
  • Import:
from bitsandbytes.functional import int8_vectorwise_quant
  • Signature:
def int8_vectorwise_quant(
    A: torch.Tensor,
    threshold: float = 0.0,
) -> tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor]]:

I/O Contract

Inputs

Parameter Type Required Default Description
A torch.Tensor (dtype torch.float16) Yes -- The input tensor to quantize. Must have dtype float16.
threshold float No 0.0 Outlier detection threshold. When 0.0, no outlier decomposition is performed. When > 0, columns with values exceeding this threshold are identified and returned separately.

Outputs

Index Type Description
0 torch.Tensor (dtype torch.int8) The quantized data. Same shape as input. Outlier columns are zeroed out when threshold > 0.
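1 torch.Tensor (dtype torch.float32) Per-row scaling statistics, one value per input row, derived from each row's absolute maximum max(|row|) and the INT8 limit 127.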
2 torch.Tensor (dtype torch.int32) or None Column indices of outlier features. None when threshold=0.0.

Usage Examples

Quantize a tensor without outlier detection:

import torch
from bitsandbytes.functional import int8_vectorwise_quant

# Create a float16 tensor
A = torch.randn(4, 8, dtype=torch.float16, device="cuda")

# Quantize to INT8 with per-row scaling
quantized, scales, outlier_cols = int8_vectorwise_quant(A)

print(quantized.dtype)       # torch.int8
print(quantized.shape)       # torch.Size([4, 8])
print(scales.dtype)          # torch.float32
print(scales.shape)          # torch.Size([4])
print(outlier_cols)          # None (no outlier detection)

Quantize a tensor with outlier decomposition:

import torch
from bitsandbytes.functional import int8_vectorwise_quant

# Create a tensor with some large outlier values
A = torch.randn(4, 8, dtype=torch.float16, device="cuda")
A[0, 2] = 10.0  # Inject an outlier in column 2
A[1, 5] = -8.0  # Inject an outlier in column 5

# Quantize with outlier threshold of 6.0
quantized, scales, outlier_cols = int8_vectorwise_quant(A, threshold=6.0)

print(quantized.dtype)       # torch.int8
print(scales.dtype)          # torch.float32
print(outlier_cols.dtype)    # torch.int32
print(outlier_cols)          # tensor([2, 5], dtype=torch.int32) - outlier column indices
# Note: quantized[:, outlier_cols] will be zeroed out
