Implementation:NVIDIA TransformerEngine Float8Blockwise Storage

Field	Value
Sources	TransformerEngine
Domains	Deep_Learning, PyTorch, Quantization
Last Updated	2026-02-07 14:00 GMT

Overview

Mixin storage class that holds the raw data attributes (rowwise/columnwise quantized data and per-block scale inverses) for Float8BlockwiseQTensor, separating data management from tensor subclass logic.

Description

Stores _rowwise_data, _columnwise_data, _rowwise_scale_inv, _columnwise_scale_inv, the FP8 dtype, quantizer reference, and a 2D-scaled flag. Provides get_metadata(), prepare_for_saving()/restore_from_saved() for autograd context management, clear() for memory deallocation, and _dequantize_vectorwise() for converting quantized data back to higher precision by applying block-level inverse scales.
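The arithmetic behind applying block-level inverse scales can be sketched without any GPU dependency. The snippet below is only an illustration under stated assumptions: `dequantize_blockwise` is a hypothetical helper, not part of TransformerEngine; the payload is represented as already-decoded floats in nested lists; and the block sizes are deliberately tiny. It shows both a 2D tile layout and a 1D strip layout, which is presumably what the 2D-scaled flag distinguishes.

```python
from typing import List

Matrix = List[List[float]]


def dequantize_blockwise(
    q: Matrix, scale_inv: Matrix, block_rows: int, block_cols: int
) -> Matrix:
    """Multiply each element by the inverse scale of the block that contains it."""
    return [
        [
            value * scale_inv[i // block_rows][j // block_cols]
            for j, value in enumerate(row)
        ]
        for i, row in enumerate(q)
    ]


# Toy 2x4 payload standing in for decoded FP8 values (hypothetical data).
q = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]

# 2D scaling: one inverse scale per (2 x 2) tile.
out_2d = dequantize_blockwise(q, [[0.5, 2.0]], block_rows=2, block_cols=2)

# 1D scaling: one inverse scale per (1 x 2) strip, so every row has its own scales.
out_1d = dequantize_blockwise(
    q, [[0.5, 2.0], [1.0, 0.25]], block_rows=1, block_cols=2
)
```

In a tensor implementation the same idea would normally be expressed by broadcasting the scale grid over each block rather than looping per element.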

Usage

This class is the data-holding layer of the blockwise FP8 tensor hierarchy. It is normally used as a mixin base of Float8BlockwiseQTensor, but it can also be instantiated directly, which lowers CPU overhead in performance-critical internal operations such as passing data to GEMMs.

Code Reference

Source Location

Repository: NVIDIA/TransformerEngine
File: transformer_engine/pytorch/tensor/storage/float8_blockwise_tensor_storage.py
Lines: 1-389

Signature

class Float8BlockwiseQTensorStorage(QuantizedTensorStorage):
    _rowwise_data: Optional[torch.Tensor]
    _columnwise_data: Optional[torch.Tensor]
    _rowwise_scale_inv: Optional[torch.Tensor]
    _columnwise_scale_inv: Optional[torch.Tensor]
    _fp8_dtype: torch.dtype
    _is_2D_scaled: bool

    def get_metadata(self) -> dict: ...
    def prepare_for_saving(self) -> list: ...
    def restore_from_saved(self, tensors) -> None: ...
    def clear(self) -> None: ...
    def _dequantize_vectorwise(self, dtype=None) -> torch.Tensor: ...
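
The lifecycle implied by the signature above (collect buffers for autograd, reattach them later, release them when done) can be sketched with a toy class. Everything here is an assumption made for illustration: `ToyBlockwiseStorage` is hypothetical, plain lists stand in for torch.Tensor buffers, and the choice to drop references inside prepare_for_saving() and to set buffers to None in clear() is a guess at reasonable behavior, not a statement about the library's implementation.

```python
from typing import Any, Dict, List, Optional


class ToyBlockwiseStorage:
    """Hypothetical stand-in for illustration; not the TransformerEngine class."""

    _BUFFERS = (
        "_rowwise_data",
        "_columnwise_data",
        "_rowwise_scale_inv",
        "_columnwise_scale_inv",
    )

    def __init__(
        self,
        rowwise_data: Optional[Any] = None,
        columnwise_data: Optional[Any] = None,
        rowwise_scale_inv: Optional[Any] = None,
        columnwise_scale_inv: Optional[Any] = None,
        is_2D_scaled: bool = False,
    ) -> None:
        self._rowwise_data = rowwise_data
        self._columnwise_data = columnwise_data
        self._rowwise_scale_inv = rowwise_scale_inv
        self._columnwise_scale_inv = columnwise_scale_inv
        self._is_2D_scaled = is_2D_scaled

    def get_metadata(self) -> Dict[str, Any]:
        # Keys mirror the constructor arguments so the dict can rebuild an object.
        meta = {name.lstrip("_"): getattr(self, name) for name in self._BUFFERS}
        meta["is_2D_scaled"] = self._is_2D_scaled
        return meta

    def prepare_for_saving(self) -> List[Optional[Any]]:
        # Hand the buffers to the caller and drop our references (assumed behavior).
        buffers = [getattr(self, name) for name in self._BUFFERS]
        self.clear()
        return buffers

    def restore_from_saved(self, tensors: List[Optional[Any]]) -> None:
        # Reattach buffers in the same order prepare_for_saving() produced them.
        for name, value in zip(self._BUFFERS, tensors):
            setattr(self, name, value)

    def clear(self) -> None:
        # Release every buffer reference so the memory can be reclaimed.
        for name in self._BUFFERS:
            setattr(self, name, None)


# Plain lists stand in for torch.Tensor buffers.
storage = ToyBlockwiseStorage(rowwise_data=[1, 2], rowwise_scale_inv=[0.5])
saved = storage.prepare_for_saving()  # forward pass: storage no longer holds buffers
storage.restore_from_saved(saved)     # backward pass: buffers are reattached
clone = ToyBlockwiseStorage(**storage.get_metadata())
```

Keeping the buffer order fixed in one place (`_BUFFERS` here) is what makes the save and restore steps symmetric; in real autograd code the list returned by the save step would typically be passed to the context's save mechanism rather than held in a local variable.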

Import

from transformer_engine.pytorch.tensor.storage.float8_blockwise_tensor_storage import (
    Float8BlockwiseQTensorStorage,
)

I/O Contract

Inputs

Name	Type	Required	Description
rowwise_data	`torch.Tensor`	No	Rowwise quantized FP8 data
columnwise_data	`torch.Tensor`	No	Columnwise quantized FP8 data
rowwise_scale_inv	`torch.Tensor`	No	Per-block inverse scales for rowwise data
columnwise_scale_inv	`torch.Tensor`	No	Per-block inverse scales for columnwise data
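
To make the relationship between a data tensor and its per-block inverse scales concrete, the following sketch computes how many scale entries a given shape would need. It is a simplified model, not the library's layout: `scale_grid_shape` is a hypothetical helper, the block length of 128 is only an example value, and a real implementation may pad or transpose the scale tensor for its kernels.

```python
from typing import Tuple


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def scale_grid_shape(
    rows: int, cols: int, block_len: int, is_2d_scaled: bool
) -> Tuple[int, int]:
    """Number of scale entries along each axis in an unpadded layout (assumed)."""
    # 2D scaling: square tiles of block_len x block_len.
    # 1D scaling: strips of 1 x block_len, so one scale row per data row.
    block_rows = block_len if is_2d_scaled else 1
    return ceil_div(rows, block_rows), ceil_div(cols, block_len)


shape_2d = scale_grid_shape(256, 300, block_len=128, is_2d_scaled=True)
shape_1d = scale_grid_shape(256, 300, block_len=128, is_2d_scaled=False)
```

A partial trailing block (columns 256 to 299 in this example) still needs its own scale entry, which is why the column count rounds up.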

Outputs

Name	Type	Description
dequantized	`torch.Tensor`	High-precision tensor reconstructed from block-scaled FP8 data

Usage Examples

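# Float8BlockwiseQTensorStorage is typically used as a mixin base class of
# Float8BlockwiseQTensor, so its attributes are usually reached through the tensor.
from transformer_engine.pytorch.tensor.float8_blockwise_tensor import Float8BlockwiseQTensor

# Assumed to exist already (not defined in this snippet):
#   quantizer    - a configured blockwise FP8 quantizer
#   input_tensor - a high-precision torch.Tensor to quantize
fp8_block_tensor = quantizer.quantize(input_tensor)

# Access storage attributes through the tensor.
# Both attributes are Optional and may be None if rowwise data is not held.
row_data = fp8_block_tensor._rowwise_data
row_scales = fp8_block_tensor._rowwise_scale_inv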
