Implementation:NVIDIA TransformerEngine Float8Blockwise Storage
| Field | Value |
|---|---|
| Sources | TransformerEngine |
| Domains | Deep_Learning, PyTorch, Quantization |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Mixin storage class that holds the raw data attributes (rowwise/columnwise quantized data and per-block scale inverses) for Float8BlockwiseQTensor, separating data management from tensor subclass logic.
Description
Stores _rowwise_data, _columnwise_data, _rowwise_scale_inv, _columnwise_scale_inv, the FP8 dtype, quantizer reference, and a 2D-scaled flag. Provides get_metadata(), prepare_for_saving()/restore_from_saved() for autograd context management, clear() for memory deallocation, and _dequantize_vectorwise() for converting quantized data back to higher precision by applying block-level inverse scales.
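To make "applying block-level inverse scales" concrete, here is a minimal pure-Python sketch of the idea behind rowwise (1D) block dequantization. It is not the library's implementation: the function name `dequantize_rowwise_blocks`, the use of plain floats in place of FP8 values, and the tiny block length are all assumptions made for illustration.

```python
from typing import List


def dequantize_rowwise_blocks(
    q: List[List[float]],
    scale_inv: List[List[float]],
    block_len: int,
) -> List[List[float]]:
    """Multiply each element by the inverse scale of the block it falls in.

    q          -- quantized values, one list per row (stand-ins for FP8 values)
    scale_inv  -- scale_inv[r][b] is the inverse scale of block b in row r
    block_len  -- number of consecutive elements sharing one scale (assumed)
    """
    out = []
    for r, row in enumerate(q):
        # Column c belongs to block c // block_len of its row.
        out.append([v * scale_inv[r][c // block_len] for c, v in enumerate(row)])
    return out


# Two rows of four values, block length 2, so two inverse scales per row.
q = [[1.0, 2.0, 3.0, 4.0],
     [-1.0, 0.5, 2.0, -2.0]]
scale_inv = [[0.5, 2.0],
             [4.0, 0.25]]
restored = dequantize_rowwise_blocks(q, scale_inv, block_len=2)
print(restored)
```

With 2D scaling, the same lookup would index a block by both row and column (for example `scale_inv[r // block_len][c // block_len]`), so a whole tile shares one inverse scale.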
Usage
This class is the data-holding layer of the blockwise FP8 tensor hierarchy. It can also be instantiated directly, which has lower CPU overhead than the full tensor subclass and suits performance-critical internal operations such as passing data to GEMMs.
Code Reference
Source Location
- Repository: NVIDIA/TransformerEngine
- File: transformer_engine/pytorch/tensor/storage/float8_blockwise_tensor_storage.py
- Lines: 1-389
Signature
class Float8BlockwiseQTensorStorage(QuantizedTensorStorage):
    _rowwise_data: Optional[torch.Tensor]
    _columnwise_data: Optional[torch.Tensor]
    _rowwise_scale_inv: Optional[torch.Tensor]
    _columnwise_scale_inv: Optional[torch.Tensor]
    _fp8_dtype: torch.dtype
    _is_2D_scaled: bool

    def get_metadata(self) -> dict: ...
    def prepare_for_saving(self) -> list: ...
    def restore_from_saved(self, tensors) -> None: ...
    def clear(self) -> None: ...
    def _dequantize_vectorwise(self, dtype=None) -> torch.Tensor: ...
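The save/restore pair listed in the signature can be pictured with a toy stand-in. The sketch below assumes that `prepare_for_saving()` hands the buffers to the caller and drops the object's own references, and that `restore_from_saved()` reattaches them in the same order. `ToyBlockwiseStorage` is hypothetical, uses plain Python lists instead of tensors, and follows the return types shown on this page rather than any particular library version.

```python
from typing import Any, List, Optional


class ToyBlockwiseStorage:
    """Hypothetical stand-in for a storage object with four optional buffers."""

    def __init__(
        self,
        rowwise_data: Optional[Any] = None,
        rowwise_scale_inv: Optional[Any] = None,
        columnwise_data: Optional[Any] = None,
        columnwise_scale_inv: Optional[Any] = None,
    ) -> None:
        self._rowwise_data = rowwise_data
        self._rowwise_scale_inv = rowwise_scale_inv
        self._columnwise_data = columnwise_data
        self._columnwise_scale_inv = columnwise_scale_inv

    def prepare_for_saving(self) -> List[Optional[Any]]:
        # Hand the buffers to the caller (e.g. an autograd context) ...
        buffers = [
            self._rowwise_data,
            self._rowwise_scale_inv,
            self._columnwise_data,
            self._columnwise_scale_inv,
        ]
        # ... and drop our own references so they are held in one place only.
        self.clear()
        return buffers

    def restore_from_saved(self, tensors: List[Optional[Any]]) -> None:
        # Reattach the buffers in the order prepare_for_saving() produced them.
        (
            self._rowwise_data,
            self._rowwise_scale_inv,
            self._columnwise_data,
            self._columnwise_scale_inv,
        ) = tensors[:4]

    def clear(self) -> None:
        # Release every buffer reference held by this object.
        self._rowwise_data = None
        self._rowwise_scale_inv = None
        self._columnwise_data = None
        self._columnwise_scale_inv = None


storage = ToyBlockwiseStorage(rowwise_data=[1, 2, 3, 4], rowwise_scale_inv=[0.5, 2.0])
saved = storage.prepare_for_saving()   # forward pass: buffers leave the object
storage.restore_from_saved(saved)      # backward pass: buffers come back
```

Keeping the metadata on the object while the buffers travel separately is what lets the same object be rebuilt around the saved buffers later.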
Import
from transformer_engine.pytorch.tensor.storage.float8_blockwise_tensor_storage import (
    Float8BlockwiseQTensorStorage,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| rowwise_data | torch.Tensor | No | Rowwise quantized FP8 data |
| columnwise_data | torch.Tensor | No | Columnwise quantized FP8 data |
| rowwise_scale_inv | torch.Tensor | No | Per-block inverse scales for rowwise data |
| columnwise_scale_inv | torch.Tensor | No | Per-block inverse scales for columnwise data |
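As a rough guide to how many per-block inverse scales accompany a data tensor, the sketch below counts blocks for 1D and 2D scaling. The helper `scale_inv_shape` is hypothetical, the block length of 128 is an assumption, and any padding or transposition a real library applies to the scale tensors is deliberately ignored.

```python
import math
from typing import Tuple


def scale_inv_shape(
    rows: int,
    cols: int,
    block_len: int = 128,       # assumed block length
    is_2d_scaled: bool = False,
) -> Tuple[int, int]:
    """Logical shape of the inverse-scale array for a rows x cols data tensor."""
    col_blocks = math.ceil(cols / block_len)
    if is_2d_scaled:
        # One scale per (block_len x block_len) tile.
        return (math.ceil(rows / block_len), col_blocks)
    # One scale per (1 x block_len) strip within each row.
    return (rows, col_blocks)


shape_1d = scale_inv_shape(256, 512)                          # (256, 4)
shape_2d = scale_inv_shape(256, 512, is_2d_scaled=True)       # (2, 4)
shape_ragged = scale_inv_shape(300, 500, is_2d_scaled=True)   # (3, 4)
print(shape_1d, shape_2d, shape_ragged)
```

The last call shows that dimensions which are not multiples of the block length still get a (partially filled) final block.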
Outputs
| Name | Type | Description |
|---|---|---|
| dequantized | torch.Tensor | High-precision tensor reconstructed from block-scaled FP8 data |
Usage Examples
# Float8BlockwiseQTensorStorage is typically used as a mixin base class of
# Float8BlockwiseQTensor rather than instantiated directly.
from transformer_engine.pytorch.tensor.float8_blockwise_tensor import Float8BlockwiseQTensor

# Assumed to exist already: `quantizer`, a blockwise FP8 quantizer, and
# `input_tensor`, a high-precision torch.Tensor.
fp8_block_tensor = quantizer.quantize(input_tensor)

# Access storage attributes through the tensor. Both are Optional and may be None.
row_data = fp8_block_tensor._rowwise_data
row_scales = fp8_block_tensor._rowwise_scale_inv