Implementation:NVIDIA TransformerEngine Debug Log Tensor Stats
| Field | Value |
|---|---|
| Sources | TransformerEngine |
| Domains | Deep_Learning, PyTorch, Debug, Quantization |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Logs basic statistics of high-precision tensors (min, max, mean, std, norms, dynamic range, blockwise dynamic range) within the Transformer Engine debug framework.
Description
LogTensorStats collects and logs statistics on high-precision (non-FP8) tensors during training. It supports a wide range of statistics: min, max, mean, std, l1_norm, l2_norm, cur_amax, dynamic_range, and max_blockwise_dynamic_range (with configurable block size and dimensionality). The feature validates that tensors are in high precision and raises errors if FP8 tensors are provided (users should use LogFp8TensorStats instead). Statistics are accumulated for micro-batches and flushed during debug_api.step().
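To make the statistic names concrete, here is a minimal pure-Python sketch of what each one could compute on a flat list of values. It is not the Transformer Engine implementation: the helper names are hypothetical, the standard deviation is assumed to be the sample (n - 1) form, dynamic_range is assumed to be log2 of the largest absolute value minus log2 of the smallest nonzero absolute value, and blocks are taken as consecutive 1D chunks.

```python
import math

def dynamic_range(values):
    """Assumed definition: log2(max |x|) - log2(min nonzero |x|); 0.0 if all zeros."""
    nonzero = [abs(v) for v in values if v != 0]
    if not nonzero:
        return 0.0
    return math.log2(max(nonzero)) - math.log2(min(nonzero))

def max_blockwise_dynamic_range(values, block_size):
    """Largest dynamic range over consecutive 1D blocks of block_size elements."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return max(dynamic_range(b) for b in blocks)

def basic_stats(values):
    """Hypothetical helper: summary statistics for a flat, non-empty list of floats."""
    n = len(values)
    abs_vals = [abs(v) for v in values]
    mean = sum(values) / n
    # Sample variance (divide by n - 1); an assumption about the convention used.
    var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "std": math.sqrt(var),
        "l1_norm": sum(abs_vals),
        "l2_norm": math.sqrt(sum(v * v for v in values)),
        "cur_amax": max(abs_vals),
        "dynamic_range": dynamic_range(values),
    }

stats = basic_stats([0.5, -4.0, 0.0, 2.0, 8.0, 1.0])
```

A large dynamic_range or max_blockwise_dynamic_range suggests that values in the tensor (or within a single block) span many powers of two, which is the kind of signal these statistics are meant to surface before moving to lower precision.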
Usage
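Enable via YAML config under transformer_engine.LogTensorStats. Each tensor entry can set its own stats list, logging frequency (freq), and step range (start_step/end_step or start_end_list). Set freq > 1 to collect statistics only on a subset of steps, which reduces overhead.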
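The interaction between freq and the step range can be sketched as a small predicate. This is an illustrative helper, not part of the library: the name should_log is hypothetical, and it assumes that range bounds are inclusive, that logging happens on iterations divisible by freq, and that start_end_list takes precedence over start_step/end_step when given.

```python
def should_log(iteration, freq=1, start_step=None, end_step=None, start_end_list=None):
    """Hypothetical helper: decide whether stats are collected at `iteration`."""
    if start_end_list is not None:
        # Assumed: explicit (start, end) windows override start_step/end_step.
        in_range = any(start <= iteration <= end for start, end in start_end_list)
    else:
        lower = 0 if start_step is None else start_step
        in_range = iteration >= lower and (end_step is None or iteration <= end_step)
    # Assumed: within the active range, log every freq-th iteration.
    return in_range and iteration % freq == 0

# Iterations selected by freq=10, start_step=5, end_step=100 under these assumptions.
logged_steps = [i for i in range(0, 121) if should_log(i, freq=10, start_step=5, end_step=100)]
```

Under these assumptions, a 120-step run with that configuration collects statistics on 10 iterations instead of 120.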
Code Reference
Source Location
- Repository: NVIDIA/TransformerEngine
- File: transformer_engine/debug/features/log_tensor_stats.py
- Lines: 1-236
Signature
@Registry.register_feature(namespace="transformer_engine")
class LogTensorStats(BaseLogTensorStats):
    def inspect_tensor_enabled(self, config, layer_name, tensor_name, iteration) -> Tuple[bool, Optional[int]]: ...
    def inspect_tensor(self, config, layer_name, tensor_name, iteration, tp_group, tensor, rowwise_quantized_tensor=None, columnwise_quantized_tensor=None, quantizer=None) -> None: ...
Import
from transformer_engine.debug.features.log_tensor_stats import LogTensorStats
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | Dict | Yes | Must contain stats list; optionally freq, start_step, end_step, start_end_list |
| tensor | torch.Tensor | Yes | High-precision tensor (must not be FP8) |
| tensor_name | str | Yes | One of activation, weight, gradient, output, wgrad, dgrad |
Outputs
| Name | Type | Description |
|---|---|---|
| (none) | None | Statistics are buffered and logged at the next debug_api.step() |
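The buffer-then-flush behavior described in the Outputs table can be illustrated with a toy class. This is only a sketch of the pattern, assuming per-micro-batch partial results are kept and combined when step() is called; StatBuffer, feed, and logged are hypothetical names and do not mirror the library's internal data structures.

```python
class StatBuffer:
    """Toy buffer: collect per-micro-batch partials, reduce them on step()."""

    def __init__(self):
        self._pending = {}   # (layer_name, tensor_name) -> list of partial results
        self.iteration = 0
        self.logged = []     # stands in for the real logging backend

    def feed(self, layer_name, tensor_name, values):
        # Store only what is needed to combine micro-batches later.
        partial = (min(values), max(values), sum(values), len(values))
        self._pending.setdefault((layer_name, tensor_name), []).append(partial)

    def step(self):
        # Combine all micro-batches seen since the previous step, then reset.
        for (layer, tensor), parts in self._pending.items():
            total = sum(p[2] for p in parts)
            count = sum(p[3] for p in parts)
            self.logged.append({
                "iteration": self.iteration,
                "layer": layer,
                "tensor": tensor,
                "min": min(p[0] for p in parts),
                "max": max(p[1] for p in parts),
                "mean": total / count,
            })
        self._pending.clear()
        self.iteration += 1

buf = StatBuffer()
buf.feed("fc1", "activation", [1.0, 2.0])   # micro-batch 1
buf.feed("fc1", "activation", [3.0, 6.0])   # micro-batch 2
buf.step()                                   # one combined record is emitted here
```

Keeping partial results rather than raw tensors is one way to make the combined mean correct across micro-batches of different sizes.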
Usage Examples
# YAML configuration:
# example_tensor_stat_collection:
#   enabled: True
#   layers:
#     layer_name_regex_pattern: .*(fc1|self_attention).*
#   transformer_engine:
#     LogTensorStats:
#       enabled: True
#       tensors_struct:
#         - tensor: activation
#           stats: [mean]
#           freq: 10
#           start_step: 5
#           end_step: 100
#         - tensor: gradient
#           stats: [mean, max, min]
#           freq: 2
#         - tensor: weight
#           stats: [dynamic_range]
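To see which layers the layer_name_regex_pattern in this configuration would select, the pattern can be tried against a few names with Python's re module. The layer names below are hypothetical, and how the debug framework applies the pattern internally is an assumption; because the pattern begins and ends with .*, re.match, re.search, and re.fullmatch agree for single-line names.

```python
import re

# Pattern taken from the YAML example above.
PATTERN = r".*(fc1|self_attention).*"

# Hypothetical layer names, for illustration only.
layer_names = [
    "decoder.layers.0.self_attention.qkv",
    "decoder.layers.0.mlp.fc1",
    "decoder.layers.0.mlp.fc2",
    "decoder.final_layernorm",
]

# Keep only the names the pattern matches in full.
selected = [name for name in layer_names if re.fullmatch(PATTERN, name)]
```

Here only the self_attention and fc1 names are selected; fc2 and the final layer norm are skipped, so no statistics would be collected for them under this configuration.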