Implementation:NVIDIA TransformerEngine Debug Log Tensor Stats

Field	Value
Sources	TransformerEngine
Domains	Deep_Learning, PyTorch, Debug, Quantization
Last Updated	2026-02-07 14:00 GMT

Overview

Logs basic statistics of high-precision tensors (min, max, mean, std, norms, dynamic range, blockwise dynamic range) within the Transformer Engine debug framework.
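For reference, the per-tensor statistics can be written out in plain Python. This is a hypothetical sketch: the helper name `compute_basic_stats` is illustrative, it works on a flat list rather than a torch tensor, and it assumes the sample (Bessel-corrected) standard deviation that `torch.std` uses by default and a dynamic range of log2(max |x|) - log2(min nonzero |x|) with zeros excluded. TE's own edge-case handling may differ.

```python
import math
from typing import Dict, Sequence


def compute_basic_stats(values: Sequence[float]) -> Dict[str, float]:
    """Hypothetical reference computation of LogTensorStats-style statistics.

    Works on a flat list of floats for illustration; TE itself operates on
    torch tensors and may treat edge cases differently.
    """
    n = len(values)
    if n == 0:
        raise ValueError("statistics are undefined for an empty tensor")
    mean = sum(values) / n
    # Sample (Bessel-corrected) variance, like torch.std's default;
    # a single element gives NaN, as torch.std does.
    var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else math.nan
    abs_vals = [abs(v) for v in values]
    nonzero = [a for a in abs_vals if a != 0.0]
    # Assumed definition: log2(max |x|) - log2(min nonzero |x|), zeros excluded.
    dyn_range = (
        math.log2(max(nonzero)) - math.log2(min(nonzero)) if nonzero else 0.0
    )
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "std": math.sqrt(var),
        "l1_norm": sum(abs_vals),
        "l2_norm": math.sqrt(sum(v * v for v in values)),
        "cur_amax": max(abs_vals),  # maximum absolute value
        "dynamic_range": dyn_range,
    }
```

On [-4.0, 0.0, 1.0, 2.0, 0.5] this gives cur_amax = 4.0 and dynamic_range = 3.0: the zero is skipped, the smallest nonzero magnitude is 0.5, and log2(4) - log2(0.5) = 3.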

Description

LogTensorStats collects and logs statistics on high-precision (non-FP8) tensors during training. It supports min, max, mean, std, l1_norm, l2_norm, cur_amax (the maximum absolute value), dynamic_range, and max_blockwise_dynamic_range (with configurable block size and dimensionality). The feature checks that each inspected tensor is in high precision and raises an error if an FP8 tensor is passed; use LogFp8TensorStats for FP8 tensors. When several micro-batches are processed between calls to debug_api.step(), their statistics are accumulated and flushed at the next debug_api.step().
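The blockwise variant can be illustrated the same way. This is a hypothetical sketch of one plausible definition, not TE's implementation: the tensor is split into blocks (runs of `block_size` elements within a row when `dims=1`, `block_size` x `block_size` tiles when `dims=2`), the dynamic range of each block is computed, and the largest value is reported. The function and parameter names and the treatment of all-zero and partial edge blocks are assumptions.

```python
import math
from typing import List, Sequence


def _dynamic_range(block: Sequence[float]) -> float:
    """log2(max |x|) - log2(min nonzero |x|); 0.0 if the block is all zeros."""
    nonzero = [abs(v) for v in block if v != 0.0]
    if not nonzero:
        return 0.0
    return math.log2(max(nonzero)) - math.log2(min(nonzero))


def max_blockwise_dynamic_range(
    matrix: List[List[float]], block_size: int, dims: int = 1
) -> float:
    """Hypothetical sketch: largest per-block dynamic range of a 2-D tensor.

    dims=1 -> blocks of `block_size` consecutive elements within each row.
    dims=2 -> square tiles of block_size x block_size elements.
    Blocks at the right or bottom edge may be smaller than block_size.
    """
    if block_size <= 0 or dims not in (1, 2):
        raise ValueError("block_size must be positive and dims must be 1 or 2")
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    row_step = 1 if dims == 1 else block_size
    best = 0.0
    for r0 in range(0, rows, row_step):
        for c0 in range(0, cols, block_size):
            # Gather the elements of one block (one row segment or one tile).
            block = [
                matrix[r][c]
                for r in range(r0, min(r0 + row_step, rows))
                for c in range(c0, min(c0 + block_size, cols))
            ]
            best = max(best, _dynamic_range(block))
    return best
```

For m = [[1.0, 2.0, 8.0, 4.0], [0.25, 1.0, 4.0, 8.0]] the whole-tensor dynamic range is 5 (8 versus 0.25), but `max_blockwise_dynamic_range(m, 2, dims=1)` returns 2.0 and `max_blockwise_dynamic_range(m, 2, dims=2)` returns 3.0, because no single block contains both extremes.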

Usage

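Enable the feature in the YAML config under transformer_engine.LogTensorStats. The statistics, logging frequency (freq), and step range (start_step/end_step or start_end_list) can be set separately for each tensor. Computing statistics adds overhead, so set freq > 1 to log only every freq-th step.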
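The step-selection settings can be read as a simple rule. The hypothetical `inspect_enabled_sketch` below assumes that a step is logged when it lies inside an inclusive range and is a multiple of `freq`, that `start_end_list` overrides `start_step`/`end_step` when given, and that `start_step` defaults to 0. It returns the same shape as `inspect_tensor_enabled` in the signature, `(enabled, next_enabled_iteration)`; the exact TE semantics may differ.

```python
from typing import List, Optional, Sequence, Tuple


def _step_ranges(
    start_step: Optional[int],
    end_step: Optional[int],
    start_end_list: Optional[Sequence[Tuple[int, int]]],
) -> List[Tuple[int, Optional[int]]]:
    """Inclusive (start, end) ranges; end=None means no upper bound."""
    if start_end_list is not None:
        # Assumed: an explicit list overrides start_step/end_step.
        return sorted((s, e) for s, e in start_end_list)
    return [(start_step if start_step is not None else 0, end_step)]


def inspect_enabled_sketch(
    iteration: int,
    freq: int = 1,
    start_step: Optional[int] = None,
    end_step: Optional[int] = None,
    start_end_list: Optional[Sequence[Tuple[int, int]]] = None,
) -> Tuple[bool, Optional[int]]:
    """Hypothetical: (is `iteration` logged?, next logged iteration or None)."""
    if freq < 1:
        raise ValueError("freq must be >= 1")
    enabled = False
    next_iter: Optional[int] = None
    for lo, hi in _step_ranges(start_step, end_step, start_end_list):
        in_range = lo <= iteration and (hi is None or iteration <= hi)
        if in_range and iteration % freq == 0:
            enabled = True
        # First multiple of freq that is after `iteration` and inside [lo, hi].
        first = max(lo, iteration + 1)
        candidate = -(-first // freq) * freq  # round up to a multiple of freq
        if (hi is None or candidate <= hi) and (next_iter is None or candidate < next_iter):
            next_iter = candidate
    return enabled, next_iter
```

With the activation settings from the YAML example on this page (freq: 10, start_step: 5, end_step: 100), step 5 gives (False, 10), step 10 gives (True, 20), and step 100 gives (True, None). Returning the next enabled step presumably lets the caller skip the check until then.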

Code Reference

Source Location

Repository: NVIDIA/TransformerEngine
File: transformer_engine/debug/features/log_tensor_stats.py
Lines: 1-236

Signature

@Registry.register_feature(namespace="transformer_engine")
class LogTensorStats(BaseLogTensorStats):
    def inspect_tensor_enabled(self, config, layer_name, tensor_name, iteration) -> Tuple[bool, Optional[int]]: ...
    def inspect_tensor(self, config, layer_name, tensor_name, iteration, tp_group, tensor, rowwise_quantized_tensor=None, columnwise_quantized_tensor=None, quantizer=None) -> None: ...

Import

from transformer_engine.debug.features.log_tensor_stats import LogTensorStats

I/O Contract

Inputs

Name	Type	Required	Description
config	Dict	Yes	Must contain `stats` list; optionally `freq`, `start_step`, `end_step`, `start_end_list`
tensor	torch.Tensor	Yes	High-precision tensor (must not be FP8)
tensor_name	str	Yes	One of `activation`, `weight`, `gradient`, `output`, `wgrad`, `dgrad`
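The input constraints in the table can be checked up front. In this hypothetical sketch, `validate_inputs` is an illustrative name, the `tensor_is_fp8` flag stands in for however TE detects FP8 tensors, statistic names are assumed to be plain strings, and the default `freq` of 1 is an assumption.

```python
from typing import Any, Dict

# Names taken from this page; representing stats as plain strings is an assumption.
SUPPORTED_STATS = {
    "min", "max", "mean", "std", "l1_norm", "l2_norm",
    "cur_amax", "dynamic_range", "max_blockwise_dynamic_range",
}
SUPPORTED_TENSORS = {"activation", "weight", "gradient", "output", "wgrad", "dgrad"}


def validate_inputs(config: Dict[str, Any], tensor_name: str, tensor_is_fp8: bool) -> None:
    """Hypothetical pre-flight check mirroring the input table; raises on bad input."""
    stats = config.get("stats")
    if not isinstance(stats, list) or not stats:
        raise ValueError("config must contain a non-empty `stats` list")
    unknown = [s for s in stats if s not in SUPPORTED_STATS]
    if unknown:
        raise ValueError(f"unsupported stats: {unknown}")
    freq = config.get("freq", 1)  # default of 1 is an assumption
    if not isinstance(freq, int) or freq < 1:
        raise ValueError("`freq` must be a positive integer")
    if tensor_name not in SUPPORTED_TENSORS:
        raise ValueError(f"unknown tensor_name: {tensor_name!r}")
    if tensor_is_fp8:
        # FP8 tensors belong to LogFp8TensorStats, not LogTensorStats.
        raise TypeError(
            "LogTensorStats expects a high-precision tensor; use LogFp8TensorStats for FP8"
        )
```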

Outputs

Name	Type	Description
(none)	None	Statistics are buffered and logged at the next `debug_api.step()`
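Statistics such as std and l2_norm cannot simply be averaged across micro-batches. One way to buffer them is to keep additive quantities (element count, sum, sum of squares, sum of absolute values, running min and max) and derive the final values only when flushing at a step boundary. The class below is a hypothetical sketch of that idea, not TE's internal buffer; its name and fields are illustrative.

```python
import math
from dataclasses import dataclass, fields
from typing import Dict, Iterable


@dataclass
class MicroBatchStats:
    """Hypothetical accumulator: merges statistics over micro-batches until flushed.

    Only additive quantities (plus min/max) are stored, so the result does not
    depend on how the data was split into micro-batches.
    """

    numel: int = 0
    total: float = 0.0      # sum of x
    total_sq: float = 0.0   # sum of x**2
    total_abs: float = 0.0  # sum of |x|
    vmin: float = math.inf
    vmax: float = -math.inf

    def update(self, values: Iterable[float]) -> None:
        """Fold one micro-batch into the running totals."""
        for v in values:
            self.numel += 1
            self.total += v
            self.total_sq += v * v
            self.total_abs += abs(v)
            self.vmin = min(self.vmin, v)
            self.vmax = max(self.vmax, v)

    def flush(self) -> Dict[str, float]:
        """Derive the final statistics (as at a step boundary), then reset."""
        if self.numel == 0:
            raise ValueError("nothing accumulated since the last flush")
        n = self.numel
        mean = self.total / n
        # Sample variance from the sums; clamp tiny negative rounding errors.
        var = max(self.total_sq - n * mean * mean, 0.0) / (n - 1) if n > 1 else math.nan
        result = {
            "min": self.vmin,
            "max": self.vmax,
            "mean": mean,
            "std": math.sqrt(var),
            "l1_norm": self.total_abs,
            "l2_norm": math.sqrt(self.total_sq),
            "cur_amax": max(abs(self.vmin), abs(self.vmax)),
        }
        for f in fields(self):  # reset every field to its default
            setattr(self, f.name, f.default)
        return result
```

The sum-of-squares variance formula is simple but less numerically stable than Welford's algorithm; it is used here because it merges trivially across micro-batches.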

Usage Examples

YAML configuration:

example_tensor_stat_collection:
  enabled: True
  layers:
    layer_name_regex_pattern: .*(fc1|self_attention).*
  transformer_engine:
    LogTensorStats:
      enabled: True
      tensors_struct:
        - tensor: activation
          stats: [mean]
          freq: 10
          start_step: 5
          end_step: 100
        - tensor: gradient
          stats: [mean, max, min]
          freq: 2
        - tensor: weight
          stats: [dynamic_range]
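Each entry under `tensors_struct` configures one tensor, and keys an entry omits (such as `freq` for `weight`) fall back to defaults. The hypothetical sketch below mirrors that lookup on the example configuration written as Python data; the helper names, the default values, and the use of a full regex match for layer selection are assumptions, not the debug framework's actual code.

```python
import re
from typing import Any, Dict, Optional

# The layer pattern and LogTensorStats section of the example, as Python data.
LAYER_PATTERN = r".*(fc1|self_attention).*"
LOG_TENSOR_STATS_CFG: Dict[str, Any] = {
    "enabled": True,
    "tensors_struct": [
        {"tensor": "activation", "stats": ["mean"], "freq": 10, "start_step": 5, "end_step": 100},
        {"tensor": "gradient", "stats": ["mean", "max", "min"], "freq": 2},
        {"tensor": "weight", "stats": ["dynamic_range"]},
    ],
}
# Assumed defaults for keys an entry leaves out.
DEFAULTS: Dict[str, Any] = {"freq": 1, "start_step": None, "end_step": None, "start_end_list": None}


def layer_selected(layer_name: str, pattern: str = LAYER_PATTERN) -> bool:
    """Hypothetical: a layer is inspected if its full name matches the regex."""
    return re.fullmatch(pattern, layer_name) is not None


def resolve_tensor_config(feature_cfg: Dict[str, Any], tensor_name: str) -> Optional[Dict[str, Any]]:
    """Hypothetical: merged settings for one tensor, or None if it is not logged."""
    if not feature_cfg.get("enabled", False):
        return None
    for entry in feature_cfg.get("tensors_struct", []):
        if entry["tensor"] == tensor_name:
            merged = dict(DEFAULTS)
            merged.update({k: v for k, v in entry.items() if k != "tensor"})
            return merged
    return None
```

Here `resolve_tensor_config(LOG_TENSOR_STATS_CFG, "weight")` yields stats ["dynamic_range"] with freq 1, while "output" is not listed and so is not logged for matching layers.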

Related Pages

Environment:NVIDIA_TransformerEngine_CUDA_Toolkit_Requirements
