Implementation:NVIDIA TransformerEngine Debug Log Tensor Stats

From Leeroopedia


| Field | Value |
|---|---|
| Sources | TransformerEngine |
| Domains | Deep_Learning, PyTorch, Debug, Quantization |
| Last Updated | 2026-02-07 14:00 GMT |

Overview

Logs basic statistics of high-precision tensors (min, max, mean, std, norms, dynamic range, blockwise dynamic range) within the Transformer Engine debug framework.

Description

LogTensorStats collects and logs statistics of high-precision (non-FP8) tensors during training. Supported statistics are min, max, mean, std, l1_norm, l2_norm, cur_amax, dynamic_range, and max_blockwise_dynamic_range (with configurable block size and dimensionality). The feature checks that each tensor is in high precision and raises an error if an FP8 tensor is passed; use LogFp8TensorStats for FP8 tensors instead. Statistics are accumulated across micro-batches and flushed when debug_api.step() is called.
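To make the statistic names concrete, the following is a minimal pure-Python sketch of what each basic statistic could mean for a flat list of values. It is not the Transformer Engine implementation: the helper name basic_tensor_stats is hypothetical, std is assumed to be the sample standard deviation, and dynamic_range is assumed to be the log2 ratio of the largest to the smallest nonzero magnitude. The blockwise variant is left out.

```python
import math
from typing import Dict, Sequence


def basic_tensor_stats(values: Sequence[float]) -> Dict[str, float]:
    """Illustrative statistics over a flat sequence of floats (hypothetical helper)."""
    if not values:
        raise ValueError("expected at least one value")
    n = len(values)
    mean = sum(values) / n
    # Assumption: sample standard deviation (divide by n - 1); 0.0 for a single element.
    var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    magnitudes = [abs(v) for v in values]
    nonzero = [m for m in magnitudes if m != 0.0]
    # Assumption: dynamic range = log2(largest nonzero |x|) - log2(smallest nonzero |x|).
    dynamic_range = math.log2(max(nonzero)) - math.log2(min(nonzero)) if nonzero else 0.0
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "std": math.sqrt(var),
        "l1_norm": sum(magnitudes),
        "l2_norm": math.sqrt(sum(v * v for v in values)),
        "cur_amax": max(magnitudes),
        "dynamic_range": dynamic_range,
    }


stats = basic_tensor_stats([-4.0, 0.0, 0.5, 2.0])
```

For this input the largest magnitude is 4.0 and the smallest nonzero magnitude is 0.5, so under the assumed definition the dynamic range spans three powers of two.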

Usage

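Enable the feature in the YAML config under transformer_engine.LogTensorStats. Each tensor entry can set its own stats list, logging frequency (freq), and step range (start_step/end_step or start_end_list). Setting freq > 1 collects statistics less often and so reduces overhead.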
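The interaction of freq with the step-range options can be sketched as a small predicate. This is an illustration only: the function should_log is hypothetical, range bounds are assumed to be inclusive, and freq is assumed to be checked against the absolute iteration number. The library's own scheduling may differ in these details.

```python
from typing import List, Optional, Tuple


def should_log(
    iteration: int,
    freq: int = 1,
    start_step: Optional[int] = None,
    end_step: Optional[int] = None,
    start_end_list: Optional[List[Tuple[int, int]]] = None,
) -> bool:
    """Hypothetical gate: is statistic collection active at this iteration?"""
    if freq < 1:
        raise ValueError("freq must be >= 1")
    if start_end_list is not None:
        # Assumption: an explicit list of (start, end) ranges takes priority.
        in_range = any(start <= iteration <= end for start, end in start_end_list)
    else:
        lower = 0 if start_step is None else start_step
        in_range = iteration >= lower and (end_step is None or iteration <= end_step)
    # Assumption: freq is applied to the absolute iteration number.
    return in_range and iteration % freq == 0


# Mirrors an entry with freq: 10, start_step: 5, end_step: 100.
logged = [i for i in range(31) if should_log(i, freq=10, start_step=5, end_step=100)]
```

Under these assumptions, only every tenth iteration inside the step range triggers collection, which is where the overhead reduction comes from.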

Code Reference

Source Location

Repository
NVIDIA/TransformerEngine
File
transformer_engine/debug/features/log_tensor_stats.py
Lines
1-236

Signature

@Registry.register_feature(namespace="transformer_engine")
class LogTensorStats(BaseLogTensorStats):
    def inspect_tensor_enabled(self, config, layer_name, tensor_name, iteration) -> Tuple[bool, Optional[int]]: ...
    def inspect_tensor(self, config, layer_name, tensor_name, iteration, tp_group, tensor, rowwise_quantized_tensor=None, columnwise_quantized_tensor=None, quantizer=None) -> None: ...

Import

from transformer_engine.debug.features.log_tensor_stats import LogTensorStats

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| config | Dict | Yes | Must contain stats list; optionally freq, start_step, end_step, start_end_list |
| tensor | torch.Tensor | Yes | High-precision tensor (must not be FP8) |
| tensor_name | str | Yes | One of activation, weight, gradient, output, wgrad, dgrad |

Outputs

| Name | Type | Description |
|---|---|---|
| (none) | None | Statistics are buffered and logged at the next debug_api.step() |
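The buffer-then-flush behaviour described above can be pictured with a small stand-alone sketch. The class StatBuffer, its methods, and the output line format are all hypothetical and do not reproduce the library's internals. Only min and max are combined here, since other statistics (such as mean) would need additional information like element counts to be merged correctly across micro-batches.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str, str]  # (layer_name, tensor_name, stat_name)

# How values from several micro-batches are merged, per statistic (illustrative subset).
COMBINE: Dict[str, Callable[[List[float]], float]] = {"min": min, "max": max}


class StatBuffer:
    """Hypothetical buffer: collects per-micro-batch values, emits one line per key on flush."""

    def __init__(self) -> None:
        self._pending: Dict[Key, List[float]] = defaultdict(list)

    def record(self, layer_name: str, tensor_name: str, stat_name: str, value: float) -> None:
        if stat_name not in COMBINE:
            raise ValueError(f"unsupported stat in this sketch: {stat_name}")
        self._pending[(layer_name, tensor_name, stat_name)].append(value)

    def flush(self, iteration: int) -> List[str]:
        """Merge buffered values and clear the buffer, as a step() call would trigger."""
        lines = []
        for (layer, tensor, stat), values in sorted(self._pending.items()):
            combined = COMBINE[stat](values)
            lines.append(f"{layer}_{tensor}_{stat} iteration={iteration} value={combined}")
        self._pending.clear()
        return lines


buf = StatBuffer()
buf.record("fc1", "activation", "max", 1.5)  # micro-batch 0
buf.record("fc1", "activation", "max", 4.0)  # micro-batch 1
lines = buf.flush(iteration=3)
```

Nothing is emitted at record time; one merged value per layer, tensor, and statistic appears only when the buffer is flushed.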

Usage Examples

# YAML configuration:
# example_tensor_stat_collection:
#   enabled: True
#   layers:
#     layer_name_regex_pattern: .*(fc1|self_attention).*
#   transformer_engine:
#     LogTensorStats:
#       enabled: True
#       tensors_struct:
#         - tensor: activation
#           stats: [mean]
#           freq: 10
#           start_step: 5
#           end_step: 100
#         - tensor: gradient
#           stats: [mean, max, min]
#           freq: 2
#         - tensor: weight
#           stats: [dynamic_range]
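As a quick check of which layers the layer_name_regex_pattern in this configuration would select, the pattern can be tried against a few names with Python's re module. The layer names below are made up for illustration, and the use of fullmatch is an assumption about how the pattern is applied.

```python
import re

# Pattern copied from the YAML example above.
pattern = re.compile(r".*(fc1|self_attention).*")

# Hypothetical layer names; real names depend on the model definition.
layer_names = [
    "decoder.layers.0.self_attention.qkv",
    "decoder.layers.0.mlp.fc1",
    "decoder.layers.0.mlp.fc2",
    "decoder.final_layernorm",
]

# Assumption: a layer is selected when the whole name matches the pattern.
selected = [name for name in layer_names if pattern.fullmatch(name)]
```

Because the pattern begins and ends with .*, any name containing fc1 or self_attention is selected, while fc2 and the final layer norm are skipped.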
