Implementation:NVIDIA TransformerEngine Debug API

From Leeroopedia


Field          Value
Sources        TransformerEngine
Domains        Deep_Learning, PyTorch, Debug, Quantization
Last Updated   2026-02-07 14:00 GMT

Overview

Defines the core debug API classes for the nvidia-dlframework-inspect integration in Transformer Engine, providing config parsing, feature routing, and default feature behavior for tensor inspection and modification during quantized training.

Description

This module implements the central API surface for Transformer Engine's debug/inspection framework. It contains three main components:

  • TEConfigAPIMapper -- Parses YAML configuration and determines which NV DLFW Inspect API should be invoked for each tensor and GEMM combination. It supports both gemms_struct (per-GEMM config) and gemms (list of GEMM names) config formats, with optional tensor-level filtering.
  • TEDefaultFeatures -- Provides the default implementations of every debug API call, used when no feature overrides them so that training proceeds unchanged: fp8_gemm_enabled, modify_tensor_enabled, modify_tensor, inspect_tensor_enabled, inspect_tensor, inspect_tensor_postquantize_enabled, and inspect_tensor_postquantize. A feature overrides only the methods it needs in order to inject custom behavior.
  • TransformerEngineAPI -- The registered namespace API class that wires together config routing, input/output assertions, multi-feature output merging, and lifecycle hooks (step(), end_debug()). It controls which features can be invoked simultaneously and how their results are combined.
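The difference between the gemms_struct and gemms config formats handled by TEConfigAPIMapper can be illustrated with a small stand-alone sketch. This is not the Transformer Engine implementation: the function name `route_config` and the exact key layout (each gemms_struct entry holding a `gemm` key plus an optional `tensors` filter; gemms as a flat list with an optional top-level `tensors` filter) are assumptions made for the example. It mirrors the `Tuple[bool, Optional[Dict]]` return shape of `parse_config_and_api`.

```python
from typing import Any, Dict, Optional, Tuple

ROUTING_KEYS = ("gemm", "gemms", "gemms_struct", "tensors")


def route_config(
    config: Dict[str, Any], gemm: str, tensor_name: Optional[str] = None
) -> Tuple[bool, Optional[Dict[str, Any]]]:
    """Hypothetical sketch: does this feature config apply to (gemm, tensor_name)?

    Returns (True, settings for this call) when it applies, else (False, None).
    """
    if "gemms_struct" in config:
        # Per-GEMM format: each entry carries its own settings.
        for entry in config["gemms_struct"]:
            if entry.get("gemm") != gemm:
                continue
            tensors = entry.get("tensors")
            if tensor_name is not None and tensors is not None and tensor_name not in tensors:
                return False, None
            # Keep only the feature settings, not the routing keys.
            return True, {k: v for k, v in entry.items() if k not in ROUTING_KEYS}
        return False, None

    if "gemms" in config:
        # Flat format: one shared config applied to every listed GEMM.
        if gemm not in config["gemms"]:
            return False, None
        tensors = config.get("tensors")
        if tensor_name is not None and tensors is not None and tensor_name not in tensors:
            return False, None
        return True, {k: v for k, v in config.items() if k not in ROUTING_KEYS}

    return False, None
```

For example, with `{"gemms_struct": [{"gemm": "fprop", "tensors": ["activation"], "margin": 1}]}`, a query for `("fprop", "activation")` yields `(True, {"margin": 1})`, while any wgrad query yields `(False, None)`.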

Usage

This module is used internally by Transformer Engine when the nvdlfw_inspect debug system is initialized. Users configure features through YAML config files; this API layer handles parsing, routing, and dispatching to registered features such as LogTensorStats, FakeQuant, PerTensorScaling, and others.
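The split between default behavior and registered features can be pictured with the following sketch. The class names `DefaultFeatures` and `ScaleActivation` and the dictionary-based `dispatch` helper are hypothetical and do not reproduce the nvdlfw_inspect registry; they only show the idea that a feature overrides the hooks it cares about and inherits defaults for the rest. A plain list stands in for a tensor so the example has no dependencies.

```python
class DefaultFeatures:
    """Hypothetical stand-in for the default hooks: nothing is modified or inspected."""

    def modify_tensor_enabled(self, config, layer_name, gemm, tensor_name, iteration):
        return False, None

    def inspect_tensor_enabled(self, config, layer_name, tensor_name, iteration):
        return False, None


class ScaleActivation(DefaultFeatures):
    """Hypothetical feature that overrides only the tensor-modification hooks."""

    def modify_tensor_enabled(self, config, layer_name, gemm, tensor_name, iteration):
        return tensor_name == "activation", None

    def modify_tensor(self, config, layer_name, gemm, tensor_name, tensor, iteration):
        # A plain list stands in for a torch.Tensor to keep the sketch dependency-free.
        return [x * config["scale"] for x in tensor]


def dispatch(features, configs, api_name, **kwargs):
    """Invoke api_name on each feature that has a config; fall back to the defaults."""
    results = [
        getattr(feature, api_name)(configs[name], **kwargs)
        for name, feature in features.items()
        if name in configs
    ]
    if not results:
        results.append(getattr(DefaultFeatures(), api_name)({}, **kwargs))
    return results
```

Because `ScaleActivation` does not override `inspect_tensor_enabled`, dispatching that call to it returns the inherited `(False, None)`.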

Code Reference

Source Location

Repository: NVIDIA/TransformerEngine
File: transformer_engine/debug/features/api.py
Lines: 1-533

Signature

class TEConfigAPIMapper(BaseConfigAPIMapper):
    def parse_config_and_api(self, config, **kwargs) -> Tuple[bool, Optional[Dict]]: ...

class TEDefaultFeatures:
    def fp8_gemm_enabled(self, config, layer_name, gemm, iteration) -> Union[bool, Tuple[bool, Optional[int]]]: ...
    def modify_tensor_enabled(self, config, layer_name, gemm, tensor_name, iteration) -> Union[bool, Tuple[bool, Optional[int]]]: ...
    def modify_tensor(self, config, layer_name, gemm, tensor_name, tensor, default_quantizer, iteration, out) -> Union[torch.Tensor, QuantizedTensor, None]: ...
    def inspect_tensor(self, config, layer_name, tensor_name, tensor, rowwise_quantized_tensor, columnwise_quantized_tensor, quantizer, iteration, tp_group) -> None: ...
    def inspect_tensor_enabled(self, config, layer_name, tensor_name, iteration) -> Union[bool, Tuple[bool, Optional[int]]]: ...

@Registry.register_namespace_api(namespace="transformer_engine")
class TransformerEngineAPI(BaseNamespaceAPI):
    def step(self) -> None: ...
    def end_debug(self) -> None: ...
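TransformerEngineAPI also decides how results from several features active on the same call are combined. The merge rule in this sketch is an illustrative assumption, not Transformer Engine's documented policy, and `merge_enabled` is a hypothetical helper: a hook counts as enabled if any feature enables it, and the next iteration is the earliest one reported. A bare bool or a `None` hint is treated as "no hint, re-check every step", so the merged hint is dropped in that case to avoid skipping an iteration some feature needs.

```python
from typing import Iterable, List, Optional, Tuple, Union

EnabledResult = Union[bool, Tuple[bool, Optional[int]]]


def merge_enabled(results: Iterable[EnabledResult]) -> Tuple[bool, Optional[int]]:
    """Hypothetical merge of several features' `_enabled` results."""
    any_enabled = False
    next_iters: List[int] = []
    hint_missing = False
    for result in results:
        # Normalize a bare bool to (bool, None).
        enabled, next_iter = result if isinstance(result, tuple) else (result, None)
        any_enabled = any_enabled or enabled
        if next_iter is None:
            hint_missing = True
        else:
            next_iters.append(next_iter)
    if hint_missing or not next_iters:
        # Without a hint from every feature, the caller must re-check each step.
        return any_enabled, None
    return any_enabled, min(next_iters)
```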

Import

from transformer_engine.debug.features.api import TransformerEngineAPI, TEConfigAPIMapper, TEDefaultFeatures

I/O Contract

Inputs

Name               Type          Required     Description
config             Dict          Yes          YAML-parsed configuration dictionary for the feature
layer_name         str           Yes          Name of the TE layer being inspected
gemm               str           Conditional  One of fprop, dgrad, wgrad
tensor_name        str           Conditional  One of activation, weight, gradient, output, wgrad, dgrad
iteration          int           Yes          Current training step (number of debug_api.step() calls)
tensor             torch.Tensor  Conditional  High-precision tensor for inspection or modification
default_quantizer  Quantizer     Conditional  Default quantizer used for the tensor if modify_tensor is not invoked
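The value sets in this table lend themselves to simple argument checks, in the spirit of the input assertions that TransformerEngineAPI performs. The helper `check_inputs` and its error messages are hypothetical; only the allowed gemm and tensor_name values come from the table. Conditional arguments are validated only when they are passed.

```python
from typing import Optional

VALID_GEMMS = {"fprop", "dgrad", "wgrad"}
VALID_TENSORS = {"activation", "weight", "gradient", "output", "wgrad", "dgrad"}


def check_inputs(layer_name: str, iteration: int,
                 gemm: Optional[str] = None,
                 tensor_name: Optional[str] = None) -> None:
    """Hypothetical validation of debug-API call arguments; raises ValueError on bad input."""
    if not isinstance(layer_name, str) or not layer_name:
        raise ValueError("layer_name must be a non-empty string")
    if not isinstance(iteration, int) or iteration < 0:
        raise ValueError("iteration must be a non-negative int")
    # Conditional arguments are checked only when the caller supplies them.
    if gemm is not None and gemm not in VALID_GEMMS:
        raise ValueError(f"unknown gemm {gemm!r}; expected one of {sorted(VALID_GEMMS)}")
    if tensor_name is not None and tensor_name not in VALID_TENSORS:
        raise ValueError(
            f"unknown tensor_name {tensor_name!r}; expected one of {sorted(VALID_TENSORS)}"
        )
```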

Outputs

Name    Type                                        Description
result  Union[bool, Tuple[bool, Optional[int]]]     For _enabled APIs: whether the feature is active and, in the tuple form, the next iteration at which it will be enabled
tensor  Union[torch.Tensor, QuantizedTensor, None]  For modify_tensor: the processed tensor, or None when the result is written into the out parameter
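The tuple form of the _enabled result lets a feature report both whether it runs now and when it will next run, so a caller can skip work on intermediate steps. A minimal sketch, assuming a hypothetical feature that fires every `freq` steps starting at `start_step`; the function name and parameters are illustrative and not taken from the Transformer Engine source.

```python
from typing import Tuple


def enabled_with_next(iteration: int, freq: int, start_step: int = 0) -> Tuple[bool, int]:
    """Hypothetical schedule: active at start_step, start_step + freq, start_step + 2*freq, ...

    Returns (active_now, next_active_iteration), where the next iteration is
    strictly greater than `iteration`.
    """
    if freq <= 0:
        raise ValueError("freq must be a positive integer")
    if iteration < start_step:
        return False, start_step
    offset = (iteration - start_step) % freq
    # Distance to the next scheduled step (a full period if we are on one now).
    return offset == 0, iteration + (freq - offset)
```

With `freq=10`, iteration 0 returns `(True, 10)`, so a caller that trusts the hint can skip iterations 1 through 9 without querying the feature again.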

Usage Examples

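# Features are registered and invoked through the nvdlfw_inspect config system.
# In a YAML config file (config.yaml), feature settings live under a named section
# that is enabled and selects the layers it applies to:
# example_section:
#   enabled: True
#   layers:
#     layer_name_regex_pattern: .*
#   transformer_engine:
#     LogTensorStats:
#       enabled: True
#       tensors_struct:
#         - tensor: activation
#           stats: [mean, max]
#           freq: 10

# Programmatic usage through the debug API:
import nvdlfw_inspect.api as debug_api

debug_api.initialize(
    config_file="config.yaml",
    # Directory holding Transformer Engine's debug features (adjust to your install).
    feature_dirs=["/path/to/transformer_engine/debug/features"],
)

# After each training step:
debug_api.step()

# At the end of debugging:
debug_api.end_debug()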
