Implementation:NVIDIA TransformerEngine Debug API

Field	Value
Sources	TransformerEngine
Domains	Deep_Learning, PyTorch, Debug, Quantization
Last Updated	2026-02-07 14:00 GMT

Overview

Defines the core debug API classes for the nvidia-dlframework-inspect integration in Transformer Engine, providing config parsing, feature routing, and default feature behavior for tensor inspection and modification during quantized training.

Description

This module implements the central API surface for Transformer Engine's debug/inspection framework. It contains three main components:

TEConfigAPIMapper -- Parses YAML configuration and determines which NV DLFW Inspect API should be invoked for each tensor and GEMM combination. It supports both gemms_struct (per-GEMM config) and gemms (list of GEMM names) config formats, with optional tensor-level filtering.

TEDefaultFeatures -- Provides the default (no-op) implementations for all debug API calls. These fall into two groups: the action calls modify_tensor, inspect_tensor, and inspect_tensor_postquantize, and the routing calls fp8_gemm_enabled, modify_tensor_enabled, and the _enabled counterparts of the inspect calls. Features override these methods to inject custom behavior.

TransformerEngineAPI -- The registered namespace API class that wires together config routing, input/output assertions, multi-feature output merging, and lifecycle hooks (step(), end_debug()). It controls which features can be invoked simultaneously and how their results are combined.
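
The two config formats handled by TEConfigAPIMapper can be illustrated with a small stand-alone sketch. This is not the module's actual code: resolve_gemm_config is a hypothetical helper, and both the `gemm` key inside each gemms_struct entry and the "no filter means every GEMM" fallback are assumptions made for illustration. It returns the same `(bool, Optional[Dict])` shape that parse_config_and_api is declared to return.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_gemm_config(
    config: Dict[str, Any], gemm: str
) -> Tuple[bool, Optional[Dict[str, Any]]]:
    """Hypothetical helper: decide whether `config` applies to `gemm`.

    Returns (applies, effective_config). Not the TransformerEngine implementation.
    """
    if "gemms_struct" in config:
        # Per-GEMM form: each entry names one GEMM and carries its own options.
        for entry in config["gemms_struct"]:
            if entry.get("gemm") == gemm:  # the "gemm" key is an assumption
                merged = {k: v for k, v in config.items() if k != "gemms_struct"}
                merged.update(entry)
                return True, merged
        return False, None

    if "gemms" in config:
        # List form: every listed GEMM shares the top-level options.
        if gemm in config["gemms"]:
            merged = {k: v for k, v in config.items() if k != "gemms"}
            merged["gemm"] = gemm
            return True, merged
        return False, None

    # Assumption for this sketch: no GEMM filter means the config applies everywhere.
    return True, dict(config)
```

With the list form, `resolve_gemm_config({"enabled": True, "gemms": ["fprop"]}, "wgrad")` reports that the config does not apply, while the per-GEMM form lets each GEMM carry different options in the same feature block.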

Usage

This module is used internally by Transformer Engine when the nvdlfw_inspect debug system is initialized. Users configure features through YAML config files; this API layer handles parsing, routing, and dispatching to registered features such as LogTensorStats, FakeQuant, PerTensorScaling, and others.
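
The override pattern behind this dispatch can be sketched without Transformer Engine installed. Everything below is a stand-in under stated assumptions: DefaultFeaturesSketch is not TEDefaultFeatures, MaxAbsRecorder and dispatch are hypothetical, the method signatures are shortened relative to the real ones, and a plain list of floats replaces torch.Tensor.

```python
from typing import Any, Dict, List, Tuple


class DefaultFeaturesSketch:
    """Stand-in defaults class (not TEDefaultFeatures): every call is a no-op."""

    def inspect_tensor_enabled(self, config, layer_name, tensor_name, iteration) -> bool:
        return False  # default: the feature is not active

    def inspect_tensor(self, config, layer_name, tensor_name, tensor, iteration) -> None:
        return None  # default: do nothing with the tensor


class MaxAbsRecorder(DefaultFeaturesSketch):
    """Hypothetical feature: records max |x| for selected tensors every `freq` steps."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, str, int, float]] = []

    def inspect_tensor_enabled(self, config, layer_name, tensor_name, iteration) -> bool:
        freq = config.get("freq", 1)
        return tensor_name in config.get("tensors", []) and iteration % freq == 0

    def inspect_tensor(self, config, layer_name, tensor_name, tensor, iteration) -> None:
        peak = max(abs(v) for v in tensor)
        self.records.append((layer_name, tensor_name, iteration, peak))


def dispatch(feature, config: Dict[str, Any], layer_name, tensor_name, tensor, iteration):
    """Hypothetical dispatcher: ask the routing call first, then run the action call."""
    if feature.inspect_tensor_enabled(config, layer_name, tensor_name, iteration):
        feature.inspect_tensor(config, layer_name, tensor_name, tensor, iteration)
```

The routing call is kept separate from the action call so that a dispatcher can skip preparing tensors for features that are inactive at the current iteration.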

Code Reference

Source Location

Repository: NVIDIA/TransformerEngine
File: transformer_engine/debug/features/api.py
Lines: 1--533

Signature

class TEConfigAPIMapper(BaseConfigAPIMapper):
    def parse_config_and_api(self, config, **kwargs) -> Tuple[bool, Optional[Dict]]: ...

class TEDefaultFeatures:
    def fp8_gemm_enabled(self, config, layer_name, gemm, iteration) -> Union[bool, Tuple[bool, Optional[int]]]: ...
    def modify_tensor_enabled(self, config, layer_name, gemm, tensor_name, iteration) -> Union[bool, Tuple[bool, Optional[int]]]: ...
    def modify_tensor(self, config, layer_name, gemm, tensor_name, tensor, default_quantizer, iteration, out) -> Union[torch.Tensor, QuantizedTensor, None]: ...
    def inspect_tensor(self, config, layer_name, tensor_name, tensor, rowwise_quantized_tensor, columnwise_quantized_tensor, quantizer, iteration, tp_group) -> None: ...
    def inspect_tensor_enabled(self, config, layer_name, tensor_name, iteration) -> Union[bool, Tuple[bool, Optional[int]]]: ...

@Registry.register_namespace_api(namespace="transformer_engine")
class TransformerEngineAPI(BaseNamespaceAPI):
    def step(self) -> None: ...
    def end_debug(self) -> None: ...

Import

from transformer_engine.debug.features.api import TransformerEngineAPI, TEConfigAPIMapper, TEDefaultFeatures

I/O Contract

Inputs

Name	Type	Required	Description
config	Dict	Yes	YAML-parsed configuration dictionary for the feature
layer_name	str	Yes	Name of the TE layer being inspected
gemm	str	Conditional	One of `fprop`, `dgrad`, `wgrad`
tensor_name	str	Conditional	One of `activation`, `weight`, `gradient`, `output`, `wgrad`, `dgrad`
iteration	int	Yes	Current training step (number of `debug_api.step()` calls)
tensor	torch.Tensor	Conditional	High-precision tensor for inspection or modification
default_quantizer	Quantizer	Conditional	Default quantizer used for the tensor if modify_tensor is not invoked

Outputs

Name	Type	Description
result	Union[bool, Tuple[bool, Optional[int]]]	For `_enabled` APIs: either a bare bool saying whether the feature is active, or a tuple of that flag and the next iteration at which the feature is enabled
tensor	Union[torch.Tensor, QuantizedTensor, None]	For `modify_tensor`: the processed tensor, or `None` when the result is written into the `out` parameter
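
Because `_enabled` calls may return either shape, calling code has to handle both. A minimal sketch of that unpacking follows; split_enabled_result is a hypothetical helper rather than part of the API, and it deliberately reports whether a next-iteration hint was supplied instead of guessing what a bare bool implies about scheduling.

```python
from typing import Optional, Tuple, Union

EnabledResult = Union[bool, Tuple[bool, Optional[int]]]


def split_enabled_result(result: EnabledResult) -> Tuple[bool, Optional[int], bool]:
    """Hypothetical helper: return (enabled, next_iteration, hint_given).

    hint_given is False when the feature returned a bare bool, so callers can
    tell "no hint supplied" apart from an explicit None in the tuple form.
    """
    if isinstance(result, tuple):
        enabled, next_iteration = result
        return bool(enabled), next_iteration, True
    return bool(result), None, False
```

For example, `split_enabled_result((False, 20))` yields a disabled flag together with iteration 20 as the supplied hint, while `split_enabled_result(True)` yields an enabled flag with no hint.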

Usage Examples

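# Features are registered and invoked through the nvdlfw_inspect config system.
# In a YAML config file, the transformer_engine block sits inside a named section
# (the section name and layer pattern below are illustrative):
# log_activation_stats:
#   enabled: True
#   layers:
#     layer_name_regex_pattern: .*
#   transformer_engine:
#     LogTensorStats:
#       enabled: True
#       tensors_struct:
#         - tensor: activation
#           stats: [mean, max]
#           freq: 10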
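# Programmatic usage through the debug API:
import nvdlfw_inspect.api as debug_api
# feature_dirs points at the TE debug features directory (placeholder path).
debug_api.initialize(config_file="config.yaml", feature_dirs=["/path/to/transformer_engine/debug/features"])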

# After each training step:
debug_api.step()

# At the end of debugging:
debug_api.end_debug()

Related Pages

Environment:NVIDIA_TransformerEngine_CUDA_Toolkit_Requirements
