Implementation:NVIDIA TransformerEngine Debug API
| Field | Value |
|---|---|
| Sources | TransformerEngine |
| Domains | Deep_Learning, PyTorch, Debug, Quantization |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Defines the core debug API classes for the nvidia-dlframework-inspect integration in Transformer Engine, providing config parsing, feature routing, and default feature behavior for tensor inspection and modification during quantized training.
Description
This module implements the central API surface for Transformer Engine's debug/inspection framework. It contains three main components:
- TEConfigAPIMapper -- Parses YAML configuration and determines which NV DLFW Inspect API should be invoked for each tensor and GEMM combination. It supports both the gemms_struct (per-GEMM config) and gemms (list of GEMM names) config formats, with optional tensor-level filtering.
- TEDefaultFeatures -- Provides the default (no-op) implementations for all debug API calls: fp8_gemm_enabled, modify_tensor_enabled, modify_tensor, inspect_tensor, inspect_tensor_postquantize, and their _enabled routing counterparts. Features override these methods to inject custom behavior.
- TransformerEngineAPI -- The registered namespace API class that wires together config routing, input/output assertions, multi-feature output merging, and lifecycle hooks (step(), end_debug()). It controls which features can be invoked simultaneously and how their results are combined.
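To make the two config formats concrete, the following is a minimal sketch of per-GEMM routing. It is not the TEConfigAPIMapper code: the function name select_gemm_config, the gemm key inside each gemms_struct entry, and the margin option are assumptions chosen for illustration only.

```python
from typing import Any, Dict, Optional, Tuple


def select_gemm_config(
    config: Dict[str, Any], gemm: str
) -> Tuple[bool, Optional[Dict[str, Any]]]:
    """Toy stand-in for per-GEMM routing (hypothetical, not the library code)."""
    # Format 1: `gemms` is a flat list of GEMM names sharing one set of options.
    if "gemms" in config:
        if gemm not in config["gemms"]:
            return False, None
        shared = {k: v for k, v in config.items() if k != "gemms"}
        return True, shared

    # Format 2: `gemms_struct` carries one options dict per GEMM.
    # The "gemm" key name is an assumption made for this sketch.
    for entry in config.get("gemms_struct", []):
        if entry.get("gemm") == gemm:
            options = {k: v for k, v in entry.items() if k != "gemm"}
            return True, options

    # No entry matched: the API should not be invoked for this GEMM.
    return False, None


# "margin" is a hypothetical feature option.
flat = {"gemms": ["fprop", "dgrad"], "margin": 1}
per_gemm = {"gemms_struct": [{"gemm": "wgrad", "margin": 2}]}

print(select_gemm_config(flat, "fprop"))      # (True, {'margin': 1})
print(select_gemm_config(flat, "wgrad"))      # (False, None)
print(select_gemm_config(per_gemm, "wgrad"))  # (True, {'margin': 2})
```

The return shape mirrors the Tuple[bool, Optional[Dict]] result of parse_config_and_api: a flag saying whether the call applies, plus the options that should reach the feature.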
Usage
This module is used internally by Transformer Engine when the nvdlfw_inspect debug system is initialized. Users configure features through YAML config files; this API layer handles parsing, routing, and dispatching to registered features such as LogTensorStats, FakeQuant, PerTensorScaling, and others.
Code Reference
Source Location
- Repository: NVIDIA/TransformerEngine
- File: transformer_engine/debug/features/api.py
- Lines: 1--533
Signature
class TEConfigAPIMapper(BaseConfigAPIMapper):
    def parse_config_and_api(self, config, **kwargs) -> Tuple[bool, Optional[Dict]]: ...

class TEDefaultFeatures:
    def fp8_gemm_enabled(self, config, layer_name, gemm, iteration) -> Union[bool, Tuple[bool, Optional[int]]]: ...
    def modify_tensor_enabled(self, config, layer_name, gemm, tensor_name, iteration) -> Union[bool, Tuple[bool, Optional[int]]]: ...
    def modify_tensor(self, config, layer_name, gemm, tensor_name, tensor, default_quantizer, iteration, out) -> Union[torch.Tensor, QuantizedTensor, None]: ...
    def inspect_tensor(self, config, layer_name, tensor_name, tensor, rowwise_quantized_tensor, columnwise_quantized_tensor, quantizer, iteration, tp_group) -> None: ...
    def inspect_tensor_enabled(self, config, layer_name, tensor_name, iteration) -> Union[bool, Tuple[bool, Optional[int]]]: ...

@Registry.register_namespace_api(namespace="transformer_engine")
class TransformerEngineAPI(BaseNamespaceAPI):
    def step(self) -> None: ...
    def end_debug(self) -> None: ...
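The override pattern implied by these signatures can be sketched without importing Transformer Engine. Everything below is hypothetical: DefaultFeaturesSketch stands in for TEDefaultFeatures, PeriodicMaxLogger is an invented feature, the parameter lists are abbreviated, plain Python lists replace tensors, and the "next enabled iteration" is assumed to mean the next multiple of freq after the current step.

```python
from typing import List, Optional, Tuple


class DefaultFeaturesSketch:
    """Hypothetical stand-in for TEDefaultFeatures: every hook is a no-op."""

    def inspect_tensor_enabled(self, config, layer_name, tensor_name, iteration):
        return False

    def inspect_tensor(self, config, layer_name, tensor_name, tensor, iteration):
        return None


class PeriodicMaxLogger(DefaultFeaturesSketch):
    """Hypothetical feature: records max(tensor) every `freq` iterations."""

    def __init__(self) -> None:
        self.records: List[Tuple[int, float]] = []

    def inspect_tensor_enabled(
        self, config, layer_name, tensor_name, iteration
    ) -> Tuple[bool, Optional[int]]:
        freq = config.get("freq", 1)
        enabled = iteration % freq == 0
        # Assumption: report the next multiple of `freq` after this step.
        next_iter = (iteration // freq + 1) * freq
        return enabled, next_iter

    def inspect_tensor(self, config, layer_name, tensor_name, tensor, iteration):
        # A list of floats stands in for a torch.Tensor in this sketch.
        self.records.append((iteration, max(tensor)))


feature = PeriodicMaxLogger()
config = {"freq": 10}
for it in range(21):
    enabled, _ = feature.inspect_tensor_enabled(config, "layer0", "activation", it)
    if enabled:
        feature.inspect_tensor(config, "layer0", "activation", [0.5, float(it)], it)

print(feature.records)  # [(0, 0.5), (10, 10.0), (20, 20.0)]
```

Hooks that the feature does not override fall through to the no-op defaults, which is why a feature only has to implement the calls it cares about.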
Import
from transformer_engine.debug.features.api import TransformerEngineAPI, TEConfigAPIMapper, TEDefaultFeatures
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | Dict | Yes | YAML-parsed configuration dictionary for the feature |
| layer_name | str | Yes | Name of the TE layer being inspected |
| gemm | str | Conditional | One of fprop, dgrad, wgrad |
| tensor_name | str | Conditional | One of activation, weight, gradient, output, wgrad, dgrad |
| iteration | int | Yes | Current training step (number of debug_api.step() calls) |
| tensor | torch.Tensor | Conditional | High-precision tensor for inspection or modification |
| default_quantizer | Quantizer | Conditional | Default quantizer used for the tensor if modify_tensor is not invoked |
Outputs
| Name | Type | Description |
|---|---|---|
| result | Union[bool, Tuple[bool, Optional[int]]] | For _enabled APIs: whether the feature is active and the next enabled iteration |
| tensor | Union[torch.Tensor, QuantizedTensor, None] | For modify_tensor: the processed tensor, or None when the out parameter is used |
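Because the _enabled APIs may return either a bare bool or a (bool, next_iteration) tuple, a caller has to handle both shapes. A minimal sketch of that normalization follows; normalize_enabled is a hypothetical helper, not a function from the library.

```python
from typing import Optional, Tuple, Union

EnabledResult = Union[bool, Tuple[bool, Optional[int]]]


def normalize_enabled(result: EnabledResult) -> Tuple[bool, Optional[int]]:
    """Return (enabled, next_iteration) for either accepted return shape."""
    if isinstance(result, tuple):
        enabled, next_iter = result
        return bool(enabled), next_iter
    # A bare bool carries no information about the next enabled iteration.
    return bool(result), None


print(normalize_enabled(True))         # (True, None)
print(normalize_enabled((False, 40)))  # (False, 40)
```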
Usage Examples
# Features are registered and invoked through the nvdlfw_inspect config system.
# In a YAML config file:
#   transformer_engine:
#     LogTensorStats:
#       enabled: True
#       tensors_struct:
#         - tensor: activation
#           stats: [mean, max]
#           freq: 10
# Programmatic usage through the debug API:
import nvdlfw_inspect.api as debug_api
# The YAML path is passed through the config_file keyword argument.
debug_api.initialize(config_file="config.yaml")
# After each training step:
debug_api.step()
# At the end of debugging:
debug_api.end_debug()