Implementation:NVIDIA TransformerEngine Debug Disable Quant GEMM
| Field | Value |
|---|---|
| Sources | TransformerEngine |
| Domains | Deep_Learning, PyTorch, Debug, Quantization |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Debug feature that disables quantization for specific GEMM operations (fprop, dgrad, wgrad), forcing them to execute in high precision.
Description
DisableQuantizationGEMM is a debug feature that selectively disables quantized GEMM execution for specified operations. When enabled for a GEMM, it forces that operation to run in high precision instead of FP8/NVFP4. It inherits from TEConfigAPIMapper for config-based GEMM/tensor routing, and overrides the fp8_gemm_enabled API to return False for the matched GEMMs.
Usage
Enable via YAML config, specifying which GEMMs to run in high precision. Useful for debugging numerical issues in specific GEMM operations during quantized training.
Code Reference
Source Location
- Repository: NVIDIA/TransformerEngine
- File: transformer_engine/debug/features/disable_quantization_gemm.py
- Lines: 1-59
Signature
@Registry.register_feature(namespace="transformer_engine")
class DisableQuantizationGEMM(TEConfigAPIMapper):
    def fp8_gemm_enabled(self, config, layer_name, gemm, iteration) -> Tuple[bool, int]: ...
Import
from transformer_engine.debug.features.disable_quantization_gemm import DisableQuantizationGEMM
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | Dict | Yes | Must contain gemms list specifying which GEMMs to disable |
| layer_name | str | Yes | Name of the TE layer |
| gemm | str | Yes | One of fprop, dgrad, wgrad |
| iteration | int | Yes | Current training step |
Outputs
| Name | Type | Description |
|---|---|---|
| result | Tuple[bool, int] | Returns (False, iteration + 1) to disable quantized GEMM |
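The contract above can be illustrated with a small stand-alone sketch. This is not the TransformerEngine source: `fp8_gemm_enabled_sketch` and `VALID_GEMMS` are hypothetical names, and returning `None` for an unlisted GEMM is an assumption used here to stand in for the config-based routing that TEConfigAPIMapper performs in the real feature.

```python
from typing import Dict, Optional, Tuple

# GEMM names listed in the I/O contract.
VALID_GEMMS = ("fprop", "dgrad", "wgrad")


def fp8_gemm_enabled_sketch(
    config: Dict, layer_name: str, gemm: str, iteration: int
) -> Optional[Tuple[bool, int]]:
    """Hypothetical stand-in for the documented contract (not library code)."""
    if gemm not in VALID_GEMMS:
        raise ValueError(f"unknown GEMM {gemm!r}; expected one of {VALID_GEMMS}")
    if gemm not in config.get("gemms", []):
        # Assumption: a GEMM that is not listed is never routed to the
        # feature, so this sketch signals "no opinion" with None.
        return None
    # Matched GEMM: report quantization as disabled, paired with
    # iteration + 1 as documented in the Outputs table.
    return False, iteration + 1


if __name__ == "__main__":
    cfg = {"gemms": ["dgrad", "wgrad"]}
    for name in VALID_GEMMS:
        print(name, fp8_gemm_enabled_sketch(cfg, "decoder.1.fc1", name, 10))
```

The layer name `decoder.1.fc1` is only a placeholder; the sketch does not use it for matching, since layer selection is handled by the config's layers section rather than by this call.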
Usage Examples
# YAML configuration:
# example_disable_quantization_gemm:
#   enabled: True
#   layers:
#     layer_types: [fc1]
#   transformer_engine:
#     DisableQuantizationGEMM:
#       enabled: True
#       gemms: [dgrad, wgrad]
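When several debugging runs need different GEMM selections, it can help to generate the config text rather than edit it by hand. The following is a minimal sketch using only the standard library; `build_disable_quant_gemm_config` is a hypothetical helper, not part of TransformerEngine, and the nesting it emits is assumed to match the example configuration shown here.

```python
from typing import Iterable

# GEMM names accepted by the feature, per the I/O contract.
VALID_GEMMS = ("fprop", "dgrad", "wgrad")


def build_disable_quant_gemm_config(
    section: str, layer_types: Iterable[str], gemms: Iterable[str]
) -> str:
    """Return YAML text enabling DisableQuantizationGEMM (hypothetical helper)."""
    gemm_list = list(gemms)
    unknown = [g for g in gemm_list if g not in VALID_GEMMS]
    if unknown:
        raise ValueError(f"unknown GEMM names: {unknown}")
    lines = [
        f"{section}:",
        "  enabled: True",
        "  layers:",
        f"    layer_types: [{', '.join(layer_types)}]",
        "  transformer_engine:",
        "    DisableQuantizationGEMM:",
        "      enabled: True",
        f"      gemms: [{', '.join(gemm_list)}]",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a config that keeps fprop quantized and runs dgrad and wgrad
    # in high precision for fc1 layers.
    print(
        build_disable_quant_gemm_config(
            "example_disable_quantization_gemm", ["fc1"], ["dgrad", "wgrad"]
        )
    )
```

Validating the GEMM names up front surfaces a typo such as `bgrad` before the training job starts, instead of leaving the GEMM silently quantized.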