Implementation:NVIDIA TransformerEngine Debug Disable Quant GEMM

From Leeroopedia


Field | Value
Sources | TransformerEngine
Domains | Deep_Learning, PyTorch, Debug, Quantization
Last Updated | 2026-02-07 14:00 GMT

Overview

Debug feature that disables quantization for specific GEMM operations (fprop, dgrad, wgrad), forcing them to execute in high precision.

Description

DisableQuantizationGEMM is a debug feature that selectively disables quantized GEMM execution for specified operations. When enabled for a GEMM, it forces that operation to run in high precision instead of FP8/NVFP4. It inherits from TEConfigAPIMapper for config-based GEMM/tensor routing, and overrides the fp8_gemm_enabled API to return False for the matched GEMMs.
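The routing described above can be illustrated with a small, self-contained sketch. It does not import TransformerEngine: `ConfigMapperSketch`, `DisableQuantizationGEMMSketch`, and `query` are hypothetical stand-ins, the membership check on `gemms` is a simplification of what TEConfigAPIMapper does, and the default answer of True for unmatched GEMMs is an assumption made for illustration.

```python
from typing import Dict, List, Tuple


class ConfigMapperSketch:
    """Hypothetical stand-in for TEConfigAPIMapper (not the real class)."""

    def applies_to(self, config: Dict[str, List[str]], gemm: str) -> bool:
        # Simplified routing: the feature applies only to GEMMs named in the config.
        return gemm in config.get("gemms", [])

    def fp8_gemm_enabled(
        self, config, layer_name: str, gemm: str, iteration: int
    ) -> Tuple[bool, int]:
        # Assumed default for this sketch: the quantized GEMM stays enabled.
        return True, iteration + 1


class DisableQuantizationGEMMSketch(ConfigMapperSketch):
    """Mimics the documented contract: answer False for every matched GEMM."""

    def fp8_gemm_enabled(
        self, config, layer_name: str, gemm: str, iteration: int
    ) -> Tuple[bool, int]:
        # Reached only for matched GEMMs, so quantization is always refused.
        return False, iteration + 1


def query(feature, config, layer_name: str, gemm: str, iteration: int):
    """Hypothetical dispatcher: use the override only when the GEMM is matched."""
    if feature.applies_to(config, gemm):
        return feature.fp8_gemm_enabled(config, layer_name, gemm, iteration)
    return ConfigMapperSketch.fp8_gemm_enabled(
        feature, config, layer_name, gemm, iteration
    )


if __name__ == "__main__":
    feature = DisableQuantizationGEMMSketch()
    config = {"gemms": ["dgrad", "wgrad"]}
    for gemm in ("fprop", "dgrad", "wgrad"):
        print(gemm, query(feature, config, "decoder.fc1", gemm, iteration=0))
```

With `gemms: [dgrad, wgrad]`, only fprop keeps its quantized path in this sketch; the two backward GEMMs fall back to high precision.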

Usage

Enable the feature through a YAML config whose gemms list names the GEMMs (fprop, dgrad, wgrad) that should run in high precision. This is useful for isolating numerical issues to a specific GEMM operation during quantized training.
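One way to use the feature for debugging is to run the same training job three times, each time moving a single GEMM to high precision, and compare the results. The sketch below only generates the three config variants, following the layout shown under Usage Examples. The section names (`high_precision_fprop` and so on) and the helpers `single_gemm_configs` and `to_yaml` are hypothetical, and the tiny YAML emitter handles only the nesting used here.

```python
from typing import Dict, List

GEMMS = ("fprop", "dgrad", "wgrad")


def single_gemm_configs(layer_types: List[str]) -> Dict[str, dict]:
    """Build one config per GEMM, each disabling quantization for that GEMM only."""
    configs = {}
    for gemm in GEMMS:
        section = f"high_precision_{gemm}"  # hypothetical section name
        configs[gemm] = {
            section: {
                "enabled": True,
                "layers": {"layer_types": list(layer_types)},
                "transformer_engine": {
                    "DisableQuantizationGEMM": {"enabled": True, "gemms": [gemm]},
                },
            }
        }
    return configs


def to_yaml(node: dict, indent: int = 0) -> str:
    """Minimal emitter for nested dicts, flat string lists, and scalars."""
    pad = " " * indent
    lines = []
    for key, value in node.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 2))
        elif isinstance(value, list):
            lines.append(f"{pad}{key}: [{', '.join(value)}]")
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    for gemm, cfg in single_gemm_configs(["fc1"]).items():
        print(f"# run with only {gemm} in high precision")
        print(to_yaml(cfg))
        print()
```

If the numerical problem disappears in exactly one of the three runs, the GEMM disabled in that run is the likely source.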

Code Reference

Source Location

Repository: NVIDIA/TransformerEngine
File: transformer_engine/debug/features/disable_quantization_gemm.py
Lines: 1-59

Signature

@Registry.register_feature(namespace="transformer_engine")
class DisableQuantizationGEMM(TEConfigAPIMapper):
    def fp8_gemm_enabled(self, config, layer_name, gemm, iteration) -> Tuple[bool, int]: ...

Import

from transformer_engine.debug.features.disable_quantization_gemm import DisableQuantizationGEMM

I/O Contract

Inputs

Name | Type | Required | Description
config | Dict | Yes | Must contain a gemms list specifying the GEMMs for which quantization is disabled
layer_name | str | Yes | Name of the TE layer
gemm | str | Yes | One of fprop, dgrad, wgrad
iteration | int | Yes | Current training step

Outputs

Name | Type | Description
result | Tuple[bool, int] | Returns (False, iteration + 1) to disable quantized GEMM

Usage Examples

# YAML configuration:
example_disable_quantization_gemm:
  enabled: True
  layers:
    layer_types: [fc1]
  transformer_engine:
    DisableQuantizationGEMM:
      enabled: True
      gemms: [dgrad, wgrad]
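The configuration above has to be handed to the debug tooling before training starts. The sketch below writes it to disk and then initializes the debug API. It assumes the `nvdlfw_inspect` package is installed and that `nvdlfw_inspect.api.initialize` accepts `config_file`, `feature_dirs`, and `log_dir`; check this against the TransformerEngine debug documentation for your version. The helper names, the file name `debug_config.yaml`, and the paths in the usage block are placeholders.

```python
import tempfile
from pathlib import Path

# Same configuration as above, as a plain YAML string.
CONFIG_YAML = """\
example_disable_quantization_gemm:
  enabled: True
  layers:
    layer_types: [fc1]
  transformer_engine:
    DisableQuantizationGEMM:
      enabled: True
      gemms: [dgrad, wgrad]
"""


def write_debug_config(directory) -> Path:
    """Write the YAML config into `directory` and return its path."""
    path = Path(directory) / "debug_config.yaml"  # placeholder file name
    path.write_text(CONFIG_YAML)
    return path


def initialize_debug(config_path, feature_dir, log_dir) -> None:
    """Initialize the debug tooling (assumed entry point, see the note above)."""
    # Imported lazily so the rest of this sketch loads without the package.
    import nvdlfw_inspect.api as debug_api

    debug_api.initialize(
        config_file=str(config_path),
        feature_dirs=[str(feature_dir)],
        log_dir=str(log_dir),
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg_path = write_debug_config(tmp)
        print(cfg_path.read_text())
        # Placeholder paths; uncomment once the debug tooling is installed.
        # initialize_debug(
        #     cfg_path,
        #     "/path/to/transformer_engine/debug/features",
        #     Path(tmp) / "logs",
        # )
```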
