Implementation:NVIDIA TransformerEngine Ops Fuser

From Leeroopedia


Field         Value
Sources       TransformerEngine
Domains       Deep_Learning, PyTorch, Optimization
Last Updated  2026-02-07 14:00 GMT

Overview

Core operation fusion manager that orchestrates a pipeline of fusible operations with automatic forward/backward fusion discovery and application.

Description

OperationFuser takes a sequence of BasicOperations and applies the registered fusion functions to produce two optimized operation lists: one for the forward pass and one for the backward pass. _OperationFuserAutogradFunction is a custom torch.autograd.Function that runs the fused forward pipeline and, in its backward method, reconstructs and runs the backward pipeline. The forward pass iterates over the fused forward ops, passing quantizer information between adjacent operations. The backward pass traverses the operations in reverse order using the backward-fused op list and distributes gradients to parameters and extra outputs.

register_forward_fusion and register_backward_fusion maintain global registries of fusion pattern matchers. The fuser also handles FP8 state management, quantized-tensor serialization, CUDA graph compatibility, and recipe state updates.
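The forward/backward execution flow can be illustrated with a small, framework-free model. This is a hypothetical sketch, not TransformerEngine code: ToyOp, run_forward, run_backward, and the string-valued input_quantizer are illustrative stand-ins. Each op computes weight * x; the forward loop hands every op the input quantizer of the op that follows it, and the backward loop walks the ops in reverse, accumulating each op's parameter gradient and passing the input gradient upstream.

```python
from dataclasses import dataclass


@dataclass
class ToyOp:
    """Hypothetical scalar op computing weight * x (not a TransformerEngine class)."""
    name: str
    weight: float = 1.0
    input_quantizer: str = "none"  # illustrative stand-in for a quantizer config
    grad_weight: float = 0.0
    saved_input: float = 0.0
    output_cast: str = ""

    def forward(self, x: float, next_input_quantizer: str) -> float:
        # Save the input for backward and record how the next op wants its input
        # cast; a real fused op could emit its output already in that format.
        self.saved_input = x
        self.output_cast = next_input_quantizer
        return self.weight * x

    def backward(self, grad_out: float) -> float:
        # y = w * x  =>  dy/dw = x, dy/dx = w
        self.grad_weight += grad_out * self.saved_input
        return grad_out * self.weight


def run_forward(ops: list[ToyOp], x: float) -> float:
    # Iterate in order, passing each op the input quantizer of the op after it.
    for i, op in enumerate(ops):
        next_q = ops[i + 1].input_quantizer if i + 1 < len(ops) else "none"
        x = op.forward(x, next_q)
    return x


def run_backward(ops: list[ToyOp], grad: float) -> float:
    # Walk the ops in reverse order; parameter grads accumulate on each op.
    for op in reversed(ops):
        grad = op.backward(grad)
    return grad


ops = [ToyOp("a", weight=2.0, input_quantizer="fp8"),
       ToyOp("b", weight=3.0, input_quantizer="fp8")]
y = run_forward(ops, 1.5)        # 3.0 * (2.0 * 1.5) = 9.0
grad_x = run_backward(ops, 1.0)  # dy/dx = 3.0 * 2.0 = 6.0
```

Because each op saves its input during the forward loop, the backward loop can compute parameter gradients without re-running the forward pass. The real fuser additionally manages quantized tensors and FP8 state, which this toy omits.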

Usage

The architectural centerpiece of the ops framework. All fusion optimizations flow through this manager, which automatically discovers and applies kernel fusions without requiring users to manually compose fused operations.
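Automatic discovery amounts to matching fusion patterns against the op sequence. The following is a hypothetical sketch, not TransformerEngine's matcher: ops are plain name strings, make_window_fusion is an illustrative helper, and the fused op names are invented. It shows how one basic op list can yield different forward and backward lists when different fusions apply to each direction.

```python
from typing import Callable


def make_window_fusion(pattern: tuple[str, ...], fused_name: str) -> Callable[[list[str]], list[str]]:
    """Build a fusion function that replaces each non-overlapping run of adjacent
    ops equal to `pattern` (scanned left to right) with a single fused op."""
    def fuse(ops: list[str]) -> list[str]:
        out: list[str] = []
        i, n = 0, len(pattern)
        while i < len(ops):
            if tuple(ops[i:i + n]) == pattern:
                out.append(fused_name)  # the whole window collapses into one op
                i += n
            else:
                out.append(ops[i])      # no match here; keep the op as-is
                i += 1
        return out
    return fuse


basic_ops = ["linear", "bias", "gelu"]

# Hypothetical pattern choices: one fusion for forward, a different one for backward.
forward_fusion = make_window_fusion(("linear", "bias", "gelu"), "linear+bias+gelu")
backward_fusion = make_window_fusion(("bias", "gelu"), "bias+gelu")

forward_ops = forward_fusion(basic_ops)    # ['linear+bias+gelu']
backward_ops = backward_fusion(basic_ops)  # ['linear', 'bias+gelu']
```

The input list is never modified, so the same basic ops can be fused independently for each direction.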

Code Reference

Source Location

Repository: NVIDIA/TransformerEngine
File: transformer_engine/pytorch/ops/fuser.py
Lines: 1-563

Signature

class _OperationFuserAutogradFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, *args): ...
    @staticmethod
    def backward(ctx, grad_output): ...

class OperationFuser:
    def __init__(self, ops: Iterable[FusibleOperation]): ...
    def __call__(self, input, ...): ...

def register_forward_fusion(fusion_func: Callable) -> Callable: ...
def register_backward_fusion(fusion_func: Callable) -> Callable: ...

Import

from transformer_engine.pytorch.ops.fuser import (
    OperationFuser,
    register_forward_fusion,
    register_backward_fusion,
)

I/O Contract

Inputs

Name   Type                        Required  Description
ops    Iterable[FusibleOperation]  Yes       Sequence of operations to fuse (constructor argument)
input  torch.Tensor                Yes       Input tensor for the fused pipeline (argument to __call__)

Outputs

Name    Type          Description
output  torch.Tensor  Result of executing the fused operation pipeline

Usage Examples

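import torch
from transformer_engine.pytorch.ops.fuser import OperationFuser
from transformer_engine.pytorch.ops.basic import BasicLinear, Bias, GELU

# Build the op sequence: linear (1024 -> 4096) -> bias -> GELU
linear = BasicLinear(1024, 4096)
bias = Bias(4096)
activation = GELU()

# The fuser matches registered fusion patterns against this op list
fuser = OperationFuser([linear, bias, activation])

# TransformerEngine kernels run on the GPU, so the input lives on CUDA
input_tensor = torch.randn(32, 1024, device="cuda", requires_grad=True)
output = fuser(input_tensor)   # runs the fused forward op list
output.sum().backward()        # runs the fused backward op list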
