Implementation:NVIDIA TransformerEngine Ops Fuser
| Field | Value |
|---|---|
| Sources | TransformerEngine |
| Domains | Deep_Learning, PyTorch, Optimization |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Core operation fusion manager that orchestrates a pipeline of fusible operations with automatic forward/backward fusion discovery and application.
Description
OperationFuser takes a sequence of BasicOperations and applies the registered fusion functions to produce optimized forward and backward operation lists. _OperationFuserAutogradFunction is a custom torch.autograd.Function that runs the fused forward pipeline and reconstructs the backward pipeline. The forward pass iterates over the fused forward ops, passing quantizer information between adjacent operations. The backward pass traverses the operations in reverse order, applies the backward fusions, and distributes gradients to parameters and extra outputs. register_forward_fusion and register_backward_fusion maintain global registries of fusion pattern matchers. The fuser also handles FP8 state management, quantized tensor serialization, CUDA graph compatibility, and recipe state updates.
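The registry-and-matcher idea described above can be illustrated with a small, library-free sketch. This is not TransformerEngine's implementation: ops are modeled as plain strings, and the names `_FORWARD_FUSIONS`, `register_fusion`, `fuse_linear_bias`, and `apply_fusions` are hypothetical stand-ins chosen for illustration only.

```python
from typing import Callable, List

# Hypothetical types and registry, for illustration only. Ops are plain
# strings here; the real fuser works on operation objects.
FusionFunc = Callable[[List[str]], List[str]]
_FORWARD_FUSIONS: List[FusionFunc] = []


def register_fusion(func: FusionFunc) -> FusionFunc:
    """Append a pattern matcher to the registry and return it unchanged,
    so the function can be used as a decorator."""
    _FORWARD_FUSIONS.append(func)
    return func


@register_fusion
def fuse_linear_bias(ops: List[str]) -> List[str]:
    """Replace each adjacent ("linear", "bias") pair with a single fused op."""
    fused: List[str] = []
    i = 0
    while i < len(ops):
        if ops[i] == "linear" and i + 1 < len(ops) and ops[i + 1] == "bias":
            fused.append("linear+bias")
            i += 2  # skip both ops that were merged
        else:
            fused.append(ops[i])
            i += 1
    return fused


def apply_fusions(ops: List[str]) -> List[str]:
    """Run every registered matcher over the op list, in registration order."""
    for fusion in _FORWARD_FUSIONS:
        ops = fusion(ops)
    return ops


fused = apply_fusions(["linear", "bias", "gelu"])
print(fused)  # ['linear+bias', 'gelu']
```

Each matcher receives the whole op list and returns a possibly shorter one, so a pipeline that matches no pattern passes through unchanged.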
Usage
OperationFuser is the architectural centerpiece of the ops framework. All fusion optimizations flow through it: it automatically discovers and applies kernel fusions, so users do not need to compose fused operations manually.
Code Reference
Source Location
- Repository: NVIDIA/TransformerEngine
- File: transformer_engine/pytorch/ops/fuser.py
- Lines: 1-563
Signature
class _OperationFuserAutogradFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, *args): ...

    @staticmethod
    def backward(ctx, grad_output): ...

class OperationFuser:
    def __init__(self, ops: Iterable[FusibleOperation]): ...
    def __call__(self, input, ...): ...

def register_forward_fusion(fusion_func: Callable) -> Callable: ...
def register_backward_fusion(fusion_func: Callable) -> Callable: ...
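To make the forward/backward pairing in the signature more concrete, the following is a minimal sketch of a pipeline that saves one context per op during the forward pass and walks the ops in reverse during the backward pass. It uses plain Python floats rather than tensors and is not the actual autograd function; `Scale`, `Square`, `run_forward`, and `run_backward` are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, List, Tuple


class Scale:
    """Toy op: y = factor * x."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def forward(self, ctx: Dict[str, float], x: float) -> float:
        return self.factor * x

    def backward(self, ctx: Dict[str, float], grad_out: float) -> float:
        return self.factor * grad_out


class Square:
    """Toy op: y = x * x. Saves its input for the backward pass."""

    def forward(self, ctx: Dict[str, float], x: float) -> float:
        ctx["x"] = x
        return x * x

    def backward(self, ctx: Dict[str, float], grad_out: float) -> float:
        return 2.0 * ctx["x"] * grad_out


def run_forward(ops: List[Any], x: float) -> Tuple[float, List[Dict[str, float]]]:
    """Run ops in order, giving each op its own context for saved state."""
    contexts: List[Dict[str, float]] = []
    for op in ops:
        ctx: Dict[str, float] = {}
        x = op.forward(ctx, x)
        contexts.append(ctx)
    return x, contexts


def run_backward(ops: List[Any], contexts: List[Dict[str, float]], grad_out: float) -> float:
    """Propagate the gradient through the ops in reverse order."""
    for op, ctx in zip(reversed(ops), reversed(contexts)):
        grad_out = op.backward(ctx, grad_out)
    return grad_out


ops = [Scale(3.0), Square()]
y, ctxs = run_forward(ops, 2.0)    # (3 * 2)^2 = 36.0
dx = run_backward(ops, ctxs, 1.0)  # d/dx (3x)^2 = 18x = 36.0 at x = 2
print(y, dx)
```

Keeping a separate context per op is what lets the backward loop hand each op exactly the state it saved, regardless of how many ops precede or follow it.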
Import
from transformer_engine.pytorch.ops.fuser import (
    OperationFuser,
    register_forward_fusion,
    register_backward_fusion,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ops | Iterable[FusibleOperation] | Yes | Sequence of operations to fuse |
| input | torch.Tensor | Yes | Input tensor for the fused pipeline |
Outputs
| Name | Type | Description |
|---|---|---|
| output | torch.Tensor | Result of executing the fused operation pipeline |
Usage Examples
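import torch

from transformer_engine.pytorch.ops.fuser import OperationFuser
from transformer_engine.pytorch.ops.basic import BasicLinear, Bias, GELU

# Create an operation fuser with a sequence of ops
linear = BasicLinear(1024, 4096)
bias = Bias(4096)
activation = GELU()
fuser = OperationFuser([linear, bias, activation])

# Example input; the batch size of 32 and the CUDA device are assumptions
input_tensor = torch.randn(32, 1024, device="cuda")

# Run the fused pipeline: linear -> bias -> GELU
output = fuser(input_tensor)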