Implementation:NVIDIA TransformerEngine Ops Activation
| Field | Value |
|---|---|
| Sources | TransformerEngine |
| Domains | Deep_Learning, PyTorch, Quantization |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Defines fusible activation function operations including element-wise (GELU, ReLU, SiLU) and gated linear unit variants (GEGLU, SwiGLU, ReGLU, etc.) for the ops pipeline framework.
Description
Provides an abstract base class, _ActivationOperation, that implements op_forward and op_backward with dtype checking, optional FP8 input caching, and CPU offloading. Each concrete subclass (GELU, SwiGLU, etc.) implements _activation_forward_impl and _activation_backward_impl, which delegate to optimized C++/CUDA kernels via transformer_engine_torch. GLU variants split the input tensor in half along the last dimension and multiply the activated half by the other half. The "Q"-prefixed variants (QGELU, QGEGLU) use Quick GELU, the sigmoid-based approximation x * sigmoid(1.702 * x), and the "S"-prefixed variants (SReLU, SReGLU) use squared ReLU, max(x, 0)^2. ClampedSwiGLU is a SwiGLU variant that clamps its pre-activation values.
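The element-wise and gated definitions can be illustrated with a plain-Python sketch. It uses only the standard library and is not the TransformerEngine kernel code. The helper names (sigmoid, silu, quick_gelu, squared_relu, glu) are hypothetical, and applying the activation to the first half of the input is an assumption about the gating convention.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def silu(x):
    """SiLU (swish): x * sigmoid(x)."""
    return x * sigmoid(x)

def quick_gelu(x):
    """Quick GELU approximation: x * sigmoid(1.702 * x)."""
    return x * sigmoid(1.702 * x)

def squared_relu(x):
    """Squared ReLU: max(x, 0) ** 2."""
    return max(x, 0.0) ** 2

def glu(row, act):
    """Gated linear unit over one row: act(first half) * second half.

    Assumption: the activation is applied to the first half of the row.
    """
    if len(row) % 2 != 0:
        raise ValueError("last dimension must be even for GLU variants")
    half = len(row) // 2
    gate, linear = row[:half], row[half:]
    return [act(a) * b for a, b in zip(gate, linear)]

row = [1.0, -2.0, 3.0, 4.0]
swiglu_out = glu(row, silu)          # 2 outputs from 4 inputs
sreglu_out = glu(row, squared_relu)  # [3.0, 0.0]
```

The same glu helper covers every gated variant by swapping the element-wise function, which mirrors how each GLU class pairs one activation with the split-and-multiply step.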
Usage
Use these operations as building blocks in fusible operation pipelines. When composed through the OperationFuser, they are fused automatically with adjacent operations (e.g., bias + activation).
Code Reference
Source Location
- Repository: NVIDIA/TransformerEngine
- File: transformer_engine/pytorch/ops/basic/activation.py
- Lines: 1-430
Signature
class _ActivationOperation(BasicOperation, metaclass=abc.ABCMeta):
    def op_forward(self, ctx, input, ...): ...
    def op_backward(self, ctx, grad_output): ...
class GELU(_ActivationOperation): ...
class GEGLU(_ActivationOperation): ...
class QGELU(_ActivationOperation): ...
class QGEGLU(_ActivationOperation): ...
class ReLU(_ActivationOperation): ...
class ReGLU(_ActivationOperation): ...
class SReLU(_ActivationOperation): ...
class SReGLU(_ActivationOperation): ...
class SiLU(_ActivationOperation): ...
class SwiGLU(_ActivationOperation): ...
class ClampedSwiGLU(_ActivationOperation): ...
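The split between the shared base class and the per-activation hooks follows the template-method pattern. A minimal sketch of that pattern is shown here, assuming plain Python lists instead of tensors. ActivationSketch and ReLUSketch are hypothetical stand-ins, the dict used as ctx is an assumption, and the type check only approximates the dtype checking described above.

```python
import abc

class ActivationSketch(abc.ABC):
    """Hypothetical stand-in for the base class; not TransformerEngine code."""

    def op_forward(self, ctx, values):
        # Shared step: validate the input type before dispatching.
        if not all(isinstance(v, float) for v in values):
            raise TypeError("expected a list of floats")
        # Shared step: save the input so the backward pass can reuse it.
        ctx["saved_input"] = list(values)
        return self._activation_forward_impl(values)

    def op_backward(self, ctx, grad_output):
        # Shared step: restore the saved input, then dispatch to the subclass.
        return self._activation_backward_impl(ctx["saved_input"], grad_output)

    @abc.abstractmethod
    def _activation_forward_impl(self, values):
        """Compute the activation for each element."""

    @abc.abstractmethod
    def _activation_backward_impl(self, values, grad_output):
        """Compute the gradient with respect to the input."""

class ReLUSketch(ActivationSketch):
    def _activation_forward_impl(self, values):
        return [max(v, 0.0) for v in values]

    def _activation_backward_impl(self, values, grad_output):
        # d/dx relu(x) is 1 for x > 0 and 0 otherwise.
        return [g if v > 0.0 else 0.0 for v, g in zip(values, grad_output)]

ctx = {}
relu = ReLUSketch()
out = relu.op_forward(ctx, [-1.0, 2.0])        # [0.0, 2.0]
grad_in = relu.op_backward(ctx, [10.0, 10.0])  # [0.0, 10.0]
```

With this layout, a new activation only needs the two hook methods, while validation and input saving stay in one place.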
Import
from transformer_engine.pytorch.ops.basic import (
    GELU, GEGLU, SwiGLU, SiLU, ReLU,
    ReGLU, SReLU, ClampedSwiGLU,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input | torch.Tensor | Yes | Input tensor; for GLU variants, the last dim is split in half for gating |
Outputs
| Name | Type | Description |
|---|---|---|
| output | torch.Tensor | Activated tensor; for GLU variants, output dim is half the input dim |
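The shape side of this contract can be written as a small helper. This is a sketch only: activation_output_shape is a hypothetical function, not part of TransformerEngine, and rejecting an odd last dimension is an assumption about how gated variants treat invalid input.

```python
def activation_output_shape(input_shape, gated):
    """Return the output shape implied by the I/O contract above.

    Element-wise activations keep the shape; gated (GLU) variants halve
    the last dimension.
    """
    shape = tuple(input_shape)
    if not gated:
        return shape
    if not shape or shape[-1] % 2 != 0:
        # Assumption: a gated variant needs an even, non-empty last dimension.
        raise ValueError("gated activations need an even last dimension")
    return shape[:-1] + (shape[-1] // 2,)

elementwise_shape = activation_output_shape((8, 128, 4096), gated=False)  # (8, 128, 4096)
gated_shape = activation_output_shape((8, 128, 8192), gated=True)         # (8, 128, 4096)
```

In an MLP, this means the layer feeding a GLU variant has to produce twice the width that the following layer consumes.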
Usage Examples
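import torch
from transformer_engine.pytorch.ops import Linear, Sequential
from transformer_engine.pytorch.ops.basic import SwiGLU, GELU
# Create fusible activation ops
swiglu = SwiGLU()
gelu = GELU()
# Illustrative layer sizes (not from the source); requires a CUDA device.
# SwiGLU halves the last dim, so linear1 outputs twice what linear2 consumes.
linear1 = Linear(1024, 2 * 4096)
linear2 = Linear(4096, 1024)
# Use in an operation pipeline (fuses with adjacent ops)
mlp = Sequential(linear1, swiglu, linear2)
input_tensor = torch.randn(8, 1024, device="cuda")
output = mlp(input_tensor)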