Implementation:NVIDIA TransformerEngine Ops Activation

Field	Value
Sources	TransformerEngine
Domains	Deep_Learning, PyTorch, Quantization
Last Updated	2026-02-07 14:00 GMT

Overview

Defines fusible activation function operations including element-wise (GELU, ReLU, SiLU) and gated linear unit variants (GEGLU, SwiGLU, ReGLU, etc.) for the ops pipeline framework.

Description

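Provides an abstract base class, _ActivationOperation, that implements op_forward and op_backward with dtype checking, optional caching of the input in quantized (FP8) form for the backward pass, and CPU-offloading support. Each concrete subclass (GELU, SwiGLU, etc.) implements _activation_forward_impl and _activation_backward_impl, which delegate to optimized C++/CUDA kernels via transformer_engine_torch. GLU variants split the input tensor in half along the last dimension, apply the activation to one half, and multiply the result element-wise by the other half. The "Q"-prefixed variants (QGELU, QGEGLU) use the "quick" GELU approximation x * sigmoid(1.702 * x), and the "S"-prefixed variants (SReLU, SReGLU) use squared ReLU, max(x, 0)^2. ClampedSwiGLU is a SwiGLU variant that adds clamping; in the upstream implementation the clamp is applied to the gate and linear pre-activations, controlled by a limit parameter, rather than to the output.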
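The math behind these variants can be sketched without TransformerEngine. The following is a dependency-free illustration on plain Python floats and lists, not the library's kernels. It assumes the tanh approximation for GELU and assumes that the activation is applied to the first half of the split with the second half acting as the linear multiplier; both conventions should be checked against the installed version. All function names here are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gelu(x: float) -> float:
    # Assumption: tanh approximation of GELU.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def qgelu(x: float) -> float:
    # "Quick" GELU: a sigmoid-based approximation.
    return x * sigmoid(1.702 * x)


def relu(x: float) -> float:
    return max(x, 0.0)


def srelu(x: float) -> float:
    # Squared ReLU.
    return relu(x) ** 2


def silu(x: float) -> float:
    return x * sigmoid(x)


def glu(act, row: list[float]) -> list[float]:
    """Gate one row: split the last dimension in half and combine the halves.

    Assumption: `act` is applied to the first half and the second half is the
    linear multiplier.
    """
    if len(row) % 2 != 0:
        raise ValueError("last dimension must be even for a GLU variant")
    half = len(row) // 2
    a, b = row[:half], row[half:]
    return [act(ai) * bi for ai, bi in zip(a, b)]


def geglu(row): return glu(gelu, row)
def qgeglu(row): return glu(qgelu, row)
def reglu(row): return glu(relu, row)
def sreglu(row): return glu(srelu, row)
def swiglu(row): return glu(silu, row)
```

For example, `reglu([1.0, -2.0, 3.0, 4.0])` gates `[1.0, -2.0]` with `[3.0, 4.0]` and returns a row of half the input width.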

Usage

Use as building blocks in fusible operation pipelines. When composed through the OperationFuser (for example, inside Sequential), these operations can be fused with adjacent operations (e.g., bias + activation) where a fused kernel is available.

Code Reference

Source Location

Repository: NVIDIA/TransformerEngine
File: transformer_engine/pytorch/ops/basic/activation.py
Lines: 1-430

Signature

class _ActivationOperation(BasicOperation, metaclass=abc.ABCMeta):
    def op_forward(self, ctx, input, ...): ...
    def op_backward(self, ctx, grad_output): ...

class GELU(_ActivationOperation): ...
class GEGLU(_ActivationOperation): ...
class QGELU(_ActivationOperation): ...
class QGEGLU(_ActivationOperation): ...
class ReLU(_ActivationOperation): ...
class ReGLU(_ActivationOperation): ...
class SReLU(_ActivationOperation): ...
class SReGLU(_ActivationOperation): ...
class SiLU(_ActivationOperation): ...
class SwiGLU(_ActivationOperation): ...
class ClampedSwiGLU(_ActivationOperation): ...
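
The class layout above resembles a template-method pattern: the base class owns the shared forward/backward bookkeeping, and each subclass supplies only the math. A minimal toy sketch of that pattern follows. It operates on lists of floats, and every name in it (ToyActivation, ToyReLU, the dict-based ctx) is hypothetical rather than part of TransformerEngine.

```python
import abc


class ToyActivation(abc.ABC):
    """Hypothetical stand-in for the abstract base; works on lists of floats."""

    def forward(self, ctx: dict, xs: list) -> list:
        # Shared logic: validate the input type, then save it for backward.
        if not all(isinstance(x, float) for x in xs):
            raise TypeError("expected a list of floats")
        ctx["saved_input"] = list(xs)
        return self._forward_impl(xs)

    def backward(self, ctx: dict, grad_out: list) -> list:
        # Shared logic: restore the saved input and delegate the derivative.
        return self._backward_impl(grad_out, ctx["saved_input"])

    @abc.abstractmethod
    def _forward_impl(self, xs: list) -> list: ...

    @abc.abstractmethod
    def _backward_impl(self, grad_out: list, xs: list) -> list: ...


class ToyReLU(ToyActivation):
    def _forward_impl(self, xs: list) -> list:
        return [max(x, 0.0) for x in xs]

    def _backward_impl(self, grad_out: list, xs: list) -> list:
        # The gradient passes through only where the input was positive.
        return [g if x > 0.0 else 0.0 for g, x in zip(grad_out, xs)]
```

Adding a new activation in this toy setup means writing one subclass with two methods; the validation and input saving stay in one place.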

Import

from transformer_engine.pytorch.ops.basic import (
    GELU, GEGLU, SwiGLU, SiLU, ReLU,
    ReGLU, SReLU, ClampedSwiGLU,
)

I/O Contract

Inputs

Name	Type	Required	Description
input	`torch.Tensor`	Yes	Input tensor; for GLU variants, the last dim is split in half for gating

Outputs

Name	Type	Description
output	`torch.Tensor`	Activated tensor; for GLU variants, output dim is half the input dim
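
The shape contract in the tables above can be captured in a few lines. This is a sketch using a hypothetical helper, activation_output_shape, which is not part of TransformerEngine; rejecting an odd last dimension for GLU variants is an assumption made here for illustration.

```python
def activation_output_shape(input_shape: tuple, gated: bool) -> tuple:
    """Return the output shape for an element-wise or GLU-style activation."""
    if len(input_shape) == 0:
        raise ValueError("input must have at least one dimension")
    if not gated:
        # Element-wise activations (GELU, ReLU, SiLU, ...) preserve the shape.
        return tuple(input_shape)
    *leading, last = input_shape
    if last % 2 != 0:
        # Assumption: a GLU variant needs an even last dim to split in half.
        raise ValueError("last dimension must be even for GLU variants")
    return (*leading, last // 2)
```

With this helper, `activation_output_shape((32, 8192), gated=True)` yields `(32, 4096)`, while the same call with `gated=False` returns the shape unchanged.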

Usage Examples

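import torch
from transformer_engine.pytorch.ops import Linear, Sequential
from transformer_engine.pytorch.ops.basic import SwiGLU, GELU

# Create fusible activation ops
swiglu = SwiGLU()
gelu = GELU()

# Illustrative sizes: SwiGLU halves the last dimension, so linear1 emits
# 2 * 4096 features and linear2 consumes 4096.
linear1 = Linear(1024, 2 * 4096, device="cuda")
linear2 = Linear(4096, 1024, device="cuda")

# Use in an operation pipeline (adjacent ops may be fused)
mlp = Sequential(linear1, swiglu, linear2)
input_tensor = torch.randn(32, 1024, device="cuda")
output = mlp(input_tensor)  # shape: (32, 1024)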
