Implementation:NVIDIA TransformerEngine Ops Activation

From Leeroopedia


| Field | Value |
|---|---|
| Sources | TransformerEngine |
| Domains | Deep_Learning, PyTorch, Quantization |
| Last Updated | 2026-02-07 14:00 GMT |

Overview

Defines fusible activation function operations including element-wise (GELU, ReLU, SiLU) and gated linear unit variants (GEGLU, SwiGLU, ReGLU, etc.) for the ops pipeline framework.

Description

Provides an abstract base class, _ActivationOperation, that implements op_forward and op_backward with dtype checking, optional FP8 input caching, and CPU offloading. Each concrete subclass (GELU, SwiGLU, etc.) implements _activation_forward_impl and _activation_backward_impl, which delegate to optimized C++/CUDA kernels via transformer_engine_torch. GLU variants split the input tensor in half along the last dimension and use one half to gate the other. The "Q"-prefixed variants (QGELU, QGEGLU) are the "quick" GELU approximation, x * sigmoid(1.702 * x), and its gated form; the "S"-prefixed variants (SReLU, SReGLU) are squared ReLU, max(x, 0)^2, and its gated form. Neither prefix denotes quantization or smoothing. ClampedSwiGLU adds clamping to the SwiGLU variant.
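As a reading aid, the element-wise functions named above can be written as scalar reference formulas. This is a minimal pure-Python sketch, not the TransformerEngine kernels: the function names are hypothetical, and it assumes GELU refers to the tanh approximation, QGELU to quick GELU, and SReLU to squared ReLU.

```python
import math


def gelu_tanh(x: float) -> float:
    """GELU using the tanh approximation (assumed here for illustration)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def quick_gelu(x: float) -> float:
    """Quick GELU: x * sigmoid(1.702 * x)."""
    return x / (1.0 + math.exp(-1.702 * x))


def relu(x: float) -> float:
    """ReLU: max(x, 0)."""
    return max(x, 0.0)


def squared_relu(x: float) -> float:
    """Squared ReLU: max(x, 0) ** 2."""
    return max(x, 0.0) ** 2


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))
```

These formulas are only meant to pin down what each name computes; the real operations run fused CUDA kernels on whole tensors and also provide the matching backward pass.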

Usage

Use these operations as building blocks in fusible operation pipelines. When composed through the OperationFuser, they can fuse with adjacent operations (e.g., bias + activation).

Code Reference

Source Location

Repository: NVIDIA/TransformerEngine
File: transformer_engine/pytorch/ops/basic/activation.py
Lines: 1-430

Signature

class _ActivationOperation(BasicOperation, metaclass=abc.ABCMeta):
    def op_forward(self, ctx, input, ...): ...
    def op_backward(self, ctx, grad_output): ...

class GELU(_ActivationOperation): ...
class GEGLU(_ActivationOperation): ...
class QGELU(_ActivationOperation): ...
class QGEGLU(_ActivationOperation): ...
class ReLU(_ActivationOperation): ...
class ReGLU(_ActivationOperation): ...
class SReLU(_ActivationOperation): ...
class SReGLU(_ActivationOperation): ...
class SiLU(_ActivationOperation): ...
class SwiGLU(_ActivationOperation): ...
class ClampedSwiGLU(_ActivationOperation): ...
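The class layout above follows a template-method pattern: the base class owns the shared forward logic and each subclass supplies only the activation-specific implementation. The following is a minimal sketch of that pattern in plain Python. ActivationSketch, ReLUSketch, forward, and _forward_impl are hypothetical names for illustration and do not reproduce the TransformerEngine signatures.

```python
import abc


class ActivationSketch(abc.ABC):
    """Hypothetical stand-in for an abstract activation base class."""

    def forward(self, values):
        # Shared pre-check, analogous in spirit to the dtype check
        # described for the real base class.
        if not all(isinstance(v, float) for v in values):
            raise TypeError("expected a sequence of floats")
        # Delegate the activation-specific math to the subclass.
        return self._forward_impl(values)

    @abc.abstractmethod
    def _forward_impl(self, values):
        """Subclasses implement the actual activation here."""


class ReLUSketch(ActivationSketch):
    """Concrete subclass: only the activation math lives here."""

    def _forward_impl(self, values):
        return [max(v, 0.0) for v in values]
```

With this split, adding a new activation means writing one small subclass, while checks and bookkeeping stay in a single place.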

Import

from transformer_engine.pytorch.ops.basic import (
    GELU, GEGLU, SwiGLU, SiLU, ReLU,
    ReGLU, SReLU, ClampedSwiGLU,
)

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| input | torch.Tensor | Yes | Input tensor; for GLU variants, the last dim is split in half for gating |

Outputs

| Name | Type | Description |
|---|---|---|
| output | torch.Tensor | Activated tensor; for GLU variants, output dim is half the input dim |
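The halving of the last dimension for GLU variants can be illustrated on a single row of values. This is a pure-Python sketch with a hypothetical helper, swiglu_row; it assumes the activation is applied to the first half and the second half is the linear multiplier, so check the source file for the convention actually used.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def swiglu_row(row):
    """Apply SwiGLU to one row: split in half, then SiLU(a) * b."""
    if len(row) % 2 != 0:
        raise ValueError("last dimension must be even for a GLU variant")
    half = len(row) // 2
    a, b = row[:half], row[half:]  # assumed convention: activate the first half
    return [silu(ai) * bi for ai, bi in zip(a, b)]


# A row of 4 input features yields 2 output features.
out = swiglu_row([0.0, 1.0, 2.0, 3.0])
```

In a feed-forward block this means the preceding linear layer has to emit twice as many features as the following layer consumes.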

Usage Examples

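import torch
from transformer_engine.pytorch.ops import Sequential, Linear
from transformer_engine.pytorch.ops.basic import SwiGLU, GELU

# Create fusible activation ops
swiglu = SwiGLU()
gelu = GELU()

# Illustrative layer sizes (assumed, not from the source). SwiGLU halves the
# last dimension, so linear1 emits twice the features that linear2 consumes.
linear1 = Linear(1024, 2 * 4096)
linear2 = Linear(4096, 1024)

# Use in an operation pipeline (fuses with adjacent ops)
mlp = Sequential(linear1, swiglu, linear2)

# Requires a CUDA device; the input shape is illustrative
input_tensor = torch.randn(32, 1024, device="cuda")
output = mlp(input_tensor)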
