Implementation: Sgl_project_Sglang CPU Activation
| Knowledge Sources | Details |
|---|---|
| Domains | CPU Inference, Activation Functions |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
CPU-optimized fused activation-and-multiply functions (SiLU, GELU-tanh, GELU) using SIMD vectorization for LLM inference.
Description
activation.cpp implements three fused gated activation functions that are fundamental building blocks of modern LLM architectures (for example, LLaMA-style gated FFN layers). The core is a templated act_and_mul_kernel_impl function that splits an input tensor of shape [num_tokens, 2*d] into two halves along the last dimension, applies an activation function to the first half, multiplies it element-wise with the second half, and writes a result of shape [num_tokens, d].
The implementation uses ATen vectorized operations (at::vec::Vectorized) for SIMD acceleration on bfloat16/float16 data with float32 intermediate computation. It parallelizes across tokens via at::parallel_for and uses #pragma GCC unroll 4 for loop unrolling. A scalar fallback handles the tail elements that do not fill a complete SIMD vector.
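The split-apply-multiply pattern described above can be sketched in plain C++. This is a minimal scalar reference, not the actual kernel: it omits the ATen vectorization, the at::parallel_for partitioning, and the bfloat16/float16 handling, and the name silu_and_mul_ref is illustrative rather than taken from the source.

```cpp
#include <cmath>
#include <cstdint>

// SiLU: x * sigmoid(x)
inline float silu(float x) { return x / (1.0f + std::exp(-x)); }

// Scalar reference for the fused pattern: input is laid out as
// [num_tokens, 2*d]; the first d columns pass through the activation,
// the second d columns are the gate that is multiplied in element-wise.
// The real kernel additionally vectorizes the inner loop with
// at::vec::Vectorized and keeps a scalar tail for leftover elements.
void silu_and_mul_ref(float* out, const float* in,
                      int64_t num_tokens, int64_t d) {
  for (int64_t t = 0; t < num_tokens; ++t) {
    const float* x    = in + t * 2 * d;      // activation half
    const float* gate = in + t * 2 * d + d;  // multiplicative half
    float* y = out + t * d;
    #pragma GCC unroll 4
    for (int64_t i = 0; i < d; ++i) {
      y[i] = silu(x[i]) * gate[i];
    }
  }
}
```

The gain from fusion is that each output element is produced in one pass over the input, instead of materializing an intermediate activation tensor and then multiplying.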
Three public functions are exposed:
- silu_and_mul_cpu: Implements SiLU (x * sigmoid(x)) gated multiplication
- gelu_tanh_and_mul_cpu: Implements GELU with tanh approximation gated multiplication
- gelu_and_mul_cpu: Implements standard GELU (using erf) gated multiplication
All three are dispatched for reduced floating-point types via AT_DISPATCH_REDUCED_FLOATING_TYPES.
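For reference, the three activations reduce to the following standard scalar formulas. This is a sketch of the math only; the actual kernel evaluates vectorized equivalents with float32 intermediates.

```cpp
#include <cmath>

// SiLU: x * sigmoid(x)
inline float silu(float x) { return x / (1.0f + std::exp(-x)); }

// GELU, tanh approximation:
// 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
inline float gelu_tanh(float x) {
  const float k = 0.7978845608f;  // sqrt(2/pi)
  return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
}

// Exact GELU via the error function: 0.5 * x * (1 + erf(x / sqrt(2)))
inline float gelu_erf(float x) {
  return 0.5f * x * (1.0f + std::erf(x * 0.7071067812f));  // 1/sqrt(2)
}
```

The tanh variant is a close approximation of the erf variant; architectures differ in which one they specify, which is why both are exposed.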
Usage
Use these functions as drop-in replacements for GPU activation kernels when running LLM inference on CPU. They fuse the activation and the element-wise multiplication into a single vectorized kernel, avoiding the intermediate tensor and dispatch overhead of separate PyTorch operations.
Code Reference
Source Location
- Repository: Sgl_project_Sglang
- File: sgl-kernel/csrc/cpu/activation.cpp
- Lines: 1-135
Signature
// Internal template (anonymous namespace)
template <typename scalar_t, typename func_t, typename vec_func_t>
void act_and_mul_kernel_impl(
    scalar_t* __restrict__ output,
    const scalar_t* __restrict__ input,
    int64_t num_tokens,
    int64_t dim,
    const func_t& f,
    const vec_func_t& vf);
// Public API
at::Tensor silu_and_mul_cpu(at::Tensor& input);
at::Tensor gelu_tanh_and_mul_cpu(const at::Tensor& input);
at::Tensor gelu_and_mul_cpu(const at::Tensor& input);
Import
#include "common.h"
#include "vec.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input | at::Tensor | Yes | Input tensor of shape [num_tokens, 2*d] with bfloat16 or float16 dtype |
Outputs
| Name | Type | Description |
|---|---|---|
| output | at::Tensor | Result tensor of shape [num_tokens, d] with same dtype as input |
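The shape relationship in the tables above can be made concrete with a small helper. This is illustrative only; act_and_mul_out_shape is not part of the source API, and the real wrappers derive the output shape directly from the input tensor's sizes.

```cpp
#include <cstdint>
#include <stdexcept>
#include <utility>

// Illustrative contract check: the last dimension of the input must be
// even, since it packs [activation half | gate half]. Returns the
// {num_tokens, d} shape of the output tensor.
std::pair<int64_t, int64_t> act_and_mul_out_shape(int64_t num_tokens,
                                                  int64_t packed_dim) {
  if (packed_dim % 2 != 0) {
    throw std::invalid_argument("input last dim must be 2*d");
  }
  return {num_tokens, packed_dim / 2};
}
```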
Usage Examples
// Called from PyTorch C++ extension:
// SiLU gated activation
at::Tensor input = /* shape [batch, 2 * hidden_dim] */;
at::Tensor output = silu_and_mul_cpu(input);
// output shape: [batch, hidden_dim]
// GELU-tanh gated activation
at::Tensor output_gelu = gelu_tanh_and_mul_cpu(input);
// Standard GELU gated activation
at::Tensor output_gelu_std = gelu_and_mul_cpu(input);