Implementation: vLLM CPU Activation (vllm project)
| Knowledge Sources | Details |
|---|---|
| Domains | Activation, CPU_Inference |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Implements vectorized CPU activation functions (SiLU, GELU variants) with OpenMP parallelization for efficient LLM inference on CPU backends.
Description
This file provides a templated activation_kernel that applies activation functions element-wise using FP32Vec8 SIMD vectorization. It supports both gated (e.g., SiLU-and-mul, GELU-and-mul) and non-gated (e.g., GELU-new, GELU-fast, GELU-quick) activation modes. Each activation variant is implemented as a standalone inline function operating on 8-wide FP32 vectors, dispatched through PyTorch's type dispatch macro.
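The gated pattern described above can be sketched as a plain scalar reference. This is a hypothetical simplification for illustration only: the actual kernel operates on `torch::Tensor` data via FP32Vec8 SIMD lanes with OpenMP parallelization over tokens.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar SiLU: x * sigmoid(x).
static float silu(float x) { return x / (1.0f + std::exp(-x)); }

// Gated activation sketch: input has shape [num_tokens, 2*d]; for each token
// the first d elements are activated and multiplied element-wise by the
// second d elements, producing an output of shape [num_tokens, d].
std::vector<float> silu_and_mul_ref(const std::vector<float>& input,
                                    std::size_t num_tokens, std::size_t d) {
  assert(input.size() == num_tokens * 2 * d);
  std::vector<float> out(num_tokens * d);
  for (std::size_t t = 0; t < num_tokens; ++t) {
    const float* gate = input.data() + t * 2 * d;  // first half: activated
    const float* up = gate + d;                    // second half: multiplier
    for (std::size_t i = 0; i < d; ++i) {
      out[t * d + i] = silu(gate[i]) * up[i];
    }
  }
  return out;
}
```

The non-gated variants follow the same loop structure but read `d` elements per token and apply the activation without the multiply.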
Usage
These functions are compiled into the vLLM CPU extension library and called from the Python layer via PyTorch custom ops when running inference on CPU. They serve as the CPU backend implementations for activation layers in transformer models.
Code Reference
Source Location
- Repository: vllm
- File: csrc/cpu/activation.cpp
- Lines: 1-163
Signature
```cpp
// Gated activation functions (input [..., 2*d] -> output [..., d])
void silu_and_mul(torch::Tensor& out, torch::Tensor& input);
void gelu_and_mul(torch::Tensor& out, torch::Tensor& input);
void gelu_tanh_and_mul(torch::Tensor& out, torch::Tensor& input);

// Non-gated activation functions (input [..., d] -> output [..., d])
void gelu_new(torch::Tensor& out, torch::Tensor& input);
void gelu_fast(torch::Tensor& out, torch::Tensor& input);
void gelu_quick(torch::Tensor& out, torch::Tensor& input);
```
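For reference, the non-gated variants correspond to the standard GELU approximations used across transformer implementations. Below is a scalar sketch of that math; the exact constants and vectorized form in the kernel may differ slightly.

```cpp
#include <cmath>

// gelu_new: tanh approximation of GELU,
// 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
float gelu_new_ref(float x) {
  const float c = 0.79788456f;  // sqrt(2/pi)
  return 0.5f * x * (1.0f + std::tanh(c * (x + 0.044715f * x * x * x)));
}

// gelu_fast: algebraically equivalent rearrangement of the tanh form.
float gelu_fast_ref(float x) {
  return 0.5f * x *
         (1.0f + std::tanh(0.79788456f * x * (1.0f + 0.044715f * x * x)));
}

// gelu_quick: sigmoid approximation, x * sigmoid(1.702 * x).
float gelu_quick_ref(float x) {
  return x / (1.0f + std::exp(-1.702f * x));
}
```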
Import
```cpp
#include "cpu_types.hpp"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input | torch::Tensor | Yes | Input tensor; shape [..., 2*d] for gated activations or [..., d] for non-gated activations |
| out | torch::Tensor | Yes | Pre-allocated output tensor; shape [..., d] |
Outputs
| Name | Type | Description |
|---|---|---|
| out | torch::Tensor | Activation result written in-place to the provided output tensor [..., d] |
Usage Examples
```cpp
// SiLU-and-mul gated activation for MLP layers
torch::Tensor input = torch::randn({num_tokens, 2 * hidden_dim});
torch::Tensor output = torch::empty({num_tokens, hidden_dim});
silu_and_mul(output, input);

// Non-gated GELU activation
torch::Tensor x = torch::randn({num_tokens, hidden_dim});
torch::Tensor y = torch::empty({num_tokens, hidden_dim});
gelu_new(y, x);
```