Implementation: vLLM CPU Activation (vllm project)
| Knowledge Sources | Details |
|---|---|
| Domains | Activation, CPU_Inference |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Implements vectorized CPU activation functions (SiLU, GELU variants) with OpenMP parallelization for efficient LLM inference on CPU backends.
Description
This file provides a templated activation_kernel that applies activation functions element-wise using FP32Vec8 SIMD vectorization. It supports both gated (e.g., SiLU-and-mul, GELU-and-mul) and non-gated (e.g., GELU-new, GELU-fast, GELU-quick) activation modes. Each activation variant is implemented as a standalone inline function operating on 8-wide FP32 vectors, dispatched through PyTorch's type dispatch macro.
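The gated pattern described above can be sketched as a plain scalar reference. This is a hypothetical simplification for illustration only: the actual kernel operates on `torch::Tensor` data via FP32Vec8 SIMD lanes with OpenMP parallelization over tokens.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar SiLU: x * sigmoid(x).
static float silu(float x) { return x / (1.0f + std::exp(-x)); }

// Gated activation sketch: input has shape [num_tokens, 2*d]; for each token
// the first d elements are activated and multiplied element-wise by the
// second d elements, producing an output of shape [num_tokens, d].
std::vector<float> silu_and_mul_ref(const std::vector<float>& input,
                                    std::size_t num_tokens, std::size_t d) {
  assert(input.size() == num_tokens * 2 * d);
  std::vector<float> out(num_tokens * d);
  for (std::size_t t = 0; t < num_tokens; ++t) {
    const float* gate = input.data() + t * 2 * d;  // first half: activated
    const float* up = gate + d;                    // second half: multiplier
    for (std::size_t i = 0; i < d; ++i) {
      out[t * d + i] = silu(gate[i]) * up[i];
    }
  }
  return out;
}
```

The non-gated variants follow the same loop structure but read `d` elements per token and apply the activation without the multiply.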
Usage
These functions are compiled into the vLLM CPU extension library and called from the Python layer via PyTorch custom ops when running inference on CPU. They serve as the CPU backend implementations for activation layers in transformer models.
Code Reference
Source Location
- Repository: vllm
- File: csrc/cpu/activation.cpp
- Lines: 1-163
Signature
```cpp
// Gated activation functions (input [..., 2*d] -> output [..., d])
void silu_and_mul(torch::Tensor& out, torch::Tensor& input);
void gelu_and_mul(torch::Tensor& out, torch::Tensor& input);
void gelu_tanh_and_mul(torch::Tensor& out, torch::Tensor& input);

// Non-gated activation functions (input [..., d] -> output [..., d])
void gelu_new(torch::Tensor& out, torch::Tensor& input);
void gelu_fast(torch::Tensor& out, torch::Tensor& input);
void gelu_quick(torch::Tensor& out, torch::Tensor& input);
```
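For reference, the non-gated variants correspond to the standard GELU approximations used across transformer implementations. Below is a scalar sketch of that math; the exact constants and vectorized form in the kernel may differ slightly.

```cpp
#include <cmath>

// gelu_new: tanh approximation of GELU,
// 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
float gelu_new_ref(float x) {
  const float c = 0.79788456f;  // sqrt(2/pi)
  return 0.5f * x * (1.0f + std::tanh(c * (x + 0.044715f * x * x * x)));
}

// gelu_fast: algebraically equivalent rearrangement of the tanh form.
float gelu_fast_ref(float x) {
  return 0.5f * x *
         (1.0f + std::tanh(0.79788456f * x * (1.0f + 0.044715f * x * x)));
}

// gelu_quick: sigmoid approximation, x * sigmoid(1.702 * x).
float gelu_quick_ref(float x) {
  return x / (1.0f + std::exp(-1.702f * x));
}
```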
Import
```cpp
#include "cpu_types.hpp"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input | torch::Tensor | Yes | Input tensor; shape [..., 2*d] for gated activations or [..., d] for non-gated activations |
| out | torch::Tensor | Yes | Pre-allocated output tensor; shape [..., d] |
Outputs
| Name | Type | Description |
|---|---|---|
| out | torch::Tensor | Activation result written in-place to the provided output tensor [..., d] |
Usage Examples
```cpp
// SiLU-and-mul gated activation for MLP layers
torch::Tensor input = torch::randn({num_tokens, 2 * hidden_dim});
torch::Tensor output = torch::empty({num_tokens, hidden_dim});
silu_and_mul(output, input);

// Non-gated GELU activation
torch::Tensor x = torch::randn({num_tokens, hidden_dim});
torch::Tensor y = torch::empty({num_tokens, hidden_dim});
gelu_new(y, x);
```