
Implementation:Vllm project Vllm CPU Activation

From Leeroopedia


Knowledge Sources
Domains Activation, CPU_Inference
Last Updated 2026-02-08 00:00 GMT

Overview

Implements vectorized CPU activation functions (SiLU, GELU variants) with OpenMP parallelization for efficient LLM inference on CPU backends.

Description

This file provides a templated activation_kernel that applies activation functions element-wise using FP32Vec8 SIMD vectorization. It supports both gated (e.g., SiLU-and-mul, GELU-and-mul) and non-gated (e.g., GELU-new, GELU-fast, GELU-quick) activation modes. Each activation variant is implemented as a standalone inline function operating on 8-wide FP32 vectors, dispatched through PyTorch's type dispatch macro.
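The SIMD and OpenMP details live in the actual kernel (via `cpu_types.hpp`), but the element-wise semantics can be sketched with a scalar reference. The sketch below is illustrative only: it uses plain `std::vector` instead of `torch::Tensor`, processes one float at a time instead of 8-wide `FP32Vec8` vectors, and the `_ref` function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar sigmoid-weighted linear unit: silu(x) = x * sigmoid(x).
static inline float silu(float x) { return x / (1.0f + std::exp(-x)); }

// Scalar sketch of the gated SiLU activation. Each row of `input` holds
// [x | gate] of width 2*d, and out[t][i] = silu(x[i]) * gate[i].
// The real kernel vectorizes the inner loop with FP32Vec8 and
// parallelizes the token loop with OpenMP.
void silu_and_mul_ref(std::vector<float>& out, const std::vector<float>& input,
                      std::size_t num_tokens, std::size_t d) {
  // #pragma omp parallel for  // token-level parallelism in the real kernel
  for (std::size_t t = 0; t < num_tokens; ++t) {
    const float* x = input.data() + t * 2 * d;  // first half of the row
    const float* gate = x + d;                  // second half of the row
    float* y = out.data() + t * d;
    for (std::size_t i = 0; i < d; ++i) y[i] = silu(x[i]) * gate[i];
  }
}

// Scalar sketch of the non-gated "quick" GELU: x * sigmoid(1.702 * x).
// Non-gated variants are shape-preserving, so no row splitting is needed.
void gelu_quick_ref(std::vector<float>& out, const std::vector<float>& input) {
  for (std::size_t i = 0; i < input.size(); ++i)
    out[i] = input[i] / (1.0f + std::exp(-1.702f * input[i]));
}
```

The gated form halves the last dimension because transformer MLP layers fuse the up-projection and gate-projection into one matmul, then split the result inside the activation.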

Usage

These functions are compiled into the vLLM CPU extension library and called from the Python layer via PyTorch custom ops when running inference on CPU. They serve as the CPU backend implementations for activation layers in transformer models.

Code Reference

Source Location

Signature

// Gated activation functions (input [..., 2*d] -> output [..., d])
void silu_and_mul(torch::Tensor& out, torch::Tensor& input);
void gelu_and_mul(torch::Tensor& out, torch::Tensor& input);
void gelu_tanh_and_mul(torch::Tensor& out, torch::Tensor& input);

// Non-gated activation functions (input [..., d] -> output [..., d])
void gelu_new(torch::Tensor& out, torch::Tensor& input);
void gelu_fast(torch::Tensor& out, torch::Tensor& input);
void gelu_quick(torch::Tensor& out, torch::Tensor& input);

Import

#include "cpu_types.hpp"

I/O Contract

Inputs

Name Type Required Description
input torch::Tensor Yes Input tensor; shape [..., 2*d] for gated activations or [..., d] for non-gated activations
out torch::Tensor Yes Pre-allocated output tensor; shape [..., d]

Outputs

Name Type Description
out torch::Tensor Activation result written in-place to the provided output tensor [..., d]
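The shape contract above can be summarized in one helper: gated activations consume `[..., 2*d]` and produce `[..., d]`, while non-gated activations are shape-preserving. This helper is a hypothetical sketch for illustration, not validation code from the source file.

```cpp
#include <cstdint>
#include <stdexcept>

// Illustrative only: given the last dimension of `input`, return the
// last dimension the pre-allocated `out` tensor must have.
std::int64_t expected_out_dim(std::int64_t input_last_dim, bool gated) {
  if (gated) {
    if (input_last_dim % 2 != 0)
      throw std::invalid_argument("gated activation needs an even last dim");
    return input_last_dim / 2;  // [..., 2*d] -> [..., d]
  }
  return input_last_dim;        // [..., d] -> [..., d]
}
```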

Usage Examples

// SiLU-and-mul gated activation for MLP layers
torch::Tensor input = torch::randn({num_tokens, 2 * hidden_dim});
torch::Tensor output = torch::empty({num_tokens, hidden_dim});
silu_and_mul(output, input);

// Non-gated GELU activation
torch::Tensor x = torch::randn({num_tokens, hidden_dim});
torch::Tensor y = torch::empty({num_tokens, hidden_dim});
gelu_new(y, x);
