
Implementation:vLLM project vLLM CPU LayerNorm

From Leeroopedia


Knowledge Sources
Domains Normalization, CPU_Inference
Last Updated 2026-02-08 00:00 GMT

Overview

Implements vectorized RMS layer normalization and fused add-RMS normalization for CPU-based transformer inference using SIMD and OpenMP parallelization.

Description

This file provides two core normalization operations: rms_norm computes Root Mean Square normalization over the hidden dimension, and fused_add_rms_norm combines a residual addition with RMS normalization in a single pass to reduce memory traffic. Both implementations use FP32Vec8 vectorization for SIMD-accelerated variance computation and normalized output generation, with OpenMP parallelization across tokens.
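To make the computation concrete, here is a scalar reference sketch of the RMS-norm math that the vectorized kernel performs per token. The names (`rms_norm_ref`) and the plain-`std::vector` layout are illustrative assumptions, not the actual vLLM kernel code, which operates on `torch::Tensor` storage with `FP32Vec8` SIMD lanes and an OpenMP loop over tokens.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar reference for the per-token RMS-norm math (illustrative, not the
// vLLM kernel itself). The real kernel vectorizes the inner loops with
// FP32Vec8 and parallelizes the outer token loop with OpenMP.
void rms_norm_ref(std::vector<float>& out, const std::vector<float>& input,
                  const std::vector<float>& weight, std::size_t hidden_size,
                  float epsilon) {
  const std::size_t num_tokens = input.size() / hidden_size;
  for (std::size_t t = 0; t < num_tokens; ++t) {
    const float* x = input.data() + t * hidden_size;
    float* y = out.data() + t * hidden_size;
    // Mean of squares over the hidden dimension (the "RMS" statistic).
    float sum_sq = 0.0f;
    for (std::size_t i = 0; i < hidden_size; ++i) sum_sq += x[i] * x[i];
    const float inv_rms = 1.0f / std::sqrt(sum_sq / hidden_size + epsilon);
    // Scale each element by 1/RMS and the learnable weight.
    for (std::size_t i = 0; i < hidden_size; ++i)
      y[i] = x[i] * inv_rms * weight[i];
  }
}
```

Note that, unlike LayerNorm, no mean is subtracted: RMS norm divides by the root mean square of the raw activations, which is what allows the single-pass variance accumulation.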

Usage

These functions are compiled into the vLLM CPU extension and called from the Python layer as CPU backend implementations for RMSNorm layers. They are essential for LLaMA-family and other modern transformer models that use RMS normalization instead of LayerNorm.

Code Reference

Source Location

Signature

void rms_norm(torch::Tensor& out, torch::Tensor& input,
              torch::Tensor& weight, double epsilon);

void fused_add_rms_norm(torch::Tensor& input, torch::Tensor& residual,
                        torch::Tensor& weight, double epsilon);

Import

#include "cpu_types.hpp"

I/O Contract

Inputs

Name Type Required Description
input torch::Tensor Yes Input tensor [..., hidden_size] to be normalized
weight torch::Tensor Yes Learnable scale parameters [hidden_size]
epsilon double Yes Small constant for numerical stability in variance computation
residual torch::Tensor Yes (fused only) Residual tensor [..., hidden_size] for fused add+norm variant
out torch::Tensor Yes (rms_norm only) Pre-allocated output tensor [..., hidden_size]

Outputs

Name Type Description
out torch::Tensor Normalized output written in-place (rms_norm)
input torch::Tensor Normalized result written in-place (fused_add_rms_norm)
residual torch::Tensor Updated residual (input + residual) written in-place (fused_add_rms_norm)

Usage Examples

// Standard RMS normalization
const int64_t num_tokens = 32, hidden_size = 4096;  // example shapes
torch::Tensor input = torch::randn({num_tokens, hidden_size});
torch::Tensor weight = torch::ones({hidden_size});
torch::Tensor output = torch::empty_like(input);
rms_norm(output, input, weight, 1e-6);

// Fused add + RMS normalization (saves memory bandwidth)
torch::Tensor hidden_states = torch::randn({num_tokens, hidden_size});
torch::Tensor residual = torch::randn({num_tokens, hidden_size});
torch::Tensor norm_weight = torch::ones({hidden_size});
fused_add_rms_norm(hidden_states, residual, norm_weight, 1e-6);
// After call: residual = old_hidden_states + old_residual
//             hidden_states = RMSNorm(residual) * weight
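The fused variant's in-place semantics can be sketched with the same scalar-reference style; again the function name and `std::vector` layout are illustrative assumptions rather than the vLLM kernel's actual code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar sketch of the fused add + RMS-norm semantics. Both buffers are
// updated in place, matching the I/O contract above; the single pass over
// the summed activations is what saves memory traffic versus a separate
// add followed by rms_norm.
void fused_add_rms_norm_ref(std::vector<float>& input,
                            std::vector<float>& residual,
                            const std::vector<float>& weight,
                            std::size_t hidden_size, float epsilon) {
  const std::size_t num_tokens = input.size() / hidden_size;
  for (std::size_t t = 0; t < num_tokens; ++t) {
    float* x = input.data() + t * hidden_size;
    float* r = residual.data() + t * hidden_size;
    // Accumulate the residual sum and its mean of squares together.
    float sum_sq = 0.0f;
    for (std::size_t i = 0; i < hidden_size; ++i) {
      r[i] += x[i];  // residual <- input + residual (written in place)
      sum_sq += r[i] * r[i];
    }
    const float inv_rms = 1.0f / std::sqrt(sum_sq / hidden_size + epsilon);
    for (std::size_t i = 0; i < hidden_size; ++i)
      x[i] = r[i] * inv_rms * weight[i];  // input <- RMSNorm(residual) * weight
  }
}
```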
