Principle:LLMBook zh LLMBook zh github io RMS Normalization

Knowledge Sources	Root Mean Square Layer Normalization LLMBook-zh
Domains	Deep_Learning, Model_Architecture
Last Updated	2026-02-08 04:29 GMT

Overview

Normalization technique that stabilizes hidden states by dividing by their root mean square, omitting the mean-centering step of standard Layer Normalization.

Description

RMS Normalization (RMSNorm) is a simplification of Layer Normalization that removes the mean-centering operation. Instead of computing both the mean and the variance, RMSNorm computes only the root mean square of the hidden states and rescales by it. Skipping the mean subtraction makes the normalization simpler and cheaper to compute while maintaining comparable model quality. RMSNorm replaces standard LayerNorm in LLaMA and other modern LLM architectures.
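The difference from Layer Normalization can be seen in a small side-by-side sketch. It uses plain Python lists and leaves out the learnable parameters of both methods; the function names and sample values are illustrative assumptions, not taken from the source. When the input already has zero mean the two methods give the same result, and they diverge once the input carries an offset.

```python
import math

def layer_norm(x, eps=1e-6):
    """LayerNorm without affine parameters: subtract the mean, divide by the std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-6):
    """RMSNorm without the learnable scale: divide by the root mean square."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]

centered = [-1.5, -0.5, 0.5, 1.5]        # illustrative input with mean 0
shifted = [v + 10.0 for v in centered]   # same spread, mean 10

# Zero-mean input: variance equals the mean square, so the outputs coincide.
same_when_centered = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(layer_norm(centered), rms_norm(centered))
)

# Non-zero mean: LayerNorm removes the offset, RMSNorm keeps it.
differ_when_shifted = any(
    abs(a - b) > 0.1
    for a, b in zip(layer_norm(shifted), rms_norm(shifted))
)
```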

Usage

Use this principle when building or understanding Transformer decoder architectures that follow the LLaMA design pattern. RMSNorm is applied before self-attention and before the feed-forward network in each decoder layer (Pre-Norm architecture). It is the standard normalization choice for modern large language models including LLaMA, Mistral, and Qwen.
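The Pre-Norm placement described above can be sketched as follows. This is a minimal illustration on a single hidden vector, assuming the sublayers are passed in as plain callables; `pre_norm_decoder_layer`, `zero_sublayer`, and the argument names are hypothetical and stand in for real attention and feed-forward modules.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one hidden vector with an element-wise scale."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]

def pre_norm_decoder_layer(x, attn, ffn, attn_norm_weight, ffn_norm_weight):
    """Pre-Norm: normalize the input of each sublayer, then add the residual."""
    # Normalization is applied before self-attention; the residual uses the raw x.
    h = [a + b for a, b in zip(x, attn(rms_norm(x, attn_norm_weight)))]
    # Normalization is applied again before the feed-forward network.
    return [a + b for a, b in zip(h, ffn(rms_norm(h, ffn_norm_weight)))]

# Hypothetical stand-in sublayer that contributes nothing.
def zero_sublayer(v):
    return [0.0] * len(v)

x = [3.0, 4.0]
ones = [1.0, 1.0]
out = pre_norm_decoder_layer(x, zero_sublayer, zero_sublayer, ones, ones)
```

With sublayers that output zeros, `out` equals `x`: in the Pre-Norm layout the residual path itself is never normalized, only the input to each sublayer is.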

Theoretical Basis

The RMSNorm operation is defined as:

$\mathrm{RMSNorm}(x) = \frac{x}{\sqrt{\frac{1}{d} \sum_{i=1}^{d} x_{i}^{2} + \epsilon}} \cdot \gamma$

Where:

$x$ is the input hidden state vector of dimension $d$
$\epsilon$ is a small constant for numerical stability (default $10^{-6}$)
$\gamma$ is a learnable scale parameter of dimension $d$, applied element-wise (initialized to ones)
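
As a quick worked check of the formula, assume $d = 2$, $x = (3, 4)$, $\gamma = (1, 1)$, and $\epsilon$ small enough to ignore (these values are chosen for illustration and do not come from the source):

$\frac{1}{d} \sum_{i=1}^{d} x_{i}^{2} = \frac{3^{2} + 4^{2}}{2} = 12.5, \qquad \sqrt{12.5} \approx 3.536$

$\mathrm{RMSNorm}(x) \approx \left( \frac{3}{3.536}, \frac{4}{3.536} \right) \approx (0.849, 1.131)$

The output has a root mean square of 1, since $\frac{0.849^{2} + 1.131^{2}}{2} \approx \frac{0.72 + 1.28}{2} = 1$.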

Pseudo-code Logic:

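# Abstract algorithm description (NOT real implementation)
# "variance" is the uncentered mean of squares over the last dimension
variance = mean(x ** 2, dim=-1, keepdim=True)
# rsqrt(v) = 1 / sqrt(v); eps guards against division by zero
x_normalized = x * rsqrt(variance + eps)
# element-wise learnable scale
output = weight * x_normalized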
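
The three steps above can be turned into a small runnable sketch. It is written in pure Python over a list of hidden vectors rather than with a tensor library, so it is an illustration of the arithmetic and not the linked implementation; the sample values and the factor of 100 are assumptions chosen for the demonstration.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Apply RMSNorm along the last dimension of a list of vectors."""
    out = []
    for row in x:
        # Step 1: mean of squares over the last dimension.
        variance = sum(v * v for v in row) / len(row)
        # Step 2: reciprocal square root, with eps for numerical stability.
        inv_rms = 1.0 / math.sqrt(variance + eps)
        # Step 3: rescale and apply the element-wise learnable weight.
        out.append([w * v * inv_rms for v, w in zip(row, weight)])
    return out

hidden = [[3.0, 4.0], [0.1, -0.2]]   # two illustrative hidden vectors, d = 2
weight = [1.0, 1.0]                  # scale initialized to ones

normed = rms_norm(hidden, weight)

# Multiplying the input by a positive constant leaves the output
# essentially unchanged (exactly so if eps were zero).
scaled = rms_norm([[100.0 * v for v in row] for row in hidden], weight)
```

Each row of `normed` has a root mean square close to 1, and `scaled` matches `normed` up to the small effect of `eps`, which illustrates that RMSNorm is insensitive to the overall scale of its input.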

Related Pages

Implementation:LLMBook_zh_LLMBook_zh_github_io_LlamaRMSNorm
