Principle:LLMBook zh LLMBook zh github io LLaMA Decoder Layer

Knowledge Sources	LLaMA: Open and Efficient Foundation Language Models; LLMBook-zh
Domains	Deep_Learning, Model_Architecture
Last Updated	2026-02-08 04:29 GMT

Overview

Single Transformer decoder block implementing Pre-Norm self-attention and feed-forward computation with RMSNorm and residual connections.

Description

The LLaMA Decoder Layer is the fundamental repeating unit in the LLaMA architecture. Each layer applies two sub-computations in sequence: (1) self-attention preceded by RMSNorm (Pre-Norm) with a residual connection, and (2) a feed-forward network (MLP) preceded by RMSNorm with a residual connection. The Pre-Norm design (normalizing before each sub-layer rather than after) improves training stability for deep networks. The LLaMA decoder layer uses RMSNorm instead of LayerNorm, and the MLP uses SwiGLU activation. Multiple decoder layers are stacked to form the complete LLaMA model.
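Here is a minimal NumPy sketch of the RMSNorm vs. LayerNorm difference. RMSNorm drops LayerNorm's mean subtraction and bias, and rescales each vector by its root mean square alone. The function names `layer_norm` and `rms_norm` are hypothetical, and the `eps` defaults are illustrative rather than taken from a specific release.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # LayerNorm: subtract the per-vector mean, divide by the standard deviation,
    # then apply a learned scale (gamma) and shift (beta).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: no mean subtraction and no bias; divide by the root mean square
    # of the features and apply a learned per-feature scale only.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.array([[1.0, 2.0, 3.0, 4.0]])
ones, zeros = np.ones(4), np.zeros(4)
print(layer_norm(x, ones, zeros))  # zero-mean, unit-variance output
print(rms_norm(x, ones))           # unit-RMS output; the mean is not removed
```

Because RMSNorm skips the mean statistic and the bias, it is slightly cheaper per call. This saving adds up when the normalization runs twice in every decoder layer.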

Usage

Use this principle to understand the internal structure of each Transformer block in LLaMA-family models. The decoder layer is the core building block and is repeated $N$ times (e.g., 32 layers in LLaMA-7B, and 80 layers in LLaMA-65B and Llama 2 70B). Understanding this layer is essential for seeing how attention and feed-forward computation interact with normalization and residual connections.
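The stack of repeated decoder layers accounts for almost all of the model's parameters. The following worked count illustrates this. It assumes the publicly released LLaMA-7B hyperparameters: hidden size 4096, SwiGLU intermediate size 11008, 32 layers, vocabulary 32000. It also assumes no grouped-query attention, no bias terms, and untied input and output embeddings. The helper `decoder_layer_params` is hypothetical.

```python
# Assumed LLaMA-7B hyperparameters (from the released model configuration).
DIM = 4096        # hidden size
FFN_DIM = 11008   # SwiGLU intermediate size
N_LAYERS = 32
VOCAB = 32000

def decoder_layer_params(dim: int, ffn_dim: int) -> int:
    attn = 4 * dim * dim     # W_q, W_k, W_v, W_o (no biases, no grouped-query attention)
    mlp = 3 * dim * ffn_dim  # gate, up, and down projections of the SwiGLU MLP
    norms = 2 * dim          # two RMSNorm weight vectors (pre-attention, pre-MLP)
    return attn + mlp + norms

per_layer = decoder_layer_params(DIM, FFN_DIM)
stack = N_LAYERS * per_layer
# Add the token embedding, the untied output projection, and the final RMSNorm.
total = stack + 2 * VOCAB * DIM + DIM

print(f"per layer: {per_layer:,}")      # per layer: 202,383,360
print(f"{N_LAYERS} layers: {stack:,}")  # 32 layers: 6,476,267,520
print(f"total: {total:,}")              # total: 6,738,415,616
```

Under these assumptions, roughly 96% of the roughly 6.7B parameters sit inside the 32 decoder layers.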

Theoretical Basis

Each decoder layer computes:

$h' = h + \mathrm{SelfAttn}(\mathrm{RMSNorm}(h))$
$h_{\text{out}} = h' + \mathrm{MLP}(\mathrm{RMSNorm}(h'))$

This is the Pre-Norm Transformer pattern where normalization is applied before each sub-layer rather than after.
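The original Transformer placed normalization after the residual addition (Post-Norm). The block below writes both forms side by side and unrolls the Pre-Norm update over $L$ sub-layers. The unrolled form shows that the block input reaches the output through an unnormalized identity path, which is the usual explanation for Pre-Norm's more stable training in deep stacks. In the notation introduced only for this illustration, $F_\ell$ denotes the $\ell$-th sub-layer (self-attention or MLP) applied after its own RMSNorm. A model with $N$ decoder layers therefore has $L = 2N$ such sub-layers.

```latex
\begin{aligned}
\text{Post-Norm:}\quad & h' = \mathrm{LayerNorm}\big(h + \mathrm{SelfAttn}(h)\big) \\
\text{Pre-Norm:}\quad  & h' = h + \mathrm{SelfAttn}\big(\mathrm{RMSNorm}(h)\big) \\[4pt]
\text{Pre-Norm step:}\quad & h_{\ell+1} = h_\ell + F_\ell(h_\ell),
  \qquad F_\ell(h) = \mathrm{Sub}_\ell\big(\mathrm{RMSNorm}_\ell(h)\big) \\
\text{Unrolled:}\quad & h_L = h_0 + \sum_{\ell=0}^{L-1} F_\ell(h_\ell)
\end{aligned}
```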

Pseudo-code Logic:

# Abstract algorithm description (NOT real implementation)
# Sub-layer 1: Pre-Norm Self-Attention + Residual
residual = hidden_states
hidden_states = rms_norm_1(hidden_states)
hidden_states = self_attention(hidden_states, mask, position_ids)
hidden_states = residual + hidden_states

# Sub-layer 2: Pre-Norm FFN + Residual
residual = hidden_states
hidden_states = rms_norm_2(hidden_states)
hidden_states = mlp(hidden_states)
hidden_states = residual + hidden_states
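The pseudo-code above can be made concrete with a minimal NumPy sketch of one layer's forward pass. This sketch rests on several assumptions:

- It processes a single unbatched sequence with random weights.
- Attention is plain causal multi-head attention, with no grouped-query attention and no KV cache.
- The causal mask is built internally instead of being passed in.
- RoPE, which LLaMA applies to queries and keys using `position_ids`, is omitted.

All function and parameter names (`w_gate`, `w_up`, `w_down`, etc.) are hypothetical and not taken from any particular codebase.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Scale each token vector by the inverse of its root mean square.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight

def silu(x):
    return x / (1.0 + np.exp(-x))

def self_attention(x, wq, wk, wv, wo, n_heads):
    # Causal multi-head attention; RoPE on q/k is omitted in this sketch.
    seq, dim = x.shape
    hd = dim // n_heads
    # (seq, dim) -> (n_heads, seq, head_dim)
    q = (x @ wq).reshape(seq, n_heads, hd).transpose(1, 0, 2)
    k = (x @ wk).reshape(seq, n_heads, hd).transpose(1, 0, 2)
    v = (x @ wv).reshape(seq, n_heads, hd).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(hd)       # (n_heads, seq, seq)
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)  # True where key is after query
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)            # numerically stable softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    out = (probs @ v).transpose(1, 0, 2).reshape(seq, dim)  # merge heads
    return out @ wo

def swiglu_mlp(x, w_gate, w_up, w_down):
    # SwiGLU: silu(x W_gate) * (x W_up), projected back to the hidden size.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

def decoder_layer(h, p, n_heads):
    # Sub-layer 1: Pre-Norm self-attention + residual
    attn_in = rms_norm(h, p["attn_norm"])
    h = h + self_attention(attn_in, p["wq"], p["wk"], p["wv"], p["wo"], n_heads)
    # Sub-layer 2: Pre-Norm MLP + residual
    mlp_in = rms_norm(h, p["ffn_norm"])
    return h + swiglu_mlp(mlp_in, p["w_gate"], p["w_up"], p["w_down"])

def init_params(dim, ffn_dim, rng):
    def w(*shape):
        return rng.normal(0.0, 0.02, size=shape)
    return {
        "attn_norm": np.ones(dim), "ffn_norm": np.ones(dim),
        "wq": w(dim, dim), "wk": w(dim, dim), "wv": w(dim, dim), "wo": w(dim, dim),
        "w_gate": w(dim, ffn_dim), "w_up": w(dim, ffn_dim), "w_down": w(ffn_dim, dim),
    }

rng = np.random.default_rng(0)
dim, ffn_dim, n_heads, seq = 64, 172, 4, 5
params = init_params(dim, ffn_dim, rng)
h = rng.normal(size=(seq, dim))
out = decoder_layer(h, params, n_heads)
print(out.shape)  # (5, 64)
```

RMSNorm and the MLP act on each token independently, and the attention is causal. Consequently, perturbing the last token leaves the outputs at earlier positions unchanged. Zeroing the output projections `wo` and `w_down` reduces the layer to the identity through its two residual connections.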

Related Pages

Implementation:LLMBook_zh_LLMBook_zh_github_io_LlamaDecoderLayer
