Principle:LLMBook zh LLMBook zh github io LLaMA Model Architecture

Knowledge Sources	LLaMA: Open and Efficient Foundation Language Models; LLMBook-zh
Domains	Deep_Learning, Model_Architecture, NLP
Last Updated	2026-02-08 04:29 GMT

Overview

Decoder-only Transformer architecture combining token embeddings, stacked decoder layers with Pre-Norm RMSNorm, and a final normalization layer to produce contextual hidden representations.

Description

The LLaMA Model Architecture defines the full forward pass of a decoder-only Transformer. It consists of three main components: (1) a token embedding layer that converts input IDs to dense vectors, (2) a stack of $N$ identical decoder layers (each containing a self-attention sub-layer and a feed-forward sub-layer, both wrapped in RMSNorm and a residual connection), and (3) a final RMSNorm applied to the output hidden states. A causal attention mask restricts each position to attending only to itself and earlier positions, which is what makes autoregressive generation possible. LLaMA combined several design choices, each drawn from earlier work, that have since become standard: Pre-Norm (applying normalization to the input of each sub-layer rather than to its output), RMSNorm instead of LayerNorm, RoPE for position encoding, and SwiGLU activations in the FFN.
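The building blocks named above can be illustrated with a minimal NumPy sketch. The function names, the `eps` default, and the additive form of the mask are assumptions made for illustration and are not taken from the LLaMA source code; `post_norm_block` is included only to contrast the ordering and reuses RMSNorm for simplicity.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """Divide x by the root-mean-square of its last axis, then apply a learned gain."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def build_causal_mask(seq_len):
    """Additive mask: 0 where attention is allowed, -inf strictly above the diagonal."""
    return np.triu(np.full((seq_len, seq_len), -np.inf), k=1)

def pre_norm_block(x, sublayer, weight):
    """Pre-Norm: normalize the sub-layer input; the residual path is left untouched."""
    return x + sublayer(rms_norm(x, weight))

def post_norm_block(x, sublayer, weight):
    """Post-Norm ordering, shown only for contrast: normalize after the residual sum."""
    return rms_norm(x + sublayer(x), weight)
```

In the Pre-Norm form the residual stream `x` passes through the block unnormalized, so stacking many such blocks keeps a direct additive path from the embeddings to the final layer.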

Usage

Use this principle to understand the overall structure of LLaMA-family models and how the individual components (RMSNorm, RoPE, decoder layers) compose into a complete model. It describes the top-level architecture, which orchestrates embedding, sequential layer processing, and final normalization.

Theoretical Basis

The LLaMA forward pass is:

$h_0 = \text{Embed}(\text{input\_ids})$

$h_l = \text{DecoderLayer}_l(h_{l-1}, \text{causal\_mask}, \text{position\_ids}) \quad \text{for } l = 1, \ldots, N$

$\text{output} = \text{RMSNorm}(h_N)$

Where each DecoderLayer applies Pre-Norm attention and Pre-Norm FFN with residual connections.
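Written out, one decoder layer can be expressed as two residual updates. The intermediate symbol $\tilde{h}_l$ and the weight names $W_{\text{gate}}$, $W_{\text{up}}$, $W_{\text{down}}$ are notation assumed here for illustration; the mask and position arguments of the attention are omitted for brevity:

$\tilde{h}_l = h_{l-1} + \text{Attn}(\text{RMSNorm}(h_{l-1}))$

$h_l = \tilde{h}_l + \text{FFN}(\text{RMSNorm}(\tilde{h}_l))$

$\text{FFN}(x) = W_{\text{down}} \left( \text{SiLU}(W_{\text{gate}} x) \odot W_{\text{up}} x \right)$

Each sub-layer sees a normalized input, while the residual stream itself is never normalized until the final RMSNorm after layer $N$.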

Pseudo-code Logic:

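# Abstract algorithm description (NOT real implementation)
hidden = embed_tokens(input_ids)              # token IDs -> dense vectors
seq_len = len(input_ids)
causal_mask = build_causal_mask(seq_len)      # position i may attend only to positions <= i
position_ids = range(seq_len)                 # 0 .. seq_len - 1, consumed by RoPE
for layer in decoder_layers:                  # N Pre-Norm decoder layers, applied in order
    hidden = layer(hidden, causal_mask, position_ids)
output = rms_norm(hidden)                     # final RMSNorm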
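To make the pseudo-code concrete, here is a runnable toy version in NumPy. It is a sketch under several assumptions, not the actual implementation: the sizes are arbitrary, the weights are random, attention uses a single head, RoPE pairs the first and second halves of the feature vector, and all names (`llama_forward`, `init_layer`, the weight keys) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, FFN_DIM, N_LAYERS = 50, 16, 32, 2   # toy sizes, chosen arbitrarily

def rms_norm(x, w, eps=1e-6):
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * w

def rope(x, position_ids, base=10000.0):
    # Rotate feature pairs (first half, second half) by position-dependent angles.
    half = x.shape[-1] // 2
    inv_freq = base ** (-np.arange(half) / half)
    ang = position_ids[:, None] * inv_freq[None, :]
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def silu(z):
    return z / (1.0 + np.exp(-z))

def init_layer():
    w = lambda *shape: rng.normal(0.0, 0.02, shape)   # random stand-in weights
    return {"wq": w(DIM, DIM), "wk": w(DIM, DIM), "wv": w(DIM, DIM), "wo": w(DIM, DIM),
            "w_gate": w(DIM, FFN_DIM), "w_up": w(DIM, FFN_DIM), "w_down": w(FFN_DIM, DIM),
            "attn_norm": np.ones(DIM), "ffn_norm": np.ones(DIM)}

def decoder_layer(h, p, causal_mask, position_ids):
    # Pre-Norm self-attention (single head for brevity) plus residual connection.
    x = rms_norm(h, p["attn_norm"])
    q = rope(x @ p["wq"], position_ids)
    k = rope(x @ p["wk"], position_ids)
    v = x @ p["wv"]
    scores = q @ k.T / np.sqrt(DIM) + causal_mask
    h = h + softmax(scores) @ v @ p["wo"]
    # Pre-Norm SwiGLU feed-forward plus residual connection.
    x = rms_norm(h, p["ffn_norm"])
    return h + (silu(x @ p["w_gate"]) * (x @ p["w_up"])) @ p["w_down"]

def llama_forward(input_ids, embed, layers, final_norm):
    seq_len = len(input_ids)
    hidden = embed[input_ids]                                   # h_0
    causal_mask = np.triu(np.full((seq_len, seq_len), -np.inf), k=1)
    position_ids = np.arange(seq_len)
    for p in layers:                                            # h_1 .. h_N
        hidden = decoder_layer(hidden, p, causal_mask, position_ids)
    return rms_norm(hidden, final_norm)                         # final RMSNorm

embed = rng.normal(0.0, 0.02, (VOCAB, DIM))
layers = [init_layer() for _ in range(N_LAYERS)]
final_norm = np.ones(DIM)
out = llama_forward(np.array([1, 5, 7, 3]), embed, layers, final_norm)
print(out.shape)  # (4, 16)
```

Because of the causal mask, replacing the last input token changes only the last row of `out`; the hidden states of earlier positions are unaffected. A language-model head that maps these hidden states to vocabulary logits is deliberately left out, since the principle described here ends at the final normalization.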

Related Pages

Implementation:LLMBook_zh_LLMBook_zh_github_io_LlamaModel
