Implementation:LLMBook zh LLMBook zh github io LlamaModel

Knowledge Sources	LLMBook-zh LLaMA: Open and Efficient Foundation Language Models
Domains	Deep_Learning, Model_Architecture, NLP
Last Updated	2026-02-08 04:29 GMT

Overview

Concrete tool for running the complete LLaMA model forward pass, provided by HuggingFace Transformers as a PreTrainedModel subclass.

Description

The LlamaModel class implements the full LLaMA decoder-only Transformer architecture. It contains the token embedding layer (`embed_tokens`), a stack of `LlamaDecoderLayer` modules, and a final `LlamaRMSNorm` normalization layer. The forward pass converts input IDs to embeddings, constructs a causal attention mask, passes the hidden states through all decoder layers sequentially, and applies the final normalization. This class serves as the backbone model: the causal LM class (`LlamaForCausalLM`) wraps it and adds the output projection for next-token prediction.
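The components named above can be inspected directly on an instantiated model. The following is a minimal sketch, assuming the HuggingFace Transformers implementation of `LlamaModel`; the tiny configuration values are hypothetical and chosen only so that the randomly initialized model builds quickly on CPU.

```python
import torch
from transformers import LlamaConfig, LlamaModel

# Hypothetical tiny configuration (not a real LLaMA size), used only for inspection.
config = LlamaConfig(
    vocab_size=100,
    hidden_size=32,
    intermediate_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    max_position_embeddings=64,
)
model = LlamaModel(config).eval()

# The three parts described above: embedding, decoder stack, final norm.
embed_shape = tuple(model.embed_tokens.weight.shape)  # (vocab_size, hidden_size)
num_layers = len(model.layers)                        # one entry per decoder layer
layer_type = type(model.layers[0]).__name__
norm_type = type(model.norm).__name__

# A forward pass maps token IDs to one hidden vector per position.
input_ids = torch.randint(0, config.vocab_size, (2, 10))
with torch.no_grad():
    out = model(input_ids=input_ids)
last_shape = tuple(out.last_hidden_state.shape)       # (batch, seq, hidden_size)

print(embed_shape, num_layers, layer_type, norm_type, last_shape)
```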

Usage

Import this class when studying the internals of the LLaMA architecture or when you need hidden-state representations without the language modeling head. In practice, most users interact with `LlamaForCausalLM`, which wraps `LlamaModel` and adds a linear head for token prediction.
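The relationship between the two classes can be sketched as follows, assuming the HuggingFace Transformers implementation, where the backbone is exposed as the `model` attribute and the linear head as `lm_head`. The tiny configuration is again hypothetical, and the weights are random, so the logits carry no meaning beyond their shape.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM, LlamaModel

# Hypothetical tiny configuration so the example runs quickly on CPU.
config = LlamaConfig(
    vocab_size=100,
    hidden_size=32,
    intermediate_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    max_position_embeddings=64,
)
lm = LlamaForCausalLM(config).eval()

backbone = lm.model  # the wrapped LlamaModel
is_backbone = isinstance(backbone, LlamaModel)

input_ids = torch.randint(0, config.vocab_size, (1, 8))
with torch.no_grad():
    # Backbone only: hidden states, no vocabulary projection.
    hidden = backbone(input_ids=input_ids).last_hidden_state
    # Applying the head by hand should reproduce the wrapper's logits.
    logits_manual = lm.lm_head(hidden)
    logits = lm(input_ids=input_ids).logits

hidden_shape = tuple(hidden.shape)   # (batch, seq, hidden_size)
logits_shape = tuple(logits.shape)   # (batch, seq, vocab_size)
same_logits = torch.allclose(logits_manual, logits, atol=1e-5)
```

Working with the backbone directly is mainly useful when the per-token hidden states themselves are the object of interest, for example as features for another module.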

Code Reference

Source Location

Repository: LLMBook-zh
File: code/5.5 LLaMA.py
Lines: 1-45

Signature

class LlamaModel(LlamaPreTrainedModel):
    def __init__(self, config: LlamaConfig):
        """
        Args:
            config: LlamaConfig with vocab_size, hidden_size, num_hidden_layers,
                    rms_norm_eps, max_position_embeddings, etc.
        """

    def forward(
        self,
        input_ids: torch.LongTensor = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        **kwargs,
    ) -> Union[Tuple, BaseModelOutputWithPast]:
        """
        Args:
            input_ids: Token IDs of shape (batch_size, seq_length).
            attention_mask: Attention mask of shape (batch_size, seq_length).
            position_ids: Position indices of shape (batch_size, seq_length).
        Returns:
            BaseModelOutputWithPast with last_hidden_state of shape
            (batch_size, seq_length, hidden_size).
        """

Import

from transformers import LlamaModel, LlamaConfig
# Or defined locally in code/5.5 LLaMA.py

I/O Contract

Inputs

Name	Type	Required	Description
config	LlamaConfig	Yes	Model configuration (constructor)
input_ids	torch.LongTensor	Yes	Token IDs (batch_size, seq_length)
attention_mask	torch.Tensor	No	Attention mask (batch_size, seq_length)
position_ids	torch.LongTensor	No	Position indices (batch_size, seq_length)

Outputs

Name	Type	Description
last_hidden_state	torch.Tensor	Hidden states after all layers and final norm (batch, seq, hidden)
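The optional inputs in the contract above matter mostly for padded batches. The sketch below, which assumes the HuggingFace Transformers implementation and uses a hypothetical tiny configuration and a hypothetical pad ID of 0, right-pads one sequence and checks that the hidden states at the real positions agree with those from the unpadded sequence.

```python
import torch
from transformers import LlamaConfig, LlamaModel

torch.manual_seed(0)

# Hypothetical tiny configuration so the example runs quickly on CPU.
config = LlamaConfig(
    vocab_size=100,
    hidden_size=32,
    intermediate_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    max_position_embeddings=64,
)
model = LlamaModel(config).eval()

# One real sequence of 4 tokens, right-padded to length 6 (pad ID 0 is an assumption).
real = torch.randint(1, config.vocab_size, (1, 4))
padded = torch.cat([real, torch.zeros(1, 2, dtype=torch.long)], dim=1)
attention_mask = torch.tensor([[1, 1, 1, 1, 0, 0]])  # 1 = real token, 0 = padding
position_ids = torch.arange(6).unsqueeze(0)          # explicit positions 0..5

with torch.no_grad():
    out_padded = model(
        input_ids=padded,
        attention_mask=attention_mask,
        position_ids=position_ids,
    ).last_hidden_state
    out_real = model(input_ids=real).last_hidden_state

padded_shape = tuple(out_padded.shape)  # (batch, padded_seq, hidden_size)
# Causal attention means real positions never see the padding to their right.
real_positions_match = torch.allclose(out_padded[:, :4], out_real, atol=1e-4)
```

The output still contains vectors at the padded positions; they are generally ignored downstream rather than removed by the model.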

Usage Examples

from transformers import LlamaModel, LlamaConfig
import torch

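# Initialize a randomly weighted model from a config.
# Note: these are 7B-scale settings. With the library's default
# intermediate_size (11008) this is roughly 6.6 billion parameters,
# or about 26 GB of memory in float32.
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=4096,
    num_hidden_layers=32,
    num_attention_heads=32,
)
model = LlamaModel(config)
model.eval()

# Forward pass on random token IDs; gradients are not needed for inspection
input_ids = torch.randint(0, config.vocab_size, (1, 128))
with torch.no_grad():
    outputs = model(input_ids=input_ids)
hidden_states = outputs.last_hidden_state
# hidden_states.shape == (1, 128, 4096)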
