
Implementation:LLMBook zh LLMBook zh github io LlamaModel

From Leeroopedia


Knowledge Sources
Domains Deep_Learning, Model_Architecture, NLP
Last Updated 2026-02-08 04:29 GMT

Overview

Concrete tool for running the complete LLaMA model forward pass, provided by HuggingFace Transformers as a `PreTrainedModel` subclass.

Description

The `LlamaModel` class implements the full LLaMA decoder-only Transformer architecture. It contains the token embedding layer (`embed_tokens`), a stack of `LlamaDecoderLayer` modules, and a final `LlamaRMSNorm` normalization layer. The forward pass converts input IDs to embeddings, constructs a causal attention mask, passes the hidden states through all decoder layers sequentially, and applies the final normalization. This class serves as the backbone model: the causal LM class (`LlamaForCausalLM`) wraps it and adds the output projection for next-token prediction.
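The forward-pass order described above (embed, build a causal mask, run the decoder layers, apply the final norm) can be sketched in plain PyTorch. This is a minimal illustration under stated assumptions, not the HuggingFace or LLMBook-zh implementation: the `Toy*` class names are hypothetical, `nn.MultiheadAttention` and a plain MLP stand in for LLaMA's rotary-embedding attention and gated MLP, and the KV cache is omitted.

```python
import torch
import torch.nn as nn


class ToyRMSNorm(nn.Module):
    """Root-mean-square normalization with a learned scale (illustrative)."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        variance = x.pow(2).mean(dim=-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)


class ToyDecoderLayer(nn.Module):
    """Pre-norm self-attention and MLP, each wrapped in a residual connection."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.attn_norm = ToyRMSNorm(hidden_size)
        # Stand-in for LLaMA attention (no rotary position embeddings here)
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.mlp_norm = ToyRMSNorm(hidden_size)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.SiLU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )

    def forward(self, hidden_states: torch.Tensor, causal_mask: torch.Tensor) -> torch.Tensor:
        normed = self.attn_norm(hidden_states)
        attn_out, _ = self.attn(normed, normed, normed, attn_mask=causal_mask, need_weights=False)
        hidden_states = hidden_states + attn_out
        return hidden_states + self.mlp(self.mlp_norm(hidden_states))


class ToyDecoderModel(nn.Module):
    """Embedding -> causal mask -> stacked decoder layers -> final norm."""

    def __init__(self, vocab_size: int, hidden_size: int, num_layers: int, num_heads: int):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.layers = nn.ModuleList(
            [ToyDecoderLayer(hidden_size, num_heads) for _ in range(num_layers)]
        )
        self.norm = ToyRMSNorm(hidden_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden_states = self.embed_tokens(input_ids)
        seq_len = input_ids.shape[1]
        # True marks future positions that a token must not attend to
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=input_ids.device),
            diagonal=1,
        )
        for layer in self.layers:
            hidden_states = layer(hidden_states, causal_mask)
        return self.norm(hidden_states)


toy_model = ToyDecoderModel(vocab_size=100, hidden_size=32, num_layers=2, num_heads=4)
toy_hidden = toy_model(torch.randint(0, 100, (2, 6)))
print(toy_hidden.shape)  # torch.Size([2, 6, 32])
```

Because of the causal mask, changing a later token leaves the hidden states at earlier positions unchanged, which is the property that makes next-token prediction well defined.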

Usage

Import this class when studying the LLaMA architecture internals or when you need the hidden state representations without the language modeling head. In practice, most users interact with `LlamaForCausalLM`, which wraps `LlamaModel` and adds a linear head for token prediction.
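The relationship between the backbone and the head can be inspected directly. The sketch below assumes the HuggingFace Transformers classes and uses a deliberately tiny, randomly initialized configuration (the specific sizes are illustrative choices, not values from the source) so that it runs quickly on a CPU.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM, LlamaModel

# Illustrative tiny config; real LLaMA checkpoints are far larger
config = LlamaConfig(
    vocab_size=1000,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
)
lm = LlamaForCausalLM(config).eval()

backbone = lm.model   # the wrapped LlamaModel
head = lm.lm_head     # linear projection from hidden_size to vocab_size

input_ids = torch.randint(0, config.vocab_size, (2, 16))
with torch.no_grad():
    hidden = backbone(input_ids=input_ids).last_hidden_state
    logits = lm(input_ids=input_ids).logits

print(type(backbone).__name__)  # LlamaModel
print(hidden.shape)             # torch.Size([2, 16, 64])
print(logits.shape)             # torch.Size([2, 16, 1000])
```

The backbone output has one `hidden_size` vector per token, while the wrapped model produces one `vocab_size` score vector per token.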

Code Reference

Source Location

  • Repository: LLMBook-zh
  • File: code/5.5 LLaMA.py
  • Lines: 1-45

Signature

class LlamaModel(LlamaPreTrainedModel):
    def __init__(self, config: LlamaConfig):
        """
        Args:
            config: LlamaConfig with vocab_size, hidden_size, num_hidden_layers,
                    rms_norm_eps, max_position_embeddings, etc.
        """

    def forward(
        self,
        input_ids: torch.LongTensor = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        **kwargs,
    ) -> Union[Tuple, BaseModelOutputWithPast]:
        """
        Args:
            input_ids: Token IDs of shape (batch_size, seq_length).
            attention_mask: Attention mask of shape (batch_size, seq_length).
            position_ids: Position indices of shape (batch_size, seq_length).
        Returns:
            BaseModelOutputWithPast with last_hidden_state of shape
            (batch_size, seq_length, hidden_size).
        """

Import

from transformers import LlamaModel, LlamaConfig
# Or defined locally in code/5.5 LLaMA.py

I/O Contract

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| config | LlamaConfig | Yes | Model configuration (constructor) |
| input_ids | torch.LongTensor | Yes | Token IDs (batch_size, seq_length) |
| attention_mask | torch.Tensor | No | Attention mask (batch_size, seq_length) |
| position_ids | torch.LongTensor | No | Position indices (batch_size, seq_length) |
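As a hedged illustration of the optional inputs, the following sketch passes an explicit `attention_mask` and `position_ids` for a batch in which the second sequence is right-padded. It assumes the HuggingFace Transformers `LlamaModel`; the tiny config sizes and the padding layout are illustrative choices.

```python
import torch
from transformers import LlamaConfig, LlamaModel

# Illustrative tiny config so the example runs quickly on a CPU
config = LlamaConfig(
    vocab_size=1000,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
)
model = LlamaModel(config).eval()

batch_size, seq_length = 2, 8
input_ids = torch.randint(0, config.vocab_size, (batch_size, seq_length))

# 1 = real token, 0 = padding; the second sequence has 5 real tokens
attention_mask = torch.ones(batch_size, seq_length, dtype=torch.long)
attention_mask[1, 5:] = 0

# Positions 0..seq_length-1 for every sequence in the batch
position_ids = torch.arange(seq_length).unsqueeze(0).expand(batch_size, -1)

with torch.no_grad():
    padded_out = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
    )

print(padded_out.last_hidden_state.shape)  # torch.Size([2, 8, 64])
```

The output still contains a vector for every position, including padded ones; downstream code is expected to ignore the positions where `attention_mask` is 0.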

Outputs

| Name | Type | Description |
| --- | --- | --- |
| last_hidden_state | torch.Tensor | Hidden states after all layers and final norm (batch_size, seq_length, hidden_size) |
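The signature's `Union[Tuple, BaseModelOutputWithPast]` return type reflects two ways of reading the same result. A small sketch, assuming the HuggingFace Transformers `return_dict` keyword and an illustrative tiny config:

```python
import torch
from transformers import LlamaConfig, LlamaModel

# Illustrative tiny config so the example runs quickly on a CPU
config = LlamaConfig(
    vocab_size=1000,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
)
model = LlamaModel(config).eval()
input_ids = torch.randint(0, config.vocab_size, (1, 10))

with torch.no_grad():
    as_object = model(input_ids=input_ids)                    # BaseModelOutputWithPast
    as_tuple = model(input_ids=input_ids, return_dict=False)  # plain tuple

# The first tuple element is the same tensor as last_hidden_state
print(as_object.last_hidden_state.shape)  # torch.Size([1, 10, 64])
print(as_tuple[0].shape)                  # torch.Size([1, 10, 64])
```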

Usage Examples

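from transformers import LlamaModel, LlamaConfig
import torch

# Initialize a randomly weighted model from a config.
# Note: these LLaMA-7B-scale settings allocate about 6.6 billion parameters
# (roughly 26 GB in float32); shrink hidden_size and num_hidden_layers to experiment.
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=4096,
    num_hidden_layers=32,
    num_attention_heads=32,
)
model = LlamaModel(config)
model.eval()

# Forward pass on random token IDs (no gradients needed for inspection)
input_ids = torch.randint(0, config.vocab_size, (1, 128))
with torch.no_grad():
    outputs = model(input_ids=input_ids)
hidden_states = outputs.last_hidden_state
# hidden_states.shape == (1, 128, 4096)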
