Implementation:LLMBook zh LLMBook zh github io LlamaModel
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Model_Architecture, NLP |
| Last Updated | 2026-02-08 04:29 GMT |
Overview
Concrete tool for the complete LLaMA model forward pass provided by HuggingFace Transformers as a PreTrainedModel subclass.
Description
The `LlamaModel` class implements the full LLaMA decoder-only Transformer architecture. It contains the token embedding layer (`embed_tokens`), a stack of `LlamaDecoderLayer` modules, and a final `LlamaRMSNorm` normalization layer. The forward pass converts input IDs to embeddings, constructs a causal attention mask, passes the hidden states through all decoder layers sequentially, and applies the final normalization. This class serves as the backbone model: the causal language model class (`LlamaForCausalLM`) wraps it and adds the output projection used for next-token prediction.
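The forward-pass steps described above can be sketched in plain PyTorch. This is a simplified illustration, not the HuggingFace implementation: the `Toy*` class names are hypothetical, attention is single-head without rotary position embeddings, and the MLP is a plain two-layer network rather than LLaMA's gated variant. Only the overall flow (embed, build causal mask, run the layer stack, apply the final norm) mirrors the description.

```python
import torch
import torch.nn as nn


class ToyRMSNorm(nn.Module):
    """RMS normalization: rescale by the root mean square, no mean subtraction."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean_square = x.pow(2).mean(dim=-1, keepdim=True)
        return self.weight * x * torch.rsqrt(mean_square + self.eps)


class ToyDecoderLayer(nn.Module):
    """Hypothetical stand-in for LlamaDecoderLayer (single head, no RoPE)."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.input_norm = ToyRMSNorm(hidden_size)
        self.qkv = nn.Linear(hidden_size, 3 * hidden_size, bias=False)
        self.out = nn.Linear(hidden_size, hidden_size, bias=False)
        self.post_norm = ToyRMSNorm(hidden_size)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size, bias=False),
            nn.SiLU(),
            nn.Linear(4 * hidden_size, hidden_size, bias=False),
        )

    def forward(self, hidden: torch.Tensor, causal_mask: torch.Tensor) -> torch.Tensor:
        # Pre-norm self-attention with a residual connection.
        q, k, v = self.qkv(self.input_norm(hidden)).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        # Positions where causal_mask is False (future tokens) are blocked.
        scores = scores.masked_fill(~causal_mask, float("-inf"))
        hidden = hidden + self.out(torch.softmax(scores, dim=-1) @ v)
        # Pre-norm feed-forward block with a residual connection.
        return hidden + self.mlp(self.post_norm(hidden))


class ToyDecoderModel(nn.Module):
    """Embedding -> causal mask -> decoder stack -> final RMSNorm."""

    def __init__(self, vocab_size: int, hidden_size: int, num_layers: int):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.layers = nn.ModuleList(
            [ToyDecoderLayer(hidden_size) for _ in range(num_layers)]
        )
        self.norm = ToyRMSNorm(hidden_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.embed_tokens(input_ids)
        seq_len = input_ids.size(1)
        # Lower-triangular boolean mask: token i may attend to tokens 0..i.
        causal_mask = torch.tril(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=input_ids.device)
        )
        for layer in self.layers:
            hidden = layer(hidden, causal_mask)
        return self.norm(hidden)


torch.manual_seed(0)
toy_model = ToyDecoderModel(vocab_size=50, hidden_size=16, num_layers=2).eval()
toy_ids = torch.randint(0, 50, (2, 6))
with torch.no_grad():
    toy_out = toy_model(toy_ids)  # shape: (batch_size, seq_length, hidden_size)
```

Because of the causal mask, changing the last token of a sequence should leave the hidden states of all earlier positions unchanged, which is the property that makes the backbone usable for next-token prediction.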
Usage
Import this class when studying the internals of the LLaMA architecture or when you need hidden-state representations without the language modeling head. In practice, most users interact with `LlamaForCausalLM`, which wraps `LlamaModel` and adds a linear head for token prediction.
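The relationship between the two classes can be checked directly. The sketch below assumes a recent HuggingFace Transformers release, where `LlamaForCausalLM` exposes its backbone as the `model` attribute and its projection as `lm_head`. The tiny configuration values are arbitrary and chosen only so the example runs quickly on a CPU with random weights.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM, LlamaModel

# Deliberately tiny, randomly initialized model (illustrative sizes only).
tiny_config = LlamaConfig(
    vocab_size=128,
    hidden_size=32,
    intermediate_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
)
lm = LlamaForCausalLM(tiny_config).eval()

# The backbone held by the causal LM wrapper is a LlamaModel instance.
backbone = lm.model

input_ids = torch.randint(0, tiny_config.vocab_size, (1, 8))
with torch.no_grad():
    # Hidden states from the backbone alone: (batch, seq, hidden_size)
    hidden = backbone(input_ids=input_ids).last_hidden_state
    # Applying the linear head by hand yields logits: (batch, seq, vocab_size)
    manual_logits = lm.lm_head(hidden)
    # The wrapper performs the same two steps internally.
    logits = lm(input_ids=input_ids).logits
```

Use `backbone` (or a standalone `LlamaModel`) when only the representations are needed, for example for feature extraction, and the full wrapper when vocabulary logits are required.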
Code Reference
Source Location
- Repository: LLMBook-zh
- File: code/5.5 LLaMA.py
- Lines: 1-45
Signature
class LlamaModel(LlamaPreTrainedModel):
    def __init__(self, config: LlamaConfig):
        """
        Args:
            config: LlamaConfig with vocab_size, hidden_size, num_hidden_layers,
                rms_norm_eps, max_position_embeddings, etc.
        """

    def forward(
        self,
        input_ids: torch.LongTensor = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        **kwargs,
    ) -> Union[Tuple, BaseModelOutputWithPast]:
        """
        Args:
            input_ids: Token IDs of shape (batch_size, seq_length).
            attention_mask: Attention mask of shape (batch_size, seq_length).
            position_ids: Position indices of shape (batch_size, seq_length).
        Returns:
            BaseModelOutputWithPast with last_hidden_state of shape
            (batch_size, seq_length, hidden_size).
        """
Import
from transformers import LlamaModel, LlamaConfig
# Or defined locally in code/5.5 LLaMA.py
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | LlamaConfig | Yes | Model configuration (constructor) |
| input_ids | torch.LongTensor | Yes | Token IDs (batch_size, seq_length) |
| attention_mask | torch.Tensor | No | Attention mask (batch_size, seq_length) |
| position_ids | torch.LongTensor | No | Position indices (batch_size, seq_length) |
Outputs
| Name | Type | Description |
|---|---|---|
| last_hidden_state | torch.Tensor | Hidden states after all layers and final norm (batch, seq, hidden) |
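The I/O contract above can be exercised with a padded batch. This is a minimal sketch assuming a recent HuggingFace Transformers release. The tiny configuration, the pad ID of 0, and the variable names are illustrative choices rather than values from the source.

```python
import torch
from transformers import LlamaConfig, LlamaModel

# Tiny random-weight model so the example runs quickly on a CPU.
tiny_config = LlamaConfig(
    vocab_size=128,
    hidden_size=32,
    intermediate_size=64,
    num_hidden_layers=2,
    num_attention_heads=4,
)
tiny_model = LlamaModel(tiny_config).eval()

# One sequence of 3 real tokens, right-padded to length 5 with ID 0
# (the pad ID is an arbitrary choice for this illustration).
real_ids = torch.tensor([[11, 12, 13]])
padded_ids = torch.tensor([[11, 12, 13, 0, 0]])
padding_mask = torch.tensor([[1, 1, 1, 0, 0]])  # 1 = real token, 0 = padding

with torch.no_grad():
    padded_out = tiny_model(
        input_ids=padded_ids, attention_mask=padding_mask
    ).last_hidden_state
    unpadded_out = tiny_model(input_ids=real_ids).last_hidden_state

# padded_out has shape (1, 5, 32); unpadded_out has shape (1, 3, 32).
```

With right padding, the hidden states at the three real positions should match the unpadded run up to floating-point tolerance, since causal attention never lets a real token attend to the padding that follows it. The outputs at padded positions are still returned and are normally ignored downstream.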
Usage Examples
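from transformers import LlamaModel, LlamaConfig
import torch

# Initialize a randomly weighted model from a config.
# Note: with these sizes the model has several billion parameters,
# so instantiating it in float32 needs tens of GB of memory.
config = LlamaConfig(
    vocab_size=32000,
    hidden_size=4096,
    num_hidden_layers=32,
    num_attention_heads=32,
)
model = LlamaModel(config)

# Forward pass on random token IDs with batch_size=1, seq_length=128
input_ids = torch.randint(0, 32000, (1, 128))
outputs = model(input_ids=input_ids)
hidden_states = outputs.last_hidden_state
# hidden_states.shape == (1, 128, 4096)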