Implementation: LlamaForCausalLM Forward (LLMBook-zh, llmbook-zh.github.io)
| Knowledge Sources | |
|---|---|
| Domains | NLP, Deep_Learning |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A concrete implementation from the LLMBook-zh repository for computing the causal language modeling loss via next-token prediction.
Description
The LlamaForCausalLM class extends LlamaPreTrainedModel, adding a linear language-model head (`lm_head`) that projects hidden states to vocabulary-sized logits. When labels are provided, its forward method computes a shifted cross-entropy loss: the logits at position t predict token t+1.
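The shift described above can be sketched in plain PyTorch (an illustrative reimplementation, not the exact HuggingFace code):

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Shifted cross-entropy: position t's logits predict token t+1."""
    # Drop the last position's logits: there is no next token to predict.
    shift_logits = logits[..., :-1, :].contiguous()
    # Drop the first label: it has no preceding position to predict it.
    shift_labels = labels[..., 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )

batch, seq, vocab = 2, 8, 32
logits = torch.randn(batch, seq, vocab)
labels = torch.randint(0, vocab, (batch, seq))
loss = causal_lm_loss(logits, labels)  # scalar tensor
```

With random logits the loss is close to log(vocab_size); a model that always assigns the next token near-certain probability drives it toward zero.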
Usage
This class is used as the model in pre-training. When passed to a HuggingFace `Trainer` with labels included in the batch, the loss is computed automatically.
Code Reference
Source Location
- Repository: LLMBook-zh
- File: code/6.1 LM损失.py
- Lines: 1-43
Signature
class LlamaForCausalLM(LlamaPreTrainedModel):
    def __init__(self, config):
        """
        Args:
            config: LlamaConfig with hidden_size, vocab_size, etc.

        Attributes:
            model: LlamaModel instance
            lm_head: nn.Linear(hidden_size, vocab_size, bias=False)
        """

    def forward(
        self,
        input_ids: torch.LongTensor = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        labels: Optional[torch.LongTensor] = None,
        **kwargs,
    ) -> Union[Tuple, CausalLMOutputWithPast]:
        """
        Args:
            input_ids: Token IDs [batch_size, seq_length].
            attention_mask: Attention mask.
            position_ids: Position IDs.
            labels: Target token IDs for loss computation (shifted internally).

        Returns:
            CausalLMOutputWithPast(loss=Tensor, logits=Tensor).
        """
Import
from transformers import LlamaForCausalLM
# Or from the local implementation:
from lm_loss import LlamaForCausalLM
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input_ids | LongTensor | Yes | Token IDs [batch_size, seq_length] |
| attention_mask | Tensor | No | Attention mask |
| position_ids | LongTensor | No | Position IDs |
| labels | LongTensor | No | Target token IDs (when provided, loss is computed) |
Outputs
| Name | Type | Description |
|---|---|---|
| loss | Tensor | Cross-entropy loss (only when labels provided) |
| logits | Tensor | Per-token vocabulary logits [batch_size, seq_length, vocab_size] |
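One detail of the labels contract worth noting: in the HuggingFace implementation, label positions set to -100 are excluded from the loss (the standard convention for masking padding or prompt tokens). A minimal pure-PyTorch sketch of that behavior, with illustrative shapes:

```python
import torch
import torch.nn.functional as F

batch, seq, vocab = 2, 6, 50
logits = torch.randn(batch, seq, vocab)
labels = torch.randint(0, vocab, (batch, seq))
labels[:, :2] = -100  # e.g. mask a prompt prefix out of the loss

# Same shift as in forward(), then cross-entropy with ignore_index=-100.
shift_logits = logits[:, :-1, :].reshape(-1, vocab)
shift_labels = labels[:, 1:].reshape(-1)
loss = F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)
```

The loss is then averaged only over positions with a valid (non -100) label.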
Usage Examples
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
inputs = tokenizer("Hello, world!", return_tensors="pt")
inputs["labels"] = inputs["input_ids"].clone()
outputs = model(**inputs)
print(f"Loss: {outputs.loss.item()}")
print(f"Logits shape: {outputs.logits.shape}")
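When downloading pretrained weights is not an option, the same loss path can be exercised with a small randomly initialized model. The config values below are illustrative, not the Llama-2 defaults:

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny randomly initialized model: no download, no gated-repo access needed.
config = LlamaConfig(
    vocab_size=256,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
)
model = LlamaForCausalLM(config)

input_ids = torch.randint(0, config.vocab_size, (2, 16))
outputs = model(input_ids=input_ids, labels=input_ids)  # labels shifted internally
print(outputs.loss.item())   # scalar cross-entropy loss
print(outputs.logits.shape)  # torch.Size([2, 16, 256])
```

An untrained model's loss should sit near log(vocab_size) ≈ 5.5 for vocab_size=256.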
Related Pages
Implements Principle
Requires Environment