
Implementation: LLMBook-zh.github.io LlamaForCausalLM Forward

From Leeroopedia


Knowledge Sources
Domains NLP, Deep_Learning
Last Updated 2026-02-08 00:00 GMT

Overview

A concrete implementation, provided by the LLMBook repository, for computing the causal language modeling loss via next-token prediction.

Description

The LlamaForCausalLM class extends LlamaPreTrainedModel to add a linear language model head (lm_head) that projects hidden states to vocabulary-sized logits. Its forward method computes the shifted cross-entropy loss when labels are provided: it predicts token t+1 from position t's logits.
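The shift can be illustrated with a small, self-contained sketch (illustrative random tensors, not the repository code): the logits at position t are scored against the token at position t+1, so the last position's logits and the first label are dropped.

```python
import torch
import torch.nn.functional as F

batch, seq_len, vocab = 2, 5, 11
logits = torch.randn(batch, seq_len, vocab)          # [batch, seq, vocab]
labels = torch.randint(0, vocab, (batch, seq_len))   # [batch, seq]

# Shift: position t's logits predict token t+1.
shift_logits = logits[:, :-1, :].contiguous()        # [batch, seq-1, vocab]
shift_labels = labels[:, 1:].contiguous()            # [batch, seq-1]

# Flatten and apply token-level cross-entropy.
loss = F.cross_entropy(
    shift_logits.view(-1, vocab),
    shift_labels.view(-1),
)
```

This is the same shift-then-flatten pattern the forward method applies internally when `labels` are passed.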

Usage

This class is used as the model in pre-training. When it is passed to the HuggingFace Trainer together with labels, the loss is computed automatically.
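A sketch of that contract with a toy stand-in module (TinyCausalLM is hypothetical, not the repository class): supplying labels, typically a copy of input_ids, is all that is needed for the forward pass to return a loss, which is what the Trainer relies on.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCausalLM(nn.Module):
    """Toy stand-in mirroring the labels -> shifted-loss contract."""

    def __init__(self, vocab_size=17, hidden_size=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, input_ids, labels=None):
        logits = self.lm_head(self.embed(input_ids))
        loss = None
        if labels is not None:
            # Same internal shift as LlamaForCausalLM.forward.
            loss = F.cross_entropy(
                logits[:, :-1, :].reshape(-1, logits.size(-1)),
                labels[:, 1:].reshape(-1),
            )
        return loss, logits

model = TinyCausalLM()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
input_ids = torch.randint(0, 17, (2, 6))

# One pre-training step: labels = inputs, the shift happens inside forward.
loss, logits = model(input_ids, labels=input_ids)
loss.backward()
opt.step()
```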

Code Reference

Source Location

  • Repository: LLMBook-zh
  • File: code/6.1 LM损失.py
  • Lines: 1-43

Signature

class LlamaForCausalLM(LlamaPreTrainedModel):
    def __init__(self, config):
        """
        Args:
            config: LlamaConfig with hidden_size, vocab_size, etc.
        Attributes:
            model: LlamaModel instance
            lm_head: nn.Linear(hidden_size, vocab_size, bias=False)
        """

    def forward(
        self,
        input_ids: torch.LongTensor = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        labels: Optional[torch.LongTensor] = None,
        **kwargs,
    ) -> Union[Tuple, CausalLMOutputWithPast]:
        """
        Args:
            input_ids: Token IDs [batch_size, seq_length].
            attention_mask: Attention mask.
            position_ids: Position IDs.
            labels: Target token IDs for loss computation (shifted internally).

        Returns:
            CausalLMOutputWithPast(loss=Tensor, logits=Tensor).
        """

Import

from transformers import LlamaForCausalLM
# Or from the local implementation:
from lm_loss import LlamaForCausalLM

I/O Contract

Inputs

Name Type Required Description
input_ids LongTensor Yes Token IDs [batch_size, seq_length]
attention_mask Tensor No Attention mask
position_ids LongTensor No Position IDs
labels LongTensor No Target token IDs (when provided, loss is computed)

Outputs

Name Type Description
loss Tensor Cross-entropy loss (only when labels provided)
logits Tensor Per-token vocabulary logits [batch_size, seq_length, vocab_size]

Usage Examples

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Hello, world!", return_tensors="pt")
inputs["labels"] = inputs["input_ids"].clone()

outputs = model(**inputs)
print(f"Loss: {outputs.loss.item()}")
print(f"Logits shape: {outputs.logits.shape}")
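Perplexity, the standard pre-training metric, is simply the exponential of this cross-entropy loss. The sketch below uses a stand-in loss value rather than a real forward pass.

```python
import torch

# Perplexity = exp(mean token-level cross-entropy).
loss = torch.tensor(1.5)  # stand-in value, not from a real model
perplexity = torch.exp(loss)
print(f"Perplexity: {perplexity.item():.4f}")
```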

Related Pages

Implements Principle

Requires Environment
