
Implementation: LLMBook-zh.github.io LlamaForCausalLM Forward

From Leeroopedia


Knowledge Sources
Domains NLP, Deep_Learning
Last Updated 2026-02-08 00:00 GMT

Overview

A concrete implementation, provided by the LLMBook repository, for computing the causal language modeling loss via next-token prediction.

Description

The LlamaForCausalLM class extends LlamaPreTrainedModel to add a linear language model head (lm_head) that projects hidden states to vocabulary-sized logits. Its forward method computes the shifted cross-entropy loss when labels are provided: it predicts token t+1 from position t's logits.
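The shift can be illustrated with a small, self-contained sketch (illustrative random tensors, not the repository code): the logits at position t are scored against the token at position t+1, so the last position's logits and the first label are dropped.

```python
import torch
import torch.nn.functional as F

batch, seq_len, vocab = 2, 5, 11
logits = torch.randn(batch, seq_len, vocab)          # [batch, seq, vocab]
labels = torch.randint(0, vocab, (batch, seq_len))   # [batch, seq]

# Shift: position t's logits predict token t+1.
shift_logits = logits[:, :-1, :].contiguous()        # [batch, seq-1, vocab]
shift_labels = labels[:, 1:].contiguous()            # [batch, seq-1]

# Flatten and apply token-level cross-entropy.
loss = F.cross_entropy(
    shift_logits.view(-1, vocab),
    shift_labels.view(-1),
)
```

This is the same shift-then-flatten pattern the forward method applies internally when `labels` are passed.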

Usage

This class is used as the model in pre-training. When it is passed to the HuggingFace Trainer together with labels, the loss is computed automatically.
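A sketch of that contract with a toy stand-in module (TinyCausalLM is hypothetical, not the repository class): supplying labels, typically a copy of input_ids, is all that is needed for the forward pass to return a loss, which is what the Trainer relies on.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCausalLM(nn.Module):
    """Toy stand-in mirroring the labels -> shifted-loss contract."""

    def __init__(self, vocab_size=17, hidden_size=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, input_ids, labels=None):
        logits = self.lm_head(self.embed(input_ids))
        loss = None
        if labels is not None:
            # Same internal shift as LlamaForCausalLM.forward.
            loss = F.cross_entropy(
                logits[:, :-1, :].reshape(-1, logits.size(-1)),
                labels[:, 1:].reshape(-1),
            )
        return loss, logits

model = TinyCausalLM()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
input_ids = torch.randint(0, 17, (2, 6))

# One pre-training step: labels = inputs, the shift happens inside forward.
loss, logits = model(input_ids, labels=input_ids)
loss.backward()
opt.step()
```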

Code Reference

Source Location

  • Repository: LLMBook-zh
  • File: code/6.1 LM损失.py
  • Lines: 1-43

Signature

class LlamaForCausalLM(LlamaPreTrainedModel):
    def __init__(self, config):
        """
        Args:
            config: LlamaConfig with hidden_size, vocab_size, etc.
        Attributes:
            model: LlamaModel instance
            lm_head: nn.Linear(hidden_size, vocab_size, bias=False)
        """

    def forward(
        self,
        input_ids: torch.LongTensor = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        labels: Optional[torch.LongTensor] = None,
        **kwargs,
    ) -> Union[Tuple, CausalLMOutputWithPast]:
        """
        Args:
            input_ids: Token IDs [batch_size, seq_length].
            attention_mask: Attention mask.
            position_ids: Position IDs.
            labels: Target token IDs for loss computation (shifted internally).

        Returns:
            CausalLMOutputWithPast(loss=Tensor, logits=Tensor).
        """

Import

from transformers import LlamaForCausalLM
# Or from the local implementation:
from lm_loss import LlamaForCausalLM

I/O Contract

Inputs

Name Type Required Description
input_ids LongTensor Yes Token IDs [batch_size, seq_length]
attention_mask Tensor No Attention mask
position_ids LongTensor No Position IDs
labels LongTensor No Target token IDs (when provided, loss is computed)

Outputs

Name Type Description
loss Tensor Cross-entropy loss (only when labels provided)
logits Tensor Per-token vocabulary logits [batch_size, seq_length, vocab_size]

Usage Examples

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Hello, world!", return_tensors="pt")
inputs["labels"] = inputs["input_ids"].clone()

outputs = model(**inputs)
print(f"Loss: {outputs.loss.item()}")
print(f"Logits shape: {outputs.logits.shape}")
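Perplexity, the standard pre-training metric, is simply the exponential of this cross-entropy loss. The sketch below uses a stand-in loss value rather than a real forward pass.

```python
import torch

# Perplexity = exp(mean token-level cross-entropy).
loss = torch.tensor(1.5)  # stand-in value, not from a real model
perplexity = torch.exp(loss)
print(f"Perplexity: {perplexity.item():.4f}")
```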

Related Pages

Implements Principle

Requires Environment
