
Principle: Causal LM Loss Computation (source: llmbook-zh.github.io)

From Leeroopedia


Knowledge Sources
Domains NLP, Deep_Learning, Optimization
Last Updated 2026-02-08 00:00 GMT

Overview

The next-token prediction objective that trains causal language models by computing the cross-entropy loss between the model's predicted next-token distribution and the actual next token.

Description

Causal LM Loss Computation implements the standard language modeling objective: predict each token given all preceding tokens. The model produces logits (unnormalized log-probabilities) for each position, and the loss is the cross-entropy between these logits and the actual next tokens. Internally, the logits are shifted so that position t predicts token t+1.

This is the fundamental training signal for all autoregressive language models including GPT, LLaMA, and other decoder-only architectures.

Usage

Use this principle whenever training an autoregressive language model. The loss computation is embedded in the model's forward pass — when labels are provided, the model automatically computes and returns the loss.
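As a minimal sketch of this pattern, the Hugging Face Transformers API returns the causal LM loss whenever `labels` are passed to the forward call. The tiny, randomly initialized GPT-2 configuration below is an illustrative assumption, not a recommended setup:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny randomly initialized model (illustrative sizes only).
config = GPT2Config(vocab_size=50, n_positions=32, n_embd=8, n_layer=1, n_head=2)
model = GPT2LMHeadModel(config)

input_ids = torch.randint(0, 50, (1, 10))

# Passing labels makes the forward pass compute the loss itself.
# The library shifts logits and labels internally, so labels can
# simply be the input ids.
outputs = model(input_ids=input_ids, labels=input_ids)
print(outputs.loss)  # scalar cross-entropy loss
```

Because the shift happens inside the model, no manual label offsetting is needed at the call site.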

Theoretical Basis

Given a sequence of tokens $x_1, x_2, \ldots, x_T$, the causal language modeling loss is:

$$\mathcal{L} = -\frac{1}{T-1} \sum_{t=1}^{T-1} \log P(x_{t+1} \mid x_1, \ldots, x_t)$$

Implementation steps:

  1. Pass input through the Transformer to get hidden states.
  2. Project hidden states to vocabulary logits via a linear head.
  3. Shift logits and labels: logits[:-1] predicts labels[1:].
  4. Compute cross-entropy loss on the flattened tensors.
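Steps 3 and 4 above can be sketched in PyTorch with toy tensors standing in for a model's outputs (the shapes and values are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

# Toy tensors in place of real model outputs (illustrative only).
batch, seq_len, vocab = 2, 8, 100
logits = torch.randn(batch, seq_len, vocab)          # step 2: vocabulary logits
labels = torch.randint(0, vocab, (batch, seq_len))   # input token ids

# Step 3: shift so that position t predicts token t+1.
shift_logits = logits[:, :-1, :].contiguous()
shift_labels = labels[:, 1:].contiguous()

# Step 4: cross-entropy over the flattened tensors.
loss = F.cross_entropy(shift_logits.view(-1, vocab), shift_labels.view(-1))
print(loss)  # scalar loss averaged over all predicted positions
```

The `view(-1, vocab)` / `view(-1)` flattening merges the batch and sequence dimensions so a single cross-entropy call averages over every predicted position.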

Related Pages

Implemented By
