
Heuristic:Huggingface Alignment handbook Liger Kernel Memory






Knowledge Sources
Domains: Optimization, Deep_Learning
Last Updated: 2026-02-07 00:00 GMT

Overview

Enable Liger Kernel for fused Triton operations that reduce GPU memory usage and improve training throughput on long sequences.

Description

Liger Kernel provides fused Triton kernels for common transformer operations (cross-entropy, RMSNorm, SwiGLU, and others) that use less memory than the standard PyTorch implementations. The alignment-handbook enables Liger Kernel in all SmolLM3 recipes (mid-training, SFT, and DPO) so that very long sequences (up to 65536 tokens) fit within a limited GPU memory budget.
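For intuition, here are plain-Python reference versions of two of the operations named above. This is a hedged sketch of the math only, not Liger's code or API. In eager PyTorch, steps like these typically run as several separate kernels that each write an intermediate tensor; a fused Triton kernel computes the same result in a single pass.

```python
import math


def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """Unfused RMSNorm: y_i = x_i / sqrt(mean(x^2) + eps) * w_i."""
    mean_sq = sum(v * v for v in x) / len(x)      # intermediate 1: mean of squares
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)      # intermediate 2: reciprocal RMS
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v: float) -> float:
    """SiLU (swish): v * sigmoid(v). Simple form; may overflow for very negative v."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate: list[float], up: list[float]) -> list[float]:
    """SwiGLU activation: silu(gate) * up, elementwise."""
    return [silu(g) * u for g, u in zip(gate, up)]


if __name__ == "__main__":
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
    print(swiglu([0.0, 1.0], [2.0, 2.0]))
```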

Usage

Apply this when training with long sequences (8k+ tokens) or when GPU memory is constrained. It is particularly beneficial for large-scale SFT and DPO training with models like SmolLM3.

The Insight (Rule of Thumb)

  • Action: Set `use_liger_kernel: true` in the training config.
  • Value: Reduces peak memory usage, enabling longer sequences or larger effective batch sizes.
  • Trade-off: Requires the `liger-kernel` package (>= 0.6.0). Triton compiles the kernels just-in-time on first use, which adds a small one-time startup overhead.
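To put the memory saving in numbers, here is a back-of-the-envelope estimate of the dense logit tensor that an unfused cross-entropy loss holds for a single sequence. The vocabulary size (128,000) and fp32 logits are assumptions chosen for illustration, not values taken from the SmolLM3 configs.

```python
def logits_bytes(seq_len: int, vocab_size: int, batch_size: int = 1,
                 bytes_per_elem: int = 4) -> int:
    """Size in bytes of a dense [batch, seq_len, vocab] logit tensor."""
    return batch_size * seq_len * vocab_size * bytes_per_elem


# Hypothetical vocabulary size, for illustration only.
VOCAB = 128_000

if __name__ == "__main__":
    for seq_len in (8_192, 65_536):
        gib = logits_bytes(seq_len, VOCAB) / 2**30  # fp32 = 4 bytes per element
        print(f"seq_len={seq_len:>6}: {gib:6.2f} GiB of fp32 logits")
```

Under these assumptions the logits alone take about 3.9 GiB at 8,192 tokens and about 31 GiB at 65,536 tokens, before counting the gradient of the same shape produced in the backward pass.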

Reasoning

Standard PyTorch operations materialize intermediate tensors (e.g., the full sequence-length x vocabulary logit matrix for cross-entropy), and for long sequences these dominate memory usage. Liger Kernel fuses these operations in Triton and computes the results without materializing the full intermediate tensor; for cross-entropy, it fuses the output projection with the loss and processes tokens in chunks. This is critical for the SmolLM3 SFT recipe, which trains with max_length=65536.
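The chunking idea can be sketched in plain Python. This is a hedged illustration of the technique, not Liger's implementation: the real kernel runs in Triton on the GPU and also computes gradients chunk by chunk during the forward pass, whereas this sketch covers the forward loss only. Both functions return the same mean loss, but the chunked one never holds more than `chunk_size` rows of logits at a time.

```python
import math

Vector = list[float]
Matrix = list[list[float]]


def project(weight: Matrix, h: Vector) -> Vector:
    """Logits for one token: one dot product per vocabulary row of `weight`."""
    return [sum(w * x for w, x in zip(row, h)) for row in weight]


def token_loss(logits: Vector, target: int) -> float:
    """Numerically stable cross-entropy: logsumexp(logits) - logits[target]."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return lse - logits[target]


def full_cross_entropy(hidden: Matrix, weight: Matrix, targets: list[int]) -> float:
    """Unfused baseline: builds the whole seq_len x vocab logit table at once."""
    all_logits = [project(weight, h) for h in hidden]
    return sum(token_loss(z, t) for z, t in zip(all_logits, targets)) / len(targets)


def chunked_cross_entropy(hidden: Matrix, weight: Matrix, targets: list[int],
                          chunk_size: int = 2) -> float:
    """Chunked variant: at most chunk_size x vocab logits exist at any time."""
    total = 0.0
    for start in range(0, len(hidden), chunk_size):
        stop = start + chunk_size
        chunk_logits = [project(weight, h) for h in hidden[start:stop]]
        total += sum(token_loss(z, t)
                     for z, t in zip(chunk_logits, targets[start:stop]))
        # chunk_logits is rebound next iteration, so the previous chunk can be freed
    return total / len(targets)
```

Peak logit memory scales with `chunk_size * vocab` instead of `seq_len * vocab`; the cost is a loop of smaller matrix products instead of one large one.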

SmolLM3 SFT config from `recipes/smollm3/sft/sft.yaml:225`:

use_liger_kernel: true

SmolLM3 mid-training config from `recipes/smollm3/sft/mid.yaml:61`:

use_liger_kernel: true

SmolLM3 APO-Zero config from `recipes/smollm3/dpo/apo.yaml:65`:

use_liger_kernel: true
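Because the file:line references above drift as the recipes change, a small stdlib-only script can re-locate every recipe that enables the flag. The function below is hypothetical and not part of the alignment-handbook; it is a plain line scan rather than a YAML parser, so it will miss the flag if it is set through anchors or under a nested key.

```python
import re
from pathlib import Path

# Matches a top-level-looking "use_liger_kernel: true" line, optionally followed by a comment.
FLAG = re.compile(r"^\s*use_liger_kernel:\s*true\s*(#.*)?$", re.IGNORECASE)


def find_liger_flags(root: str) -> list[tuple[str, int]]:
    """Return (path, 1-based line number) for every YAML line enabling the flag."""
    hits = []
    yaml_files = sorted(p for p in Path(root).rglob("*") if p.suffix in {".yaml", ".yml"})
    for path in yaml_files:
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if FLAG.match(line):
                hits.append((path.as_posix(), lineno))
    return hits


if __name__ == "__main__":
    for file, lineno in find_liger_flags("recipes"):
        print(f"{file}:{lineno}")
```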

Liger Kernel version requirement from `setup.py:55`:

    "liger-kernel>=0.6.0",
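A minimal stdlib-only sketch for checking that requirement at runtime, before launching a job with `use_liger_kernel: true`. The helper names are hypothetical. The parser deliberately ignores pre-release and dev suffixes (so `0.6.0rc1` counts as `0.6.0`); for full PEP 440 semantics, compare with `packaging.version.Version` instead.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_LIGER = (0, 6, 0)


def parse_release(v: str) -> tuple[int, ...]:
    """Leading numeric release components, e.g. '0.6.1.dev0' -> (0, 6, 1)."""
    parts: list[int] = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break                      # e.g. 'dev0': stop at the first non-numeric piece
        parts.append(int(digits))
        if len(digits) < len(piece):
            break                      # e.g. '0rc1': keep 0, drop the suffix
    return tuple(parts)


def meets_minimum(v: str, minimum: tuple[int, ...] = MIN_LIGER) -> bool:
    """True if the release part of `v` is >= `minimum` (missing parts count as 0)."""
    release = parse_release(v)
    release += (0,) * (len(minimum) - len(release))
    return release >= minimum


def liger_installed_ok() -> bool:
    """True if liger-kernel is installed at a version satisfying the minimum."""
    try:
        return meets_minimum(version("liger-kernel"))
    except PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("liger-kernel >= 0.6.0 available:", liger_installed_ok())
```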
