
Principle:Intel Ipex llm GaLore Gradient Projection

From Leeroopedia


Knowledge Sources
Domains: Optimization, Memory_Efficient_Training
Last Updated: 2026-02-09 04:00 GMT

Overview

GaLore gradient projection is an optimization technique that reduces training memory by projecting gradients into a low-rank subspace. It keeps full-parameter learning while shrinking the memory needed for optimizer states.

Description

GaLore (Gradient Low-Rank Projection) addresses the memory bottleneck of optimizer states (e.g., Adam's first and second moments) by projecting the gradient matrix into a low-rank subspace before applying the optimizer. Unlike LoRA, which restricts the weight update to a low-rank form, GaLore projects only the gradient. Optimizer states are stored only for the projected dimensions, and each update is mapped back to the full weight shape before it is applied. The projection basis is periodically recomputed to track the changing gradient distribution, so the updates accumulated over training are not confined to a single rank-r subspace and the effective weight update can be full-rank.
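The claim that the accumulated update is not confined to rank r can be checked numerically. The following is a minimal NumPy sketch, not part of any library: the function names, matrix sizes, and the use of random matrices as stand-in gradients are all assumptions made for illustration. It sums two rank-r projected updates that use different bases and reports the rank of the total.

```python
import numpy as np


def top_right_singular_vectors(G, rank):
    """Hypothetical helper: top-`rank` right singular vectors of G, shape (n, rank)."""
    _, _, Vt = np.linalg.svd(G, full_matrices=False)
    return Vt[:rank].T


def accumulated_update_rank(m=8, n=6, rank=2, num_bases=2, seed=0):
    """Sum rank-`rank` updates built from different projection bases."""
    rng = np.random.default_rng(seed)
    total = np.zeros((m, n))
    for _ in range(num_bases):
        G = rng.standard_normal((m, n))       # stand-in for a gradient
        P = top_right_singular_vectors(G, rank)
        total += (G @ P) @ P.T                # rank-`rank` update mapped back to (m, n)
    return int(np.linalg.matrix_rank(total))


if __name__ == "__main__":
    print("rank of accumulated update:", accumulated_update_rank())
```

Each individual term has rank at most r, but once the basis changes the sum generally has higher rank. With a single fixed basis, the total would stay at rank r.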

Usage

Use this principle when memory is the primary constraint and LoRA's restriction to low-rank weight updates is limiting. GaLore is complementary to quantization and can be combined with it for further memory savings. It is particularly effective for pre-training and full-parameter fine-tuning scenarios.

Theoretical Basis

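Given gradient $G \in \mathbb{R}^{m \times n}$ and projection matrix $P \in \mathbb{R}^{n \times r}$ (where $r \ll n$):

$$G_{\text{proj}} = G P$$

The optimizer operates on $G_{\text{proj}} \in \mathbb{R}^{m \times r}$, reducing state memory from $O(mn)$ to $O(mr)$.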
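To make the memory reduction concrete, the following sketch counts stored elements for one weight matrix under Adam. The function name and the 4096 × 4096, rank 128 shapes are hypothetical, chosen only for illustration. The count for the projected case also includes the n × r projection matrix itself, which is an accounting assumption added here; it does not change the O(mr) scaling when m and n are comparable.

```python
def adam_state_elements(m, n, r=None):
    """Elements stored for Adam's first and second moments of one m x n weight.

    With r=None the moments have the full gradient shape (m, n).
    With a rank r, the moments have shape (m, r) and the projection
    matrix P of shape (n, r) is counted as well (assumption of this sketch).
    """
    if r is None:
        return 2 * m * n
    return 2 * m * r + n * r


if __name__ == "__main__":
    m, n, r = 4096, 4096, 128           # hypothetical layer shape and rank
    full = adam_state_elements(m, n)
    projected = adam_state_elements(m, n, r)
    print(f"full Adam state:      {full:,} elements")
    print(f"projected Adam state: {projected:,} elements")
    print(f"reduction factor:     {full / projected:.1f}x")
```

These are element counts, not bytes; the byte figure depends on the dtype used for the optimizer states.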

Pseudo-code Logic:

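# Abstract GaLore algorithm (pseudo-code, step counts from 0)
for step in training:
    G = compute_gradient(model)  # full gradient, shape (m, n)
    if step % update_proj_gap == 0:
        P = compute_svd_projection(G, rank=r)  # (re)compute basis, shape (n, r)
    G_proj = G @ P  # project to low rank, shape (m, r)
    update = optimizer.step(G_proj)  # optimizer states live in the reduced space
    W -= lr * (update @ P.T)  # map the update back to the full shape (m, n)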
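The loop above can be made concrete as a small NumPy sketch. This is an illustration under stated assumptions, not the IPEX-LLM API or a reference implementation: the function names, the toy quadratic loss, the hyperparameter values, and the choice of the top right singular vectors as the projection basis are all assumptions. It adds a hand-written Adam update in the projected space and maps each update back to the full weight shape with the transpose of P.

```python
import numpy as np


def svd_projection(G, rank):
    """Hypothetical helper: top-`rank` right singular vectors of G, shape (n, rank)."""
    _, _, Vt = np.linalg.svd(G, full_matrices=False)
    return Vt[:rank].T


def galore_adam_demo(m=16, n=12, rank=4, steps=200,
                     update_proj_gap=50, lr=0.05, seed=0):
    """Minimise 0.5 * ||W - W_target||_F^2 with Adam on projected gradients."""
    rng = np.random.default_rng(seed)
    W_target = rng.standard_normal((m, n))
    W = np.zeros((m, n))

    beta1, beta2, eps = 0.9, 0.999, 1e-8
    # Adam moments have the projected shape (m, rank), not (m, n).
    M = np.zeros((m, rank))
    V = np.zeros((m, rank))
    P = None
    losses = []

    for step in range(steps):
        G = W - W_target                      # gradient of the toy loss
        losses.append(0.5 * float(np.sum(G ** 2)))

        if step % update_proj_gap == 0:       # periodic basis refresh
            P = svd_projection(G, rank)

        G_proj = G @ P                        # (m, rank)

        # Standard Adam update, applied in the reduced space.
        M = beta1 * M + (1 - beta1) * G_proj
        V = beta2 * V + (1 - beta2) * G_proj ** 2
        t = step + 1
        M_hat = M / (1 - beta1 ** t)
        V_hat = V / (1 - beta2 ** t)
        update = M_hat / (np.sqrt(V_hat) + eps)

        W -= lr * (update @ P.T)              # back to the full shape (m, n)

    return W, losses, M.shape


if __name__ == "__main__":
    W, losses, state_shape = galore_adam_demo()
    print("optimizer state shape:", state_shape)
    print("initial loss:", losses[0], "final loss:", losses[-1])
```

In this sketch the Adam moments are carried over unchanged when the basis is refreshed. That is one possible design choice; resetting or re-projecting the moments at each refresh is another, and the source does not say which is intended.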
