
Implementation:Deepspeedai DeepSpeed CPU Lion Impl

From Leeroopedia


Knowledge Sources
Domains Optimization, Deep Learning, CPU Computing
Last Updated 2026-02-09 00:00 GMT

Overview

CPU implementation of the Lion (EvoLved Sign Momentum) optimizer with SIMD acceleration for memory-efficient neural network training on CPU hardware.

Description

This file provides the C++ implementation of the Lion optimizer with PyTorch bindings, featuring AVX2/AVX512 SIMD acceleration. Lion uses sign-based momentum updates, computing the sign of an interpolation between the current gradient and momentum, then applying it with the learning rate. This approach requires only one momentum buffer (vs. two for Adam), reducing memory usage by ~50% while maintaining competitive performance. The implementation uses hierarchical step functions with SIMD operations and portable sign manipulation via std::copysignf for the scalar fallback path.
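The update rule described above can be sketched in NumPy. This is a reference sketch of the Lion algorithm itself, not the SIMD code path; `np.copysign` plays the role of the `std::copysignf` scalar fallback mentioned above, and the single `exp_avg` buffer is the optimizer's only state.

```python
import numpy as np

def lion_step(params, grads, exp_avg,
              lr=1e-3, beta1=0.9, beta2=0.999, weight_decay=0.0):
    """One Lion step: sign of an interpolation of momentum and gradient."""
    # Interpolate momentum and the current gradient, then take the sign
    c = beta1 * exp_avg + (1.0 - beta1) * grads
    update = np.copysign(1.0, c)  # mirrors the std::copysignf fallback
    # Decoupled weight decay plus the sign update, scaled by lr
    params -= lr * (update + weight_decay * params)
    # Momentum EMA uses the *other* coefficient, beta2
    exp_avg *= beta2
    exp_avg += (1.0 - beta2) * grads
    return params, exp_avg
```

Because the parameter update is a pure sign, every coordinate moves by exactly `lr` (plus weight decay), which is why Lion needs no second-moment buffer.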

Usage

Use this optimizer when training neural networks on CPU systems where memory efficiency is important, or when seeking an alternative to Adam with simpler hyperparameter tuning.
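The memory advantage can be made concrete. For fp32 training, Adam keeps two fp32 state buffers per parameter (first and second moment) while Lion keeps one, so optimizer state shrinks by half. A back-of-the-envelope check (the 1B-parameter figure is illustrative):

```python
def optimizer_state_bytes(num_params, buffers, bytes_per_elem=4):
    """Bytes of fp32 optimizer state for a given number of state buffers."""
    return num_params * buffers * bytes_per_elem

n = 1_000_000_000  # a 1B-parameter model, for illustration
adam_state = optimizer_state_bytes(n, buffers=2)  # exp_avg + exp_avg_sq
lion_state = optimizer_state_bytes(n, buffers=1)  # exp_avg only
print(adam_state / 2**30, lion_state / 2**30)     # ~7.45 GiB vs ~3.73 GiB
```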

Code Reference

Source Location

Signature

int create_lion_optimizer(int optimizer_id,
                          float alpha,
                          float betta1,
                          float betta2,
                          float weight_decay,
                          bool should_log);

int ds_lion_step(int optimizer_id,
                 size_t step,
                 float lr,
                 float beta1,
                 float beta2,
                 float weight_decay,
                 torch::Tensor& params,
                 torch::Tensor& grads,
                 torch::Tensor& exp_avg);

int destroy_lion_optimizer(int optimizer_id);

Import

#include "cpu_lion.h"

I/O Contract

create_lion_optimizer Parameters

Parameter Type Description
optimizer_id int Unique identifier for the optimizer instance
alpha float Learning rate (default: 1e-3)
betta1 float Interpolation coefficient for update direction (default: 0.9)
betta2 float Momentum decay rate for EMA (default: 0.999)
weight_decay float Weight decay coefficient (default: 0)
should_log bool Enable logging of optimizer creation

ds_lion_step Parameters

Parameter Type Description
optimizer_id int Optimizer instance identifier
step size_t Current training step number
lr float Current learning rate
beta1 float Interpolation coefficient for update
beta2 float Momentum decay rate
weight_decay float Weight decay coefficient
params torch::Tensor& Model parameters (in/out)
grads torch::Tensor& Gradients (in)
exp_avg torch::Tensor& Momentum buffer (in/out)

Returns

Function Return Type Description
create_lion_optimizer int 0 on success
ds_lion_step int 0 on success
destroy_lion_optimizer int 0 on success
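Since all three entry points signal success with a 0 return code, callers can check the status explicitly. A minimal, hypothetical helper (the `check_ret` name is not part of the library):

```python
def check_ret(ret, fn_name):
    """Raise if a cpu_lion entry point reports a nonzero status code."""
    if ret != 0:
        raise RuntimeError(f"{fn_name} returned nonzero status {ret}")
    return ret

# e.g. check_ret(lion_op.create_lion(...), "create_lion")
```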

Usage Examples

The raw functions are exported by the compiled extension, which can be loaded through DeepSpeed's op builder. Module and function names below follow the extension's pybind exports and may vary by DeepSpeed version; arguments are passed positionally.

import torch
from deepspeed.ops.op_builder import CPULionBuilder

# Load the compiled CPU Lion extension (JIT-builds on first use)
lion_op = CPULionBuilder().load()

# Create a Lion optimizer instance
# (optimizer_id, alpha, betta1, betta2, weight_decay, should_log)
optimizer_id = 0
lion_op.create_lion(optimizer_id, 0.001, 0.9, 0.999, 0.01, True)

# Prepare tensors
params = torch.randn(1000, dtype=torch.float32)
grads = torch.randn(1000, dtype=torch.float32)
exp_avg = torch.zeros(1000, dtype=torch.float32)

# Perform one optimizer step
# (optimizer_id, step, lr, beta1, beta2, weight_decay, params, grads, exp_avg)
lion_op.lion_update(optimizer_id, 1, 0.001, 0.9, 0.999, 0.01,
                    params, grads, exp_avg)

# Clean up the optimizer instance
lion_op.destroy_lion(optimizer_id)
