Implementation: deepspeedai/DeepSpeed CPU Lion Impl
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Deep Learning, CPU Computing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
CPU implementation of the Lion (EvoLved Sign Momentum) optimizer with SIMD acceleration for memory-efficient neural network training on CPU hardware.
Description
This file provides the C++ implementation of the Lion optimizer with PyTorch bindings, featuring AVX2/AVX512 SIMD acceleration. Lion uses sign-based momentum updates, computing the sign of an interpolation between the current gradient and momentum, then applying it with the learning rate. This approach requires only one momentum buffer (vs. two for Adam), reducing memory usage by ~50% while maintaining competitive performance. The implementation uses hierarchical step functions with SIMD operations and portable sign manipulation via std::copysignf for the scalar fallback path.
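The scalar math described above can be sketched in pure Python. This is a minimal reference of the Lion update rule as described here, not the SIMD code path; the decoupled placement of weight decay follows the standard Lion formulation and is an assumption about this file's exact ordering.

```python
import math

def lion_step(param, grad, exp_avg, lr, beta1, beta2, weight_decay):
    """One Lion update for a single scalar parameter (reference sketch)."""
    # Applied update: sign of an interpolation between momentum and gradient
    c = beta1 * exp_avg + (1.0 - beta1) * grad
    update = math.copysign(1.0, c) if c != 0.0 else 0.0
    # Decoupled weight decay, then the sign update scaled by the learning rate
    param = param * (1.0 - lr * weight_decay) - lr * update
    # Momentum buffer: an EMA of the gradient using the second coefficient
    exp_avg = beta2 * exp_avg + (1.0 - beta2) * grad
    return param, exp_avg
```

Note that only `exp_avg` is carried between steps; Adam would additionally carry a second-moment buffer, which is the source of Lion's memory saving.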
Usage
Use this optimizer when training neural networks on CPU systems where memory efficiency is important, or when seeking an alternative to Adam with simpler hyperparameter tuning.
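The memory saving can be made concrete with a quick back-of-the-envelope check. The sketch below assumes fp32 optimizer state and counts only the per-parameter state buffers (Adam keeps `exp_avg` and `exp_avg_sq`; Lion keeps `exp_avg` alone):

```python
N = 1_000_000_000  # example: 1B parameters
bytes_per_float32 = 4

adam_state = 2 * N * bytes_per_float32   # exp_avg + exp_avg_sq
lion_state = 1 * N * bytes_per_float32   # exp_avg only

print(f"Adam optimizer state: {adam_state / 2**30:.1f} GiB")
print(f"Lion optimizer state: {lion_state / 2**30:.1f} GiB")
print(f"Savings: {1 - lion_state / adam_state:.0%}")  # 50%
```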
Code Reference
Source Location
- Repository: DeepSpeed
- File: csrc/lion/cpu_lion_impl.cpp
Signature
int create_lion_optimizer(int optimizer_id,
                          float alpha,
                          float betta1,
                          float betta2,
                          float weight_decay,
                          bool should_log);

int ds_lion_step(int optimizer_id,
                 size_t step,
                 float lr,
                 float beta1,
                 float beta2,
                 float weight_decay,
                 torch::Tensor& params,
                 torch::Tensor& grads,
                 torch::Tensor& exp_avg);

int destroy_lion_optimizer(int optimizer_id);
Import
#include "cpu_lion.h"
I/O Contract
create_lion_optimizer Parameters
| Parameter | Type | Description |
|---|---|---|
| optimizer_id | int | Unique identifier for the optimizer instance |
| alpha | float | Learning rate (default: 1e-3) |
| betta1 | float | Interpolation coefficient for update direction (default: 0.9) |
| betta2 | float | Momentum decay rate for EMA (default: 0.999) |
| weight_decay | float | Weight decay coefficient (default: 0) |
| should_log | bool | Enable logging of optimizer creation |
ds_lion_step Parameters
| Parameter | Type | Description |
|---|---|---|
| optimizer_id | int | Optimizer instance identifier |
| step | size_t | Current training step number |
| lr | float | Current learning rate |
| beta1 | float | Interpolation coefficient for update |
| beta2 | float | Momentum decay rate |
| weight_decay | float | Weight decay coefficient |
| params | torch::Tensor& | Model parameters (in/out) |
| grads | torch::Tensor& | Gradients (in) |
| exp_avg | torch::Tensor& | Momentum buffer (in/out) |
Returns
| Function | Return Type | Description |
|---|---|---|
| create_lion_optimizer | int | 0 on success |
| ds_lion_step | int | 0 on success |
| destroy_lion_optimizer | int | 0 on success |
Usage Examples
import torch
from deepspeed.ops.op_builder import CPULionBuilder

# The raw C++ bindings are loaded through DeepSpeed's op builder; the
# function names below follow the pybind registrations for this extension.
lion_ops = CPULionBuilder().load()

# Create a Lion optimizer instance
optimizer_id = 0
lion_ops.create_lion(optimizer_id,
                     0.001,   # alpha (learning rate)
                     0.9,     # betta1
                     0.999,   # betta2
                     0.01,    # weight_decay
                     True)    # should_log

# Prepare tensors (params, grads, and exp_avg must share shape and dtype)
params = torch.randn(1000, dtype=torch.float32)
grads = torch.randn(1000, dtype=torch.float32)
exp_avg = torch.zeros(1000, dtype=torch.float32)

# Perform one optimizer step
lion_ops.lion_update(optimizer_id, 1, 0.001, 0.9, 0.999, 0.01,
                     params, grads, exp_avg)

# Cleanup
lion_ops.destroy_lion(optimizer_id)
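As an end-to-end sanity check of the update rule itself, the same sign-based math can be run in pure Python on a toy quadratic. This exercises the Lion algorithm as described in this page, not the C++ binding; hyperparameters here are illustrative.

```python
import math

# Minimize f(x) = (x - 3)^2 with sign-based momentum updates.
x, m = 0.0, 0.0
lr, beta1, beta2 = 0.01, 0.9, 0.99
for _ in range(1000):
    g = 2.0 * (x - 3.0)                    # gradient of f at x
    c = beta1 * m + (1.0 - beta1) * g      # interpolation of momentum and gradient
    x -= lr * math.copysign(1.0, c) if c != 0.0 else 0.0
    m = beta2 * m + (1.0 - beta2) * g      # momentum EMA

print(x)  # settles into a small oscillation around the minimum at x = 3
```

Because every step has fixed magnitude `lr`, Lion does not converge to a point but oscillates around the minimum with an amplitude set by `lr` and the betas; in real training this is handled by learning-rate decay.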