# Implementation:Deepspeedai DeepSpeed CPU Adam Impl
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Deep Learning, CPU Computing |
| Last Updated | 2026-02-09 00:00 GMT |
## Overview
CPU-optimized implementation of the Adam and AdamW optimizers with SIMD acceleration for efficient neural network training on CPU hardware.
## Description
This file provides the C++ implementation of the Adam optimizer with PyTorch bindings, featuring AVX2/AVX512 SIMD acceleration for high-performance training on CPU. It implements both Adam (with L2 regularization) and AdamW (decoupled weight decay) modes, supporting multiple precision types including FP32, FP16, and BFloat16. The implementation includes hierarchical step functions (Step_1, Step_4, Step_8) that progressively handle larger batches with SIMD operations, falling back to scalar operations for remaining elements. A unique rollback feature allows reverting optimizer steps for advanced training scenarios.
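The two modes differ only in where weight decay enters the update. A minimal scalar sketch (illustrative Python, not the SIMD code path; `adam_like_step` is our name, not a DeepSpeed API):

```python
import math

def adam_like_step(p, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
                   eps=1e-8, weight_decay=0.01, adamw_mode=True):
    """One scalar Adam/AdamW update mirroring the two modes described above."""
    if not adamw_mode:
        g = g + weight_decay * p              # Adam: L2 term folded into the gradient
    m = beta1 * m + (1 - beta1) * g           # first moment estimate
    v = beta2 * v + (1 - beta2) * g * g       # second moment estimate
    m_hat = m / (1 - beta1 ** step)           # bias correction
    v_hat = v / (1 - beta2 ** step)
    if adamw_mode:
        p -= lr * weight_decay * p            # AdamW: decoupled weight decay
    p -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v
```

The SIMD code applies the same arithmetic to 8- or 16-wide vector lanes per instruction instead of one scalar at a time.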
## Usage
Use this optimizer when training neural networks on CPU-only systems or when CPU offloading is required for memory efficiency in large model training scenarios.
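In the offloading scenario, this optimizer is normally engaged indirectly through ZeRO's optimizer-offload setting in the DeepSpeed config rather than called by hand (a minimal sketch; see the DeepSpeed configuration docs for the full schema):

```json
{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    }
  }
}
```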
## Code Reference
### Source Location
- Repository: DeepSpeed
- File: csrc/adam/cpu_adam_impl.cpp
### Signature
```cpp
int create_adam_optimizer(int optimizer_id,
                          float alpha,
                          float betta1,
                          float betta2,
                          float eps,
                          float weight_decay,
                          bool adamw_mode,
                          bool should_log);

int ds_adam_step(int optimizer_id,
                 size_t step,
                 float lr,
                 float beta1,
                 float beta2,
                 float epsilon,
                 float weight_decay,
                 bool bias_correction,
                 torch::Tensor& params,
                 torch::Tensor& grads,
                 torch::Tensor& exp_avg,
                 torch::Tensor& exp_avg_sq);

int ds_adam_rollback(int optimizer_id,
                     size_t step,
                     float lr,
                     float beta1,
                     float beta2,
                     float epsilon,
                     float weight_decay,
                     bool bias_correction,
                     torch::Tensor& params,
                     torch::Tensor& grads,
                     torch::Tensor& exp_avg,
                     torch::Tensor& exp_avg_sq);

int destroy_adam_optimizer(int optimizer_id);
```
### Import
```cpp
#include "cpu_adam.h"
```
## I/O Contract
### create_adam_optimizer Parameters
| Parameter | Type | Description |
|---|---|---|
| optimizer_id | int | Unique identifier for the optimizer instance |
| alpha | float | Learning rate (default: 1e-3) |
| betta1 | float | Exponential decay rate for first moment (default: 0.9) |
| betta2 | float | Exponential decay rate for second moment (default: 0.999) |
| eps | float | Small constant for numerical stability (default: 1e-8) |
| weight_decay | float | Weight decay coefficient (default: 0) |
| adamw_mode | bool | Use AdamW (decoupled weight decay) if true, Adam if false |
| should_log | bool | Enable logging of optimizer creation |
### ds_adam_step Parameters
`ds_adam_rollback` takes the identical parameter list.
| Parameter | Type | Description |
|---|---|---|
| optimizer_id | int | Optimizer instance identifier |
| step | size_t | Current training step number |
| lr | float | Current learning rate |
| beta1 | float | First moment decay rate |
| beta2 | float | Second moment decay rate |
| epsilon | float | Numerical stability constant |
| weight_decay | float | Weight decay coefficient |
| bias_correction | bool | Apply bias correction |
| params | torch::Tensor& | Model parameters (in/out) |
| grads | torch::Tensor& | Gradients (in) |
| exp_avg | torch::Tensor& | First moment estimates (in/out) |
| exp_avg_sq | torch::Tensor& | Second moment estimates (in/out) |
### Returns
| Function | Return Type | Description |
|---|---|---|
| create_adam_optimizer | int | 0 on success |
| ds_adam_step | int | 0 on success |
| ds_adam_rollback | int | 0 on success, -1 on error |
| destroy_adam_optimizer | int | 0 on success |
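The in/out semantics above can be exercised with a pure-PyTorch reference that mutates the same four tensors in place (AdamW mode shown; `reference_adam_step` is our sketch for checking semantics, not the SIMD implementation):

```python
import torch

def reference_adam_step(step, lr, beta1, beta2, epsilon, weight_decay,
                        bias_correction, params, grads, exp_avg, exp_avg_sq):
    """In-place AdamW-style update with the same tensor contract as ds_adam_step."""
    if weight_decay > 0:
        params.mul_(1 - lr * weight_decay)                          # decoupled weight decay
    exp_avg.mul_(beta1).add_(grads, alpha=1 - beta1)                # first moment (in/out)
    exp_avg_sq.mul_(beta2).addcmul_(grads, grads, value=1 - beta2)  # second moment (in/out)
    bc1 = 1 - beta1 ** step if bias_correction else 1.0
    bc2 = 1 - beta2 ** step if bias_correction else 1.0
    denom = (exp_avg_sq / bc2).sqrt_().add_(epsilon)
    params.addcdiv_(exp_avg, denom, value=-lr / bc1)                # parameters updated in place
    return 0                                                        # mirror the 0-on-success convention
```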
## Usage Examples
The raw bindings are exposed by the JIT-compiled extension loaded through `CPUAdamBuilder`; the bound functions take positional arguments:

```python
import torch
from deepspeed.ops.op_builder import CPUAdamBuilder

# JIT-compile and load the C++ extension exposing the bindings above
ds_opt_adam = CPUAdamBuilder().load()

# Create optimizer instance
# (positional: id, lr, betta1, betta2, eps, weight_decay, adamw_mode, should_log)
optimizer_id = 0
ds_opt_adam.create_adam(optimizer_id, 0.001, 0.9, 0.999, 1e-8, 0.01, True, True)

# Prepare tensors
params = torch.randn(1000, dtype=torch.float32)
grads = torch.randn(1000, dtype=torch.float32)
exp_avg = torch.zeros(1000, dtype=torch.float32)
exp_avg_sq = torch.zeros(1000, dtype=torch.float32)

# Perform optimizer step
# (positional: id, step, lr, beta1, beta2, epsilon, weight_decay,
#  bias_correction, then the four tensors)
ds_opt_adam.adam_update(optimizer_id, 1, 0.001, 0.9, 0.999, 1e-8,
                        0.01, True, params, grads, exp_avg, exp_avg_sq)

# Cleanup
ds_opt_adam.destroy_adam(optimizer_id)
```

For normal training, prefer the high-level `deepspeed.ops.adam.DeepSpeedCPUAdam` wrapper, which manages the optimizer id and per-parameter state internally.