
Implementation:Deepspeedai DeepSpeed CPU Adam Impl

From Leeroopedia


Knowledge Sources
Domains Optimization, Deep Learning, CPU Computing
Last Updated 2026-02-09 00:00 GMT

Overview

CPU-optimized implementation of the Adam and AdamW optimizers with SIMD acceleration for efficient neural-network training on CPU hardware.

Description

This file provides the C++ implementation of the Adam optimizer with PyTorch bindings, featuring AVX2/AVX512 SIMD acceleration for high-performance training on CPU. It implements both Adam (with L2 regularization) and AdamW (decoupled weight decay) modes, supporting multiple precision types including FP32, FP16, and BFloat16. The implementation includes hierarchical step functions (Step_1, Step_4, Step_8) that progressively handle larger batches with SIMD operations, falling back to scalar operations for remaining elements. A unique rollback feature allows reverting optimizer steps for advanced training scenarios.
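The update rule that the SIMD kernels vectorize can be written as a scalar Python sketch. This is illustrative only, not the DeepSpeed kernel itself: the kernel fuses the bias-correction factors into the step size and denominator, and the exact placement of the decoupled weight-decay term may differ slightly from the ordering shown here.

```python
import math

def adam_step(p, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, adamw_mode=True,
              bias_correction=True):
    """One scalar Adam/AdamW update (illustrative sketch, not the SIMD kernel)."""
    if weight_decay > 0 and not adamw_mode:
        g = g + weight_decay * p              # Adam mode: L2 term folded into the gradient
    m = beta1 * m + (1 - beta1) * g           # first moment (exp_avg)
    v = beta2 * v + (1 - beta2) * g * g       # second moment (exp_avg_sq)
    if bias_correction:
        m_hat = m / (1 - beta1 ** step)
        v_hat = v / (1 - beta2 ** step)
    else:
        m_hat, v_hat = m, v
    if weight_decay > 0 and adamw_mode:
        p = p * (1 - lr * weight_decay)       # AdamW mode: decoupled weight decay
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v
```

Note the only difference between the two modes: Adam adds `weight_decay * p` to the gradient before the moment updates, while AdamW shrinks the parameter directly and leaves the moments untouched.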

Usage

Use this optimizer when training neural networks on CPU-only systems or when CPU offloading is required for memory efficiency in large model training scenarios.

Code Reference

Source Location

Signature

int create_adam_optimizer(int optimizer_id,
                          float alpha,
                          float betta1,
                          float betta2,
                          float eps,
                          float weight_decay,
                          bool adamw_mode,
                          bool should_log);

int ds_adam_step(int optimizer_id,
                 size_t step,
                 float lr,
                 float beta1,
                 float beta2,
                 float epsilon,
                 float weight_decay,
                 bool bias_correction,
                 torch::Tensor& params,
                 torch::Tensor& grads,
                 torch::Tensor& exp_avg,
                 torch::Tensor& exp_avg_sq);

int ds_adam_rollback(int optimizer_id,
                     size_t step,
                     float lr,
                     float beta1,
                     float beta2,
                     float epsilon,
                     float weight_decay,
                     bool bias_correction,
                     torch::Tensor& params,
                     torch::Tensor& grads,
                     torch::Tensor& exp_avg,
                     torch::Tensor& exp_avg_sq);

int destroy_adam_optimizer(int optimizer_id);
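To make the rollback API concrete: a bias-corrected Adam update is algebraically invertible as long as the gradient used in the step is still available, because each moment update is an affine function of its previous value. The scalar sketch below (an illustration of the idea, not DeepSpeed's actual kernel, and omitting weight decay) pairs a forward step with its inverse.

```python
import math

def step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Forward Adam update (no weight decay, bias-corrected).
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    update = lr * (m / (1 - b1 ** t)) / (math.sqrt(v / (1 - b2 ** t)) + eps)
    return p - update, m, v

def rollback(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Invert the step above: first re-add the parameter update (computable
    # from the post-step moments m, v), then undo the moment updates.
    update = lr * (m / (1 - b1 ** t)) / (math.sqrt(v / (1 - b2 ** t)) + eps)
    p = p + update
    m = (m - (1 - b1) * g) / b1
    v = (v - (1 - b2) * g * g) / b2
    return p, m, v
```

A step followed by a rollback recovers the original parameter and moment values up to floating-point round-off.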

Import

#include "cpu_adam.h"

I/O Contract

create_adam_optimizer Parameters

| Parameter | Type | Description |
|---|---|---|
| optimizer_id | int | Unique identifier for the optimizer instance |
| alpha | float | Learning rate (default: 1e-3) |
| betta1 | float | Exponential decay rate for first moment (default: 0.9) |
| betta2 | float | Exponential decay rate for second moment (default: 0.999) |
| eps | float | Small constant for numerical stability (default: 1e-8) |
| weight_decay | float | Weight decay coefficient (default: 0) |
| adamw_mode | bool | Use AdamW (decoupled weight decay) if true, Adam if false |
| should_log | bool | Enable logging of optimizer creation |

ds_adam_step Parameters

| Parameter | Type | Description |
|---|---|---|
| optimizer_id | int | Optimizer instance identifier |
| step | size_t | Current training step number |
| lr | float | Current learning rate |
| beta1 | float | First moment decay rate |
| beta2 | float | Second moment decay rate |
| epsilon | float | Numerical stability constant |
| weight_decay | float | Weight decay coefficient |
| bias_correction | bool | Apply bias correction |
| params | torch::Tensor& | Model parameters (in/out) |
| grads | torch::Tensor& | Gradients (in) |
| exp_avg | torch::Tensor& | First moment estimates (in/out) |
| exp_avg_sq | torch::Tensor& | Second moment estimates (in/out) |

Returns

| Function | Return Type | Description |
|---|---|---|
| create_adam_optimizer | int | 0 on success |
| ds_adam_step | int | 0 on success |
| ds_adam_rollback | int | 0 on success, -1 on error |
| destroy_adam_optimizer | int | 0 on success |

Usage Examples

import torch
from deepspeed.ops.op_builder import CPUAdamBuilder

# Load (JIT-compiling if needed) the C++ extension exposing the bindings
ds_opt_adam = CPUAdamBuilder().load()

# Create optimizer instance; arguments follow the C++ signature:
# (optimizer_id, alpha, betta1, betta2, eps, weight_decay, adamw_mode, should_log)
optimizer_id = 0
ds_opt_adam.create_adam(optimizer_id, 0.001, 0.9, 0.999, 1e-8, 0.01, True, True)

# Prepare tensors
params = torch.randn(1000, dtype=torch.float32)
grads = torch.randn(1000, dtype=torch.float32)
exp_avg = torch.zeros(1000, dtype=torch.float32)
exp_avg_sq = torch.zeros(1000, dtype=torch.float32)

# Perform optimizer step; arguments follow the C++ signature:
# (optimizer_id, step, lr, beta1, beta2, epsilon, weight_decay, bias_correction, ...)
ds_opt_adam.adam_update(optimizer_id, 1, 0.001, 0.9, 0.999, 1e-8, 0.01, True,
                        params, grads, exp_avg, exp_avg_sq)

# Cleanup
ds_opt_adam.destroy_adam(optimizer_id)
