
Implementation:Deepspeedai DeepSpeed CPU Adagrad



Knowledge Sources
Domains: Optimization, Deep Learning, CPU Computing
Last Updated: 2026-02-09 00:00 GMT

Overview

CPU implementation of the Adagrad (Adaptive Gradient) optimizer with SIMD acceleration, providing efficient adaptive learning-rate optimization on CPU hardware.

Description

This file implements the Adagrad optimizer with PyTorch bindings and SIMD acceleration support (AVX2/AVX512). Adagrad adapts the learning rate for each parameter based on historical gradient information, making it particularly effective for sparse data. The implementation uses hierarchical step functions (Step_1, Step_4, Step_8) that leverage SIMD operations for performance, with scalar fallback for remaining elements. It supports multiple precision types including FP32, FP16, and BFloat16 for both parameters and optimizer states.
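
The per-element arithmetic behind those step kernels is standard Adagrad. The sketch below is a plain-PyTorch reference of the update applied to each element, not the SIMD code itself; the weight-decay handling (folded into the gradient before accumulation, L2 style) is an assumption for illustration.

import torch

def adagrad_reference_step(params: torch.Tensor,
                           grads: torch.Tensor,
                           exp_avg_sq: torch.Tensor,
                           lr: float = 1e-2,
                           eps: float = 1e-8,
                           weight_decay: float = 0.0) -> None:
    # Reference sketch of the per-element Adagrad update (not the SIMD kernel)
    grad = grads
    if weight_decay > 0.0:
        # Assumption: weight decay is added to the gradient (L2 regularization)
        grad = grad + weight_decay * params
    # Accumulate squared gradients into the optimizer state (exp_avg_sq)
    exp_avg_sq.add_(grad * grad)
    # Per-parameter adaptive step: lr * grad / (sqrt(accumulated) + eps)
    params.sub_(lr * grad / (exp_avg_sq.sqrt() + eps))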

Usage

Use this optimizer when training models with sparse features or when different parameters require different learning rate scales, especially on CPU-based systems.

Code Reference

Source Location

Signature

int create_adagrad_optimizer(int optimizer_id,
                             float alpha = 1e-2,
                             float eps = 1e-8,
                             float weight_decay = 0,
                             bool should_log = false);

int ds_adagrad_step(int optimizer_id,
                    size_t step,
                    float lr,
                    float epsilon,
                    float weight_decay,
                    torch::Tensor& params,
                    torch::Tensor& grads,
                    torch::Tensor& exp_avg_sq);

int destroy_adagrad_optimizer(int optimizer_id);

Import

#include "cpu_adagrad.h"

I/O Contract

create_adagrad_optimizer Parameters

Parameter     Type   Description
optimizer_id  int    Unique identifier for the optimizer instance
alpha         float  Learning rate (default: 1e-2)
eps           float  Small constant for numerical stability (default: 1e-8)
weight_decay  float  Weight decay coefficient (default: 0)
should_log    bool   Enable logging of optimizer creation

ds_adagrad_step Parameters

Parameter     Type            Description
optimizer_id  int             Optimizer instance identifier
step          size_t          Current training step number
lr            float           Current learning rate
epsilon       float           Numerical stability constant
weight_decay  float           Weight decay coefficient
params        torch::Tensor&  Model parameters (in/out)
grads         torch::Tensor&  Gradients (in)
exp_avg_sq    torch::Tensor&  Accumulated squared gradients (in/out)

Returns

Function                   Return Type  Description
create_adagrad_optimizer   int          0 on success
ds_adagrad_step            int          0 on success
destroy_adagrad_optimizer  int          0 on success

Usage Examples

import torch
from deepspeed.ops.op_builder import CPUAdagradBuilder

# Load the compiled CPU Adagrad extension
ds_adagrad_ops = CPUAdagradBuilder().load()

# Create an Adagrad optimizer instance
# (arguments: optimizer_id, alpha, eps, weight_decay, should_log)
optimizer_id = 0
ds_adagrad_ops.create_adagrad(optimizer_id, 0.01, 1e-8, 0.0, True)

# Prepare CPU tensors; exp_avg_sq is the optimizer state and must be
# kept alive and reused across steps
params = torch.randn(1000, dtype=torch.float32)
grads = torch.randn(1000, dtype=torch.float32)
exp_avg_sq = torch.zeros(1000, dtype=torch.float32)

# Perform one optimizer step
# (arguments: optimizer_id, step, lr, epsilon, weight_decay,
#  params, grads, exp_avg_sq)
ds_adagrad_ops.adagrad_update(optimizer_id, 1, 0.01, 1e-8, 0.0,
                              params, grads, exp_avg_sq)

# Release the optimizer instance
ds_adagrad_ops.destroy_adagrad(optimizer_id)
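
In most training code these low-level calls are wrapped by the DeepSpeedCPUAdagrad optimizer class, which manages optimizer IDs and state tensors internally. A minimal sketch follows; the constructor keywords shown (lr, eps, weight_decay) are assumptions for illustration and exact defaults may vary by DeepSpeed version.

import torch
from deepspeed.ops.adagrad import DeepSpeedCPUAdagrad

# CPU model whose parameters are updated by the CPU Adagrad kernels
model = torch.nn.Linear(128, 10)
optimizer = DeepSpeedCPUAdagrad(model.parameters(),
                                lr=0.01, eps=1e-8, weight_decay=0.0)

x = torch.randn(32, 128)
y = torch.randint(0, 10, (32,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()       # dispatches to the adagrad_update binding internally
optimizer.zero_grad()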
