Implementation: deepspeedai/DeepSpeed CPU Adagrad
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Deep Learning, CPU Computing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A SIMD-accelerated CPU implementation of the Adagrad (Adaptive Gradient) optimizer, providing efficient adaptive-learning-rate optimization on CPU hardware.
Description
This file implements the Adagrad optimizer with PyTorch bindings and SIMD acceleration support (AVX2/AVX512). Adagrad adapts the learning rate for each parameter based on historical gradient information, making it particularly effective for sparse data. The implementation uses hierarchical step functions (Step_1, Step_4, Step_8) that leverage SIMD operations for performance, with scalar fallback for remaining elements. It supports multiple precision types including FP32, FP16, and BFloat16 for both parameters and optimizer states.
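Conceptually, every SIMD step variant applies the same per-element update. The following is a minimal scalar sketch of that update, assuming standard Adagrad semantics with optional L2-style weight decay; it illustrates what Step_1/Step_4/Step_8 vectorize and is not a copy of the production kernel.
#include <cmath>
#include <cstddef>

// Scalar sketch of the per-element Adagrad update that the
// SIMD Step_1/Step_4/Step_8 variants vectorize.
void adagrad_step_scalar(float* params,
                         const float* grads,
                         float* exp_avg_sq,
                         size_t n,
                         float lr,
                         float eps,
                         float weight_decay)
{
    for (size_t i = 0; i < n; ++i) {
        float grad = grads[i];
        if (weight_decay > 0.0f) grad += weight_decay * params[i];
        exp_avg_sq[i] += grad * grad;  // accumulate squared gradients
        params[i] -= lr * grad / (std::sqrt(exp_avg_sq[i]) + eps);  // per-parameter adaptive step
    }
}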
Usage
Use this optimizer when training models with sparse features or when different parameters require different learning rate scales, especially on CPU-based systems.
Code Reference
Source Location
- Repository: DeepSpeed
- File: csrc/adagrad/cpu_adagrad.cpp
Signature
int create_adagrad_optimizer(int optimizer_id,
float alpha = 1e-2,
float eps = 1e-8,
float weight_decay = 0,
bool should_log = false);
int ds_adagrad_step(int optimizer_id,
size_t step,
float lr,
float epsilon,
float weight_decay,
torch::Tensor& params,
torch::Tensor& grads,
torch::Tensor& exp_avg_sq);
int destroy_adagrad_optimizer(int optimizer_id);
Import
#include "cpu_adagrad.h"
I/O Contract
create_adagrad_optimizer Parameters
| Parameter | Type | Description |
|---|---|---|
| optimizer_id | int | Unique identifier for the optimizer instance |
| alpha | float | Learning rate (default: 1e-2) |
| eps | float | Small constant for numerical stability (default: 1e-8) |
| weight_decay | float | Weight decay coefficient (default: 0) |
| should_log | bool | Enable logging of optimizer creation |
ds_adagrad_step Parameters
| Parameter | Type | Description |
|---|---|---|
| optimizer_id | int | Optimizer instance identifier |
| step | size_t | Current training step number |
| lr | float | Current learning rate |
| epsilon | float | Numerical stability constant |
| weight_decay | float | Weight decay coefficient |
| params | torch::Tensor& | Model parameters (in/out) |
| grads | torch::Tensor& | Gradients (in) |
| exp_avg_sq | torch::Tensor& | Accumulated squared gradients (in/out) |
Returns
| Function | Return Type | Description |
|---|---|---|
| create_adagrad_optimizer | int | 0 on success |
| ds_adagrad_step | int | 0 on success |
| destroy_adagrad_optimizer | int | 0 on success |
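As a sketch of the lifecycle implied by the signatures above (create, step with in-place updates, destroy, with return codes checked), the functions can also be driven directly from C++. This example assumes libtorch and the cpu_adagrad.h header are available on the include path; it is an illustration, not code taken from the DeepSpeed sources.
#include <torch/torch.h>
#include "cpu_adagrad.h"

int main()
{
    const int opt_id = 0;

    // Create the optimizer instance; a non-zero return code indicates failure.
    if (create_adagrad_optimizer(opt_id, /*alpha=*/1e-2f, /*eps=*/1e-8f,
                                 /*weight_decay=*/0.0f, /*should_log=*/true) != 0)
        return 1;

    // Parameters, gradients, and the accumulated squared-gradient state.
    auto params = torch::randn({1000}, torch::kFloat32);
    auto grads = torch::randn({1000}, torch::kFloat32);
    auto exp_avg_sq = torch::zeros({1000}, torch::kFloat32);

    // One optimizer step; params and exp_avg_sq are updated in place.
    int rc = ds_adagrad_step(opt_id, /*step=*/1, /*lr=*/1e-2f, /*epsilon=*/1e-8f,
                             /*weight_decay=*/0.0f, params, grads, exp_avg_sq);

    destroy_adagrad_optimizer(opt_id);
    return rc;
}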
Usage Examples
import torch
from deepspeed.ops.op_builder import CPUAdagradBuilder

# Load the compiled CPU Adagrad extension (JIT-builds it if necessary);
# the function names below match the pybind exports of cpu_adagrad.cpp.
ds_opt_adagrad = CPUAdagradBuilder().load()

# Create Adagrad optimizer instance
optimizer_id = 0
ds_opt_adagrad.create_adagrad(optimizer_id,
                              0.01,   # alpha (learning rate)
                              1e-8,   # eps
                              0.0,    # weight_decay
                              True)   # should_log

# Prepare tensors
params = torch.randn(1000, dtype=torch.float32)
grads = torch.randn(1000, dtype=torch.float32)
exp_avg_sq = torch.zeros(1000, dtype=torch.float32)

# Perform one optimizer step (params and exp_avg_sq are updated in place)
ds_opt_adagrad.adagrad_update(optimizer_id,
                              1,      # step
                              0.01,   # lr
                              1e-8,   # epsilon
                              0.0,    # weight_decay
                              params,
                              grads,
                              exp_avg_sq)

# Cleanup
ds_opt_adagrad.destroy_adagrad(optimizer_id)