Implementation: deepspeedai/DeepSpeed CPU Adagrad Header
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Deep Learning, CPU Computing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Header file defining the SIMD-accelerated Adagrad optimizer class with AVX2/AVX512 support for CPU-based training.
Description
This header defines the Adagrad_Optimizer class with template-based SIMD implementations of adaptive gradient descent for CPU architectures. The class provides Step_AVX template methods that use AVX intrinsics for vectorized operations, with unroll factors (span = 1, 4, or 8) that process one, four, or eight SIMD tiles per loop iteration to trade register pressure against instruction-level parallelism. The AVX512 or AVX256 code path is selected at compile time via preprocessor macros, and the two template type parameters support mixed-precision training with FP16 (c10::Half), BF16 (c10::BFloat16), and FP32 parameter and state types.
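The per-element update that every Step variant applies is the standard Adagrad recurrence; the SIMD paths simply vectorize it. A minimal scalar sketch (plain C++, independent of the DeepSpeed header, with hypothetical function name) of what one element of a step computes:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Scalar Adagrad update, as a reference for what the SIMD paths vectorize.
// alpha: learning rate, eps: numerical-stability constant,
// weight_decay: classic L2 regularization folded into the gradient.
void adagrad_step(float* param, const float* grad, float* exp_avg_sq,
                  size_t n, float alpha, float eps, float weight_decay)
{
    for (size_t i = 0; i < n; ++i) {
        float g = grad[i];
        if (weight_decay > 0) g += param[i] * weight_decay;
        exp_avg_sq[i] += g * g;  // accumulate squared gradients
        param[i] -= alpha * g / (std::sqrt(exp_avg_sq[i]) + eps);
    }
}
```

With alpha = 0.1, a unit gradient, and a zero accumulator, one step moves a parameter from 1.0 to 0.9 while the accumulator becomes 1.0.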
Usage
Include this header when implementing or extending CPU-based Adagrad optimization with SIMD acceleration capabilities.
Code Reference
Source Location
- Repository: DeepSpeed
- File: csrc/includes/cpu_adagrad.h
Signature
class Adagrad_Optimizer {
public:
    Adagrad_Optimizer(float alpha = 1e-2, float eps = 1e-8, float weight_decay = 0);
    ~Adagrad_Optimizer();
#if defined(__AVX512__) or defined(__AVX256__)
    template <int span, typename ds_params_precision_t, typename ds_state_precision_t>
    void Step_AVX(size_t* rounded_size,
                  ds_params_precision_t* _params,
                  ds_params_precision_t* grads,
                  ds_state_precision_t* _exp_avg_sq,
                  size_t param_size);
#endif
    template <typename ds_params_precision_t, typename ds_state_precision_t>
    void Step_1(ds_params_precision_t* _params,
                ds_params_precision_t* grads,
                ds_state_precision_t* _exp_avg_sq,
                size_t _param_size);
    template <typename ds_params_precision_t, typename ds_state_precision_t>
    void Step_4(ds_params_precision_t* _params,
                ds_params_precision_t* grads,
                ds_state_precision_t* _exp_avg_sq,
                size_t _param_size);
    template <typename ds_params_precision_t, typename ds_state_precision_t>
    void Step_8(ds_params_precision_t* _params,
                ds_params_precision_t* grads,
                ds_state_precision_t* _exp_avg_sq,
                size_t _param_size);
    inline void IncrementStep(size_t step);
    inline void update_state(float lr, float epsilon, float weight_decay);

private:
    float _alpha;
    float _eps;
    float _weight_decay;
    float _betta1_t;
    float _betta2_t;
    size_t _step;
};
Import
#include "cpu_adagrad.h"
#include "simd.h"
I/O Contract
Constructor Parameters
| Parameter | Type | Description |
|---|---|---|
| alpha | float | Learning rate (default: 1e-2) |
| eps | float | Small constant for numerical stability (default: 1e-8) |
| weight_decay | float | Weight decay coefficient (default: 0) |
Step_AVX Template Parameters
| Parameter | Type | Description |
|---|---|---|
| span | int | SIMD unroll factor: number of vector tiles processed per loop iteration (1, 4, or 8) |
| rounded_size | size_t* | Output: number of elements processed with SIMD |
| _params | ds_params_precision_t* | Model parameters array (in/out) |
| grads | ds_params_precision_t* | Gradients array (in) |
| _exp_avg_sq | ds_state_precision_t* | Accumulated squared gradients (in/out) |
| param_size | size_t | Total number of parameters |
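Per the table above, rounded_size is an output: Step_AVX reports how many leading elements it covered with full-width SIMD, and the caller finishes the remainder with scalar code. A hedged sketch of that dispatch pattern (hypothetical names and a plain loop standing in for intrinsics, not the DeepSpeed source):

```cpp
#include <cassert>
#include <cstddef>

// Illustration of the rounded_size contract: the vectorized routine handles
// the largest multiple of the SIMD tile width and reports it through
// rounded_size; a scalar tail loop covers the leftover elements.
constexpr size_t kTileWidth = 8;  // e.g. 8 floats per AVX256 register

void step_simd_part(float* params, const float* grads,
                    size_t* rounded_size, size_t n)
{
    *rounded_size = (n / kTileWidth) * kTileWidth;
    for (size_t i = 0; i < *rounded_size; ++i)  // stands in for intrinsics
        params[i] -= grads[i];
}

void step(float* params, const float* grads, size_t n)
{
    size_t rounded_size = 0;
    step_simd_part(params, grads, &rounded_size, n);
    for (size_t i = rounded_size; i < n; ++i)  // scalar tail
        params[i] -= grads[i];
}
```

For n = 10 and a tile width of 8, the SIMD part reports rounded_size = 8 and the tail loop covers the last two elements, so every element is updated exactly once.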
Supported Type Combinations
| Parameters Type | State Type | Description |
|---|---|---|
| c10::Half | float | FP16 parameters, FP32 state |
| c10::Half | c10::Half | FP16 parameters, FP16 state |
| c10::BFloat16 | float | BF16 parameters, FP32 state (AVX512 only) |
| c10::BFloat16 | c10::BFloat16 | BF16 parameters, BF16 state (AVX512 only) |
| float | float | FP32 parameters, FP32 state |
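The two template type parameters let parameters and optimizer state live at different precisions, converting to float for the arithmetic. A minimal sketch of that design (hypothetical function; double stands in for a wider state type, since c10::Half and c10::BFloat16 require the PyTorch headers):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Parameters may be stored at low precision while the accumulator stays in a
// wider type; values are widened to float on load and narrowed on store.
template <typename params_t, typename state_t>
void adagrad_step_mixed(params_t* params, const params_t* grads,
                        state_t* exp_avg_sq, size_t n, float alpha, float eps)
{
    for (size_t i = 0; i < n; ++i) {
        float g = static_cast<float>(grads[i]);
        float v = static_cast<float>(exp_avg_sq[i]) + g * g;
        exp_avg_sq[i] = static_cast<state_t>(v);
        params[i] = static_cast<params_t>(
            static_cast<float>(params[i]) - alpha * g / (std::sqrt(v) + eps));
    }
}
```

Instantiating with `<float, double>` mirrors the "low-precision parameters, wider state" rows of the table; `<float, float>` mirrors the FP32/FP32 row.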
Usage Examples
#include "cpu_adagrad.h"
// Create an Adagrad optimizer instance
Adagrad_Optimizer opt(
    /* alpha = */ 0.01f,
    /* eps = */ 1e-8f,
    /* weight_decay = */ 0.0f);
// Advance the step counter and refresh hyperparameters for this step
opt.IncrementStep(1);
opt.update_state(0.01f, 1e-8f, 0.0f);
// Execute an optimizer step with FP32 buffers
size_t param_size = 1024;
float* params = new float[param_size]();      // value-initialized to zero
float* grads = new float[param_size]();       // fill with real gradients in practice
float* exp_avg_sq = new float[param_size]();  // accumulator must start at zero
opt.Step_8(params, grads, exp_avg_sq, param_size);
delete[] params;
delete[] grads;
delete[] exp_avg_sq;