Implementation: deepspeedai/DeepSpeed XPU Adagrad
| Knowledge Sources | Details |
|---|---|
| Domains | Optimization, Deep Learning, XPU Computing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Intel XPU (GPU) implementation of the Adagrad optimizer with SIMD acceleration for adaptive learning rate optimization on Intel GPU hardware.
Description
This file implements the Adagrad optimizer specifically for Intel XPU devices, providing an alternative to the standard CPU implementation. It supports both full-precision (FP32) and half-precision (FP16) training, and can keep higher-precision optimizer state while the parameters themselves are stored in lower precision. The implementation includes hierarchical step functions (Step_1, Step_4, Step_8) that leverage SIMD operations for performance. Note that ds_adagrad_step_plus_copy, which would copy updated parameters back to the device, is not yet implemented (it asserts false).
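The arithmetic the Step_* functions apply to each element is the standard Adagrad update. The following is a minimal scalar sketch of that update (no SIMD; variable names and the exact weight-decay placement are illustrative assumptions, not lifted from the source file):

#include <cmath>
#include <cstddef>

// Scalar sketch of the per-element Adagrad update; the real Step_1/4/8 kernels
// vectorize this loop with SIMD. Weight decay is folded into the gradient as
// L2 regularization (an assumption for illustration).
void adagrad_update_scalar(float* params,
                           const float* grads,
                           float* exp_avg_sq,  // running sum of squared gradients
                           size_t n,
                           float lr,
                           float eps,
                           float weight_decay)
{
    for (size_t i = 0; i < n; ++i) {
        float grad = grads[i];
        if (weight_decay > 0.0f) grad += weight_decay * params[i];   // L2-style decay
        exp_avg_sq[i] += grad * grad;                                // accumulate squared gradient
        params[i] -= lr * grad / (std::sqrt(exp_avg_sq[i]) + eps);   // adaptive step
    }
}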
Usage
Use this optimizer when training models with sparse features on Intel XPU devices, or when adaptive learning rates are beneficial for your training task.
Code Reference
Source Location
- Repository: DeepSpeed
- File: csrc/xpu/adagrad/cpu_adagrad.cpp
Signature
int create_adagrad_optimizer(int optimizer_id,
float alpha = 1e-2,
float eps = 1e-8,
float weight_decay = 0,
bool should_log = false);
int ds_adagrad_step(int optimizer_id,
size_t step,
float lr,
float epsilon,
float weight_decay,
torch::Tensor& params,
torch::Tensor& grads,
torch::Tensor& exp_avg_sq);
int ds_adagrad_step_plus_copy(int optimizer_id,
size_t step,
float lr,
float epsilon,
float weight_decay,
torch::Tensor& params,
torch::Tensor& grads,
torch::Tensor& exp_avg_sq,
torch::Tensor& gpu_params);
int destroy_adagrad_optimizer(int optimizer_id);
Import
#include "cpu_adagrad.h"
I/O Contract
create_adagrad_optimizer Parameters
| Parameter | Type | Description |
|---|---|---|
| optimizer_id | int | Unique identifier for the optimizer instance |
| alpha | float | Learning rate (default: 1e-2) |
| eps | float | Small constant for numerical stability (default: 1e-8) |
| weight_decay | float | Weight decay coefficient (default: 0) |
| should_log | bool | Enable logging of optimizer creation |
ds_adagrad_step Parameters
| Parameter | Type | Description |
|---|---|---|
| optimizer_id | int | Optimizer instance identifier |
| step | size_t | Current training step number |
| lr | float | Current learning rate |
| epsilon | float | Numerical stability constant |
| weight_decay | float | Weight decay coefficient |
| params | torch::Tensor& | Model parameters (in/out) |
| grads | torch::Tensor& | Gradients (in) |
| exp_avg_sq | torch::Tensor& | Accumulated squared gradients (in/out) |
Returns
| Function | Return Type | Description |
|---|---|---|
| create_adagrad_optimizer | int | 0 on success |
| ds_adagrad_step | int | 0 on success |
| ds_adagrad_step_plus_copy | int | Not implemented (asserts false) |
| destroy_adagrad_optimizer | int | 0 on success |
Usage Examples
import torch
import deepspeed
# Create XPU Adagrad optimizer instance
optimizer_id = 0
deepspeed.ops.adagrad.xpu_adagrad.create_adagrad(
optimizer_id,
alpha=0.01,
eps=1e-8,
weight_decay=0.0,
should_log=True
)
# Prepare tensors (float32 in this example)
params = torch.randn(1000, dtype=torch.float32)
grads = torch.randn(1000, dtype=torch.float32)
exp_avg_sq = torch.zeros(1000, dtype=torch.float32)
# Perform optimizer step
deepspeed.ops.adagrad.xpu_adagrad.adagrad_update(
optimizer_id,
step=1,
lr=0.01,
epsilon=1e-8,
weight_decay=0.0,
params=params,
grads=grads,
exp_avg_sq=exp_avg_sq
)
# Cleanup
deepspeed.ops.adagrad.xpu_adagrad.destroy_adagrad(optimizer_id)
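Per the description above, the implementation also supports half-precision training: the same call sequence applies with FP16 parameter and gradient tensors, while the exp_avg_sq state can remain an FP32 tensor so the accumulated statistics keep full precision.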