
Implementation:Deepspeedai DeepSpeed XPU Adagrad

From Leeroopedia


Knowledge Sources
Domains Optimization, Deep Learning, XPU Computing
Last Updated 2026-02-09 00:00 GMT

Overview

Intel XPU (GPU) implementation of the Adagrad optimizer with SIMD acceleration for adaptive learning rate optimization on Intel GPU hardware.

Description

This file implements the Adagrad optimizer specifically for Intel XPU devices, providing an alternative to the standard CPU implementation. It supports both full-precision (FP32) and half-precision (FP16) training modes, and can maintain higher-precision optimizer state while using lower-precision parameters. The implementation provides hierarchical step functions (Step_1, Step_4, Step_8) that leverage SIMD operations for performance. Note that the step-plus-copy path for copying parameters to the GPU is not yet implemented (it asserts false).
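The per-element math the Step_* functions vectorize is the standard Adagrad update: accumulate squared gradients, then scale each step by the running root-sum-square. A minimal pure-Python sketch of that math (illustrative only, not the XPU kernel):

```python
import math

def adagrad_step(params, grads, exp_avg_sq, lr, eps, weight_decay):
    """Scalar sketch of one Adagrad step over flat lists.

    Mirrors the math the SIMD Step_* kernels implement: accumulate
    squared gradients, then divide each update by the running
    root-sum-square plus eps.
    """
    for i in range(len(params)):
        g = grads[i]
        if weight_decay > 0.0:
            g += weight_decay * params[i]  # L2-style weight decay folded into the gradient
        exp_avg_sq[i] += g * g             # accumulate squared gradient
        params[i] -= lr * g / (math.sqrt(exp_avg_sq[i]) + eps)

params = [1.0, -2.0]
grads = [0.5, 0.5]
state = [0.0, 0.0]
adagrad_step(params, grads, state, lr=0.01, eps=1e-8, weight_decay=0.0)
# On the first step sqrt(g*g) == |g|, so each parameter moves by roughly lr.
```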

Usage

Use this optimizer when training models with sparse features on Intel XPU devices, or when adaptive learning rates are beneficial for your training task.
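The half-precision mode noted in the description keeps the optimizer state at higher precision while the parameters stay FP16. A hypothetical NumPy sketch of that master-state pattern (not the DeepSpeed API; `adagrad_step_fp16` is an illustrative helper):

```python
import numpy as np

def adagrad_step_fp16(params16, grads16, exp_avg_sq32, lr, eps):
    """Sketch of an FP16 Adagrad step with an FP32 accumulator.

    All arithmetic is performed in float32; only the stored
    parameters are float16, mirroring the mixed-precision mode
    where optimizer state keeps higher precision.
    """
    g = grads16.astype(np.float32)
    exp_avg_sq32 += g * g                      # state stays float32
    p = params16.astype(np.float32)
    p -= lr * g / (np.sqrt(exp_avg_sq32) + eps)
    params16[:] = p.astype(np.float16)         # write back at low precision
    return params16

params = np.ones(4, dtype=np.float16)
grads = np.full(4, 0.5, dtype=np.float16)
state = np.zeros(4, dtype=np.float32)
adagrad_step_fp16(params, grads, state, lr=0.01, eps=1e-8)
```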

Code Reference

Source Location

Signature

int create_adagrad_optimizer(int optimizer_id,
                             float alpha = 1e-2,
                             float eps = 1e-8,
                             float weight_decay = 0,
                             bool should_log = false);

int ds_adagrad_step(int optimizer_id,
                    size_t step,
                    float lr,
                    float epsilon,
                    float weight_decay,
                    torch::Tensor& params,
                    torch::Tensor& grads,
                    torch::Tensor& exp_avg_sq);

int ds_adagrad_step_plus_copy(int optimizer_id,
                              size_t step,
                              float lr,
                              float epsilon,
                              float weight_decay,
                              torch::Tensor& params,
                              torch::Tensor& grads,
                              torch::Tensor& exp_avg_sq,
                              torch::Tensor& gpu_params);

int destroy_adagrad_optimizer(int optimizer_id);

Import

#include "cpu_adagrad.h"

I/O Contract

create_adagrad_optimizer Parameters

Parameter Type Description
optimizer_id int Unique identifier for the optimizer instance
alpha float Learning rate (default: 1e-2)
eps float Small constant for numerical stability (default: 1e-8)
weight_decay float Weight decay coefficient (default: 0)
should_log bool Enable logging of optimizer creation

ds_adagrad_step Parameters

Parameter Type Description
optimizer_id int Optimizer instance identifier
step size_t Current training step number
lr float Current learning rate
epsilon float Numerical stability constant
weight_decay float Weight decay coefficient
params torch::Tensor& Model parameters (in/out)
grads torch::Tensor& Gradients (in)
exp_avg_sq torch::Tensor& Accumulated squared gradients (in/out)
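To illustrate the in/out markings above: params and exp_avg_sq are mutated in place, grads are only read, and because the accumulator only grows, repeated steps with the same gradient produce shrinking updates. A pure-Python sketch (hypothetical helper, not the extension's binding):

```python
import math

def step(params, grads, exp_avg_sq, lr=0.1, eps=1e-8):
    """One in-place Adagrad step; grads are read-only."""
    deltas = []
    for i, g in enumerate(grads):
        exp_avg_sq[i] += g * g
        d = lr * g / (math.sqrt(exp_avg_sq[i]) + eps)
        params[i] -= d
        deltas.append(d)
    return deltas

p, s, g = [1.0], [0.0], [1.0]
d1 = step(p, g, s)[0]  # accumulator = 1.0, so the step is about lr
d2 = step(p, g, s)[0]  # accumulator = 2.0, so the step shrinks to about lr/sqrt(2)
```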

Returns

Function Return Type Description
create_adagrad_optimizer int 0 on success
ds_adagrad_step int 0 on success
ds_adagrad_step_plus_copy int Not implemented (asserts false)
destroy_adagrad_optimizer int 0 on success

Usage Examples

import torch
import deepspeed

# Create XPU Adagrad optimizer instance
optimizer_id = 0
deepspeed.ops.adagrad.xpu_adagrad.create_adagrad(
    optimizer_id,
    alpha=0.01,
    eps=1e-8,
    weight_decay=0.0,
    should_log=True
)

# Prepare tensors (float32 in this example)
params = torch.randn(1000, dtype=torch.float32)
grads = torch.randn(1000, dtype=torch.float32)
exp_avg_sq = torch.zeros(1000, dtype=torch.float32)

# Perform optimizer step
deepspeed.ops.adagrad.xpu_adagrad.adagrad_update(
    optimizer_id,
    step=1,
    lr=0.01,
    epsilon=1e-8,
    weight_decay=0.0,
    params=params,
    grads=grads,
    exp_avg_sq=exp_avg_sq
)

# Cleanup
deepspeed.ops.adagrad.xpu_adagrad.destroy_adagrad(optimizer_id)
