
Implementation:OpenRLHF DeepspeedStrategy create_optimizer

From Leeroopedia


Knowledge Sources
Domains Optimization, Training_Infrastructure
Last Updated 2026-02-07 00:00 GMT

Overview

A concrete tool, provided by OpenRLHF, for creating hardware-optimized Adam optimizers for DeepSpeed training.

Description

The create_optimizer method on DeepspeedStrategy creates a FusedAdam or DeepSpeedCPUAdam optimizer based on the adam_offload configuration flag. It applies weight decay grouping (excluding bias and layernorm parameters) via get_optimizer_grouped_parameters.
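The grouping step can be sketched as follows. This is an illustrative reconstruction, not OpenRLHF's exact `get_optimizer_grouped_parameters` code; the helper name and the no-decay name substrings are assumptions:

```python
# Illustrative sketch of the weight-decay grouping described above.
# NOT OpenRLHF's exact implementation; the no-decay substrings
# ("bias", "layer_norm", "layernorm") are assumptions.

def group_parameters(named_params, weight_decay,
                     no_decay_keys=("bias", "layer_norm", "layernorm")):
    """Split named parameters into a decayed and a non-decayed group."""
    decay, no_decay = [], []
    for name, param in named_params:
        if any(key in name.lower() for key in no_decay_keys):
            no_decay.append(param)   # biases / norm weights: no L2 penalty
        else:
            decay.append(param)      # everything else: regular weight decay
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
```

The resulting list of parameter groups is what the Adam optimizer receives in place of a flat `model.parameters()` iterator.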

Usage

Call on the strategy object after model loading. Pass the result to the trainer constructor along with a scheduler from get_scheduler.

Code Reference

Source Location

  • Repository: OpenRLHF
  • File: openrlhf/utils/deepspeed/deepspeed.py
  • Lines: L134-141

Signature

def create_optimizer(self, model, **kwargs) -> Optimizer:
    """
    Create a DeepSpeed Adam optimizer.

    Args:
        model: nn.Module or Actor - the model to optimize
        **kwargs: Passed to Adam, typically:
            - lr (float): Learning rate
            - betas (tuple): Adam betas (default (0.9, 0.95))
            - weight_decay (float): L2 regularization

    Returns:
        Optimizer: FusedAdam or DeepSpeedCPUAdam
    """

Import

from openrlhf.utils.deepspeed import DeepspeedStrategy

I/O Contract

Inputs

Name          Type                 Required  Description
model         nn.Module or Actor   Yes       Model whose parameters to optimize
lr            float                Yes       Learning rate (via kwargs)
weight_decay  float                No        Weight decay coefficient (via kwargs)

Outputs

Name       Type       Description
optimizer  Optimizer  FusedAdam or DeepSpeedCPUAdam instance
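The choice between the two output classes can be sketched as below. The attribute name `adam_offload` follows the description above; the stub classes in the `except` branch exist only so the sketch runs without DeepSpeed installed:

```python
# Sketch of the hardware-based optimizer class choice described above.
try:
    from deepspeed.ops.adam import DeepSpeedCPUAdam, FusedAdam
except ImportError:  # allow the sketch to run without deepspeed installed
    class FusedAdam: ...
    class DeepSpeedCPUAdam: ...

def pick_adam_class(adam_offload: bool):
    """CPU-offloaded Adam when offloading is enabled, fused GPU Adam otherwise."""
    return DeepSpeedCPUAdam if adam_offload else FusedAdam
```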

Usage Examples

# Create optimizer
optim = strategy.create_optimizer(
    model,
    lr=args.learning_rate,
    betas=(0.9, 0.95),
    weight_decay=args.l2,
)

# Create scheduler
import math

from transformers.trainer import get_scheduler
scheduler = get_scheduler(
    "cosine_with_min_lr",
    optim,
    num_warmup_steps=math.ceil(max_steps * 0.03),
    num_training_steps=max_steps,
    scheduler_specific_kwargs={"min_lr": args.learning_rate * 0.1},
)

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
