Implementation: OpenRLHF DeepspeedStrategy setup_distributed
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Computing, Training_Infrastructure |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete method on OpenRLHF's DeepspeedStrategy that initializes the distributed training backend.
Description
The setup_distributed method on DeepspeedStrategy initializes the NCCL distributed backend, sets CUDA devices, configures random seeds for reproducibility, and creates a 3D device mesh for data/sequence/tensor parallelism. It also computes the gradient accumulation steps from the configured batch sizes and world size.
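For orientation, the following sketch mirrors the steps listed above. It is an illustration under stated assumptions, not the library's source: the function name sketch_setup_distributed and the attribute names args.seed, args.ring_attn_size, and args.ds_tensor_parallel_size are assumptions for the example.
```python
import os
import random
from datetime import timedelta

import deepspeed
import numpy as np
import torch
from torch.distributed.device_mesh import init_device_mesh


def sketch_setup_distributed(args, timeout=timedelta(minutes=60)):
    # 1. Initialize the NCCL process group (rank/world size come from the launcher).
    deepspeed.init_distributed(dist_backend="nccl", timeout=timeout)

    # 2. Bind this process to the GPU chosen by the launcher via LOCAL_RANK.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # 3. Seed all RNGs for reproducibility (args.seed is an assumed field).
    random.seed(args.seed)
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)

    # 4. Build the 3D (dp, sp, tp) device mesh; the sp/tp size fields are assumed.
    world_size = torch.distributed.get_world_size()
    sp, tp = args.ring_attn_size, args.ds_tensor_parallel_size
    dp = world_size // (sp * tp)
    return init_device_mesh(
        "cuda", mesh_shape=(dp, sp, tp), mesh_dim_names=("dp", "sp", "tp")
    )
```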
Usage
Call this method on a strategy object immediately after creating it with get_strategy and before loading any models or data. It must be called exactly once per process; a defensive guard is sketched below.
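Because the call must run exactly once per process, scripts that may be re-entered can check whether a process group already exists before calling it. A minimal sketch, using PyTorch's standard is_initialized check:
```python
import torch.distributed as dist

from openrlhf.utils.utils import get_strategy

strategy = get_strategy(args)  # args: parsed training-script arguments

# Defensive guard: skip setup if a process group somehow already exists,
# since setup_distributed must only run once per process.
if not dist.is_initialized():
    strategy.setup_distributed()
```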
Code Reference
Source Location
- Repository: OpenRLHF
- File: openrlhf/utils/deepspeed/deepspeed.py
- Lines: L79-113
Signature
```python
def setup_distributed(self, timeout=timedelta(minutes=60)) -> None:
    """
    Initialize distributed training backend.

    Args:
        timeout (timedelta): Timeout for distributed initialization.
            Default: 60 minutes. Increase for large clusters.

    Side Effects:
        - Initializes NCCL backend via deepspeed.init_distributed()
        - Sets CUDA device based on LOCAL_RANK
        - Creates device mesh with (dp, sp, tp) dimensions
        - Computes accumulated_gradient from batch sizes
        - Sets up ring attention group if ring_attn_size > 1
    """
```
Import
```python
from openrlhf.utils.deepspeed import DeepspeedStrategy
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| timeout | timedelta | No | Distributed init timeout (default 60 min) |
Outputs
| Name | Type | Description |
|---|---|---|
| (side effect) | None | Initializes distributed backend in-place |
| self.world_size | int | Total number of processes |
| self.accumulated_gradient | int | Gradient accumulation steps |
| self.ds_device_mesh | DeviceMesh | 3D (dp, sp, tp) device mesh |
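After setup_distributed returns, the outputs listed above can be inspected directly. A sketch, assuming the mesh was built with the dim names ("dp", "sp", "tp") the docstring indicates, and using PyTorch's DeviceMesh indexing:
```python
# Continuing after strategy.setup_distributed() has run:
mesh = strategy.ds_device_mesh

# Number of ranks along each parallel dimension of the (dp, sp, tp) mesh.
print("dp ranks:", mesh["dp"].size())
print("sp ranks:", mesh["sp"].size())
print("tp ranks:", mesh["tp"].size())
print("grad accumulation steps:", strategy.accumulated_gradient)
```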
Usage Examples
Standard Setup
```python
from datetime import timedelta

from openrlhf.utils.utils import get_strategy

# args: the parsed command-line arguments of the training script
strategy = get_strategy(args)
strategy.setup_distributed(timeout=timedelta(minutes=60))

# Now ready for model loading and training
print(f"World size: {strategy.world_size}")
print(f"Gradient accumulation: {strategy.accumulated_gradient}")
```
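Large-Cluster Timeout
For large multi-node jobs, the default 60-minute timeout may be too short for all ranks to rendezvous, as the docstring notes. The value below is illustrative, not a recommendation from the source:
```python
from datetime import timedelta

# Illustrative: raise the rendezvous timeout for slow-starting clusters.
strategy.setup_distributed(timeout=timedelta(minutes=180))
```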
Related Pages
Implements Principle
Requires Environment