Implementation:OpenRLHF OpenRLHF DeepspeedStrategy save model
| Knowledge Sources | |
|---|---|
| Domains | Training_Infrastructure, Distributed_Computing |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool, provided by OpenRLHF, for saving models trained with distributed DeepSpeed.
Description
The save_model method on DeepspeedStrategy saves a model trained under ZeRO-sharded distributed training. It gathers the parameters held across ranks, extracts the LoRA adapter weights if the model is a PEFT model, and writes the result in HuggingFace format. Under ZeRO-3, it uses deepspeed.zero.GatheredParameters to temporarily materialize the full parameters so that rank 0 can collect them; only rank 0 writes to disk.
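The gather-then-collect pattern described above can be sketched without DeepSpeed installed. This is a minimal illustration, not the OpenRLHF source: `collect_full_state` and its arguments are hypothetical names, plain lists stand in for tensors, and `contextlib.nullcontext` stands in for the gathering context manager. Under ZeRO-3 that role would be played by `deepspeed.zero.GatheredParameters`, and a real implementation would copy tensors to CPU rather than deep-copy lists.

```python
import copy
from contextlib import nullcontext


def collect_full_state(named_params, is_rank_0, gather=nullcontext):
    """Collect full parameter values into a dict, keeping them on rank 0 only.

    named_params: iterable of (name, param) pairs.
    is_rank_0:    whether the calling process is rank 0.
    gather:       callable returning a context manager inside which `param`
                  is fully available (assumption: GatheredParameters under ZeRO-3).
    """
    state = {}
    for name, param in named_params:
        # Every rank enters the context, because gathering is collective.
        with gather([param]):
            # Only rank 0 keeps a copy; the copy must be taken inside the
            # context, before the parameter is re-sharded on exit.
            if is_rank_0:
                state[name] = copy.deepcopy(param)
    return state


# Plain lists stand in for parameter tensors.
shards = [("embed.weight", [0.1, 0.2]), ("lm_head.weight", [0.3])]
rank0_state = collect_full_state(shards, is_rank_0=True)
other_state = collect_full_state(shards, is_rank_0=False)
```

Gathering one parameter at a time, as in this sketch, keeps peak memory close to the size of the largest single parameter rather than the whole model.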
Usage
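Call save_model at the end of training or at checkpoint intervals. Every rank should make the call, because gathering ZeRO-3 parameters is a collective operation, even though only rank 0 writes files. It is typically invoked by the trainer's save_logs_and_checkpoints method.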
Code Reference
Source Location
- Repository: OpenRLHF
- File: openrlhf/utils/deepspeed/deepspeed.py
- Lines: L349-407
Signature
def save_model(
    self,
    model: nn.Module,  # The model to save
    tokenizer,  # Tokenizer to save alongside
    output_dir: str,  # Directory to save to
    **kwargs,
) -> None:
    """
    Save model and tokenizer to output_dir.

    Handles:
    - ZeRO-3 parameter gathering across ranks
    - LoRA adapter extraction via PEFT
    - HuggingFace-compatible save format
    - Only rank 0 performs disk I/O
    """
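The LoRA adapter extraction listed in the docstring can be pictured as a filter over the gathered state dict. The sketch below is a simplified stand-in and not PEFT's implementation (PEFT provides `get_peft_model_state_dict` for this purpose). The helper name `extract_adapter_weights`, the `"lora_"` substring test, and the example parameter names are assumptions made for illustration; strings stand in for weight tensors.

```python
def extract_adapter_weights(state_dict, marker="lora_"):
    """Keep only entries whose parameter name contains the adapter marker."""
    return {name: weight for name, weight in state_dict.items() if marker in name}


# Hypothetical parameter names; strings stand in for tensors.
full_state = {
    "layers.0.q_proj.base_layer.weight": "W_q",
    "layers.0.q_proj.lora_A.default.weight": "A",
    "layers.0.q_proj.lora_B.default.weight": "B",
}
adapter_only = extract_adapter_weights(full_state)
```

Saving only the adapter entries keeps the output small, since the frozen base weights can be reloaded from the original pretrained model.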
Import
from openrlhf.utils.deepspeed import DeepspeedStrategy
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | nn.Module | Yes | Trained model (Actor or reward model) |
| tokenizer | PreTrainedTokenizer | Yes | Tokenizer associated with the model |
| output_dir | str | Yes | Path to save directory |
Outputs
| Name | Type | Description |
|---|---|---|
| model files | Files | HuggingFace model weights on disk |
| tokenizer files | Files | Tokenizer config and vocab on disk |
Usage Examples
# Save at end of training
strategy.save_model(
    model=actor_model,
    tokenizer=tokenizer,
    output_dir=args.save_path,
)
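Saving at checkpoint intervals as well as at the end of training might look like the following sketch. `RecordingStrategy` is a hypothetical stand-in that records calls instead of writing files, and `train`, `save_steps`, and the `step_{n}` directory naming are illustrative assumptions rather than OpenRLHF's trainer code.

```python
class RecordingStrategy:
    """Hypothetical stand-in for DeepspeedStrategy: records calls, writes nothing."""

    def __init__(self):
        self.saved = []

    def save_model(self, model, tokenizer, output_dir, **kwargs):
        self.saved.append(output_dir)


def train(strategy, model, tokenizer, save_path, total_steps, save_steps):
    for step in range(1, total_steps + 1):
        # ... one optimization step would run here ...
        if step % save_steps == 0:
            # In a distributed run, every rank would make this call.
            strategy.save_model(model, tokenizer, f"{save_path}/step_{step}")
    # Final save at the end of training.
    strategy.save_model(model, tokenizer, save_path)


strategy = RecordingStrategy()
train(strategy, model=None, tokenizer=None, save_path="out", total_steps=5, save_steps=2)
```

Because the save condition depends only on the step counter, all ranks reach the call at the same step, which matters when the save involves a collective gather.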
Related Pages
Implements Principle
Requires Environment