# Implementation: AllenAI Open Instruct — `save_final_model`
| Type | Function |
|---|---|
| Source | `open_instruct/grpo_fast.py:L1697-1722` |
| Dependencies | ray, torch, transformers, huggingface_hub |
| Last Updated | 2026-02-07 00:00 GMT |
## Overview
Concrete function for saving the final trained GRPO model and optionally launching downstream evaluation jobs, provided by the Open Instruct library.
## Description

`save_final_model()` is called at the end of GRPO training to persist the final policy model. It:

- Logs the final training step and output directory.
- Issues parallel `save_model.remote()` calls to all learner Ray actors, which:
  * Gather model weights (if using DeepSpeed stage 3).
  * Save model weights in HuggingFace format (`model.safetensors`).
  * Save the tokenizer and chat template configuration.
  * Only rank 0 actually writes to disk; the other ranks participate in the all-gather.
- Waits for all save operations to complete with a progress bar.
- If `try_launch_beaker_eval_jobs_on_weka=True` and the run is on Beaker, launches evaluation jobs for the saved model using the configured evaluation tasks and workspace.

The function uses `ray_get_with_progress()` to provide a visual progress indicator during the potentially slow save operation (especially for large models on networked storage).
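The fan-out-and-wait pattern described above can be sketched without Ray, using `concurrent.futures` as a stand-in (the `save_shard` and `wait_with_progress` helpers here are illustrative analogues, not Open Instruct's actual implementation):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def save_shard(rank: int) -> int:
    # Stand-in for a learner actor's save_model.remote(); in the real
    # code, only rank 0 writes to disk while the other ranks take part
    # in the weight all-gather.
    return rank

def wait_with_progress(futures, desc: str) -> list:
    # Minimal analogue of ray_get_with_progress(): collect results as
    # they complete, reporting progress after each one.
    results = []
    for i, fut in enumerate(as_completed(futures), start=1):
        results.append(fut.result())
        print(f"{desc}: {i}/{len(futures)} done")
    return results

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(save_shard, rank) for rank in range(4)]
    ranks = wait_with_progress(futures, "saving final model")

assert sorted(ranks) == [0, 1, 2, 3]
```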
## Usage

Called exactly once at the end of training, after the last training step completes. This is separate from intermediate checkpoint saves (handled by `maybe_save_checkpoint()`), which happen at regular intervals during training.
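The relationship between periodic checkpointing and the single final save can be sketched as follows (the loop skeleton and counters are illustrative, not the actual training loop):

```python
def run_training(num_steps: int, checkpoint_every: int):
    """Illustrative loop skeleton: periodic checkpoints during
    training, one final save after the loop ends."""
    checkpoints, final_saves = 0, 0

    for step in range(1, num_steps + 1):
        # ... one GRPO training step would happen here ...
        if step % checkpoint_every == 0:
            checkpoints += 1   # maybe_save_checkpoint(...)

    final_saves += 1           # save_final_model(...)
    return checkpoints, final_saves

# 1000 steps with a checkpoint every 250 steps:
print(run_training(1000, 250))  # -> (4, 1)
```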
## Code Reference

### Source Location

- Repository: Open Instruct
- File: `open_instruct/grpo_fast.py`
Signature
def save_final_model(
args: grpo_utils.ExperimentConfig,
policy_group: ModelGroup,
tokenizer: PreTrainedTokenizer,
training_step: int,
wandb_url: str,
chat_template_name: str,
) -> None:
### Import

```python
from open_instruct.grpo_fast import save_final_model
```
## I/O Contract

### Inputs

| Name | Type | Description |
|---|---|---|
| `args` | `ExperimentConfig` | Experiment config providing `output_dir`, `try_launch_beaker_eval_jobs_on_weka`, `hf_repo_revision`, and `world_size`. |
| `policy_group` | `ModelGroup` | Group of Ray actor handles for all learner processes. |
| `tokenizer` | `PreTrainedTokenizer` | Tokenizer to save alongside the model. |
| `training_step` | `int` | Final training step number (for logging). |
| `wandb_url` | `str` | Weights & Biases run URL (passed to evaluation jobs for linking). |
| `chat_template_name` | `str` | Name of the chat template to include in the saved model configuration. |
### Outputs

| Name | Type | Description |
|---|---|---|
| Model files (side effect) | Files on disk | Model weights (`model.safetensors`), tokenizer files, and configuration are written to `args.output_dir`. |
| Evaluation jobs (side effect) | Beaker jobs | Optional: evaluation jobs are launched on the Beaker platform. |
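As a quick sanity check on the on-disk side effect, the output directory can be listed for the expected HuggingFace files. The helper and filename set below are assumptions based on the tables in this page, not part of Open Instruct:

```python
import os
import tempfile

# Hypothetical set of files a final save is expected to produce.
EXPECTED_FILES = {
    "model.safetensors",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
}

def missing_model_files(output_dir: str) -> set:
    """Return the expected files that are absent from output_dir."""
    present = set(os.listdir(output_dir))
    return EXPECTED_FILES - present

# Example against a fake output dir missing one file:
with tempfile.TemporaryDirectory() as d:
    for name in EXPECTED_FILES - {"config.json"}:
        open(os.path.join(d, name), "w").close()
    print(missing_model_files(d))  # -> {'config.json'}
```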
Usage Examples
from open_instruct.grpo_fast import save_final_model
# At the end of the training loop:
save_final_model(
args=experiment_config,
policy_group=policy_group,
tokenizer=tokenizer,
training_step=1000,
wandb_url="https://wandb.ai/team/project/runs/abc123",
chat_template_name="tulu",
)
# The model is now saved to experiment_config.output_dir
# e.g., /output/grpo_olmo_7b/
# Contents:
# model.safetensors
# config.json
# tokenizer.json
# tokenizer_config.json
# special_tokens_map.json