# Implementation: AllenAI Open Instruct — `save_final_model`
| Type | Function |
|---|---|
| Source | `open_instruct/grpo_fast.py:L1697-1722` |
| Dependencies | ray, torch, transformers, huggingface_hub |
| Last Updated | 2026-02-07 00:00 GMT |
## Overview
Concrete function for saving the final trained GRPO model and optionally launching downstream evaluation jobs, provided by the Open Instruct library.
## Description

`save_final_model()` is called at the end of GRPO training to persist the final policy model. It:

- Logs the final training step and output directory.
- Issues parallel `save_model.remote()` calls to all learner Ray actors, which:
  * Gather model weights (if using DeepSpeed stage 3).
  * Save model weights in HuggingFace format (`model.safetensors`).
  * Save the tokenizer and chat template configuration.
  * Only rank 0 actually writes to disk; the other ranks participate in the all-gather.
- Waits for all save operations to complete with a progress bar.
- If `try_launch_beaker_eval_jobs_on_weka=True` and the run is on Beaker, launches evaluation jobs for the saved model using the configured evaluation tasks and workspace.

The function uses `ray_get_with_progress()` to provide a visual progress indicator during the potentially slow save operation (especially for large models on networked storage).
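The fan-out-and-wait pattern described above can be sketched without Ray, using `concurrent.futures` as a stand-in (the `save_shard` and `wait_with_progress` helpers here are illustrative analogues, not Open Instruct's actual implementation):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def save_shard(rank: int) -> int:
    # Stand-in for a learner actor's save_model.remote(); in the real
    # code, only rank 0 writes to disk while the other ranks take part
    # in the weight all-gather.
    return rank

def wait_with_progress(futures, desc: str) -> list:
    # Minimal analogue of ray_get_with_progress(): collect results as
    # they complete, reporting progress after each one.
    results = []
    for i, fut in enumerate(as_completed(futures), start=1):
        results.append(fut.result())
        print(f"{desc}: {i}/{len(futures)} done")
    return results

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(save_shard, rank) for rank in range(4)]
    ranks = wait_with_progress(futures, "saving final model")

assert sorted(ranks) == [0, 1, 2, 3]
```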
## Usage

Called exactly once at the end of training, after the last training step completes. This is separate from intermediate checkpoint saves (handled by `maybe_save_checkpoint()`), which happen at regular intervals during training.
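The relationship between periodic checkpointing and the single final save can be sketched as follows (the loop skeleton and counters are illustrative, not the actual training loop):

```python
def run_training(num_steps: int, checkpoint_every: int):
    """Illustrative loop skeleton: periodic checkpoints during
    training, one final save after the loop ends."""
    checkpoints, final_saves = 0, 0

    for step in range(1, num_steps + 1):
        # ... one GRPO training step would happen here ...
        if step % checkpoint_every == 0:
            checkpoints += 1   # maybe_save_checkpoint(...)

    final_saves += 1           # save_final_model(...)
    return checkpoints, final_saves

# 1000 steps with a checkpoint every 250 steps:
print(run_training(1000, 250))  # -> (4, 1)
```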
## Code Reference

### Source Location

- Repository: Open Instruct
- File: `open_instruct/grpo_fast.py`
Signature
def save_final_model(
args: grpo_utils.ExperimentConfig,
policy_group: ModelGroup,
tokenizer: PreTrainedTokenizer,
training_step: int,
wandb_url: str,
chat_template_name: str,
) -> None:
### Import

```python
from open_instruct.grpo_fast import save_final_model
```
## I/O Contract

### Inputs

| Name | Type | Description |
|---|---|---|
| `args` | `ExperimentConfig` | Experiment config providing `output_dir`, `try_launch_beaker_eval_jobs_on_weka`, `hf_repo_revision`, and `world_size`. |
| `policy_group` | `ModelGroup` | Group of Ray actor handles for all learner processes. |
| `tokenizer` | `PreTrainedTokenizer` | Tokenizer to save alongside the model. |
| `training_step` | `int` | Final training step number (for logging). |
| `wandb_url` | `str` | Weights & Biases run URL (passed to evaluation jobs for linking). |
| `chat_template_name` | `str` | Name of the chat template to include in the saved model configuration. |
### Outputs

| Name | Type | Description |
|---|---|---|
| Model files (side effect) | Files on disk | Model weights (`model.safetensors`), tokenizer files, and configuration are written to `args.output_dir`. |
| Evaluation jobs (side effect) | Beaker jobs | Optional: evaluation jobs are launched on the Beaker platform. |
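As a quick sanity check on the on-disk side effect, the output directory can be listed for the expected HuggingFace files. The helper and filename set below are assumptions based on the tables in this page, not part of Open Instruct:

```python
import os
import tempfile

# Hypothetical set of files a final save is expected to produce.
EXPECTED_FILES = {
    "model.safetensors",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
}

def missing_model_files(output_dir: str) -> set:
    """Return the expected files that are absent from output_dir."""
    present = set(os.listdir(output_dir))
    return EXPECTED_FILES - present

# Example against a fake output dir missing one file:
with tempfile.TemporaryDirectory() as d:
    for name in EXPECTED_FILES - {"config.json"}:
        open(os.path.join(d, name), "w").close()
    print(missing_model_files(d))  # -> {'config.json'}
```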
Usage Examples
from open_instruct.grpo_fast import save_final_model
# At the end of the training loop:
save_final_model(
args=experiment_config,
policy_group=policy_group,
tokenizer=tokenizer,
training_step=1000,
wandb_url="https://wandb.ai/team/project/runs/abc123",
chat_template_name="tulu",
)
# The model is now saved to experiment_config.output_dir
# e.g., /output/grpo_olmo_7b/
# Contents:
# model.safetensors
# config.json
# tokenizer.json
# tokenizer_config.json
# special_tokens_map.json