
Implementation:Allenai Open instruct Save Final Model

From Leeroopedia


Type Function
Source open_instruct/grpo_fast.py:L1697-1722
Dependencies ray, torch, transformers, huggingface_hub
Last Updated 2026-02-07 00:00 GMT

Overview

Concrete function for saving the final trained GRPO model and optionally launching downstream evaluation jobs, provided by the Open Instruct library.

Description

save_final_model() is called at the end of GRPO training to persist the final policy model. It:

  1. Logs the final training step and output directory.
  2. Issues parallel save_model.remote() calls to all learner Ray actors, which:
  * Gather model weights (when using DeepSpeed ZeRO stage 3).
  * Save the model weights in Hugging Face format (model.safetensors).
  * Save the tokenizer and chat-template configuration.
  * Write to disk only on rank 0; the other ranks participate in the all-gather.
  3. Waits for all save operations to complete, showing a progress bar.
  4. If try_launch_beaker_eval_jobs_on_weka=True and the run is on Beaker, launches evaluation jobs for the saved model using the configured evaluation tasks and workspace.

The function uses ray_get_with_progress() to provide a visual progress indicator during the potentially slow save operation (especially for large models on networked storage).

Usage

Called exactly once at the end of training, after the last training step completes. This is separate from intermediate checkpoint saves (handled by maybe_save_checkpoint()), which save at regular intervals during training.
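The division of labor between the two save paths can be sketched as follows. Both functions are stubs standing in for the real open_instruct functions (the real signatures differ); the point is that the periodic path fires on an interval while the final path fires exactly once.

```python
# Hypothetical sketch of where the two save paths sit in a GRPO training
# loop; maybe_save_checkpoint and save_final_model are stubs for
# illustration, not the real open_instruct implementations.
checkpoint_saves = []
final_saves = []

def maybe_save_checkpoint(step, interval=250):
    # Periodic checkpointing during training (real signature differs).
    if step % interval == 0:
        checkpoint_saves.append(step)

def save_final_model(step):
    # One-time save after the last training step (real signature differs).
    final_saves.append(step)

num_training_steps = 1000
for step in range(1, num_training_steps + 1):
    # ... one GRPO update ...
    maybe_save_checkpoint(step)

save_final_model(num_training_steps)

print(checkpoint_saves)  # [250, 500, 750, 1000]
print(final_saves)       # [1000]
```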

Code Reference

Source Location

open_instruct/grpo_fast.py:L1697-1722

Signature

def save_final_model(
    args: grpo_utils.ExperimentConfig,
    policy_group: ModelGroup,
    tokenizer: PreTrainedTokenizer,
    training_step: int,
    wandb_url: str,
    chat_template_name: str,
) -> None:

Import

from open_instruct.grpo_fast import save_final_model

I/O Contract

Inputs

Name Type Description
args ExperimentConfig Experiment config providing output_dir, try_launch_beaker_eval_jobs_on_weka, hf_repo_revision, and world_size.
policy_group ModelGroup Group of Ray actor handles for all learner processes.
tokenizer PreTrainedTokenizer Tokenizer to save alongside the model.
training_step int Final training step number (for logging).
wandb_url str Weights & Biases run URL (passed to evaluation jobs for linking).
chat_template_name str Name of the chat template to include in saved model configuration.
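For illustration, the config fields read by save_final_model() can be modeled as a small dataclass. This is only a sketch of the fields named in the table above; the real grpo_utils.ExperimentConfig contains many more fields.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExperimentConfigSketch:
    """Illustrative subset of grpo_utils.ExperimentConfig fields
    that save_final_model() reads (not the real class)."""
    output_dir: str
    try_launch_beaker_eval_jobs_on_weka: bool = False
    hf_repo_revision: Optional[str] = None
    world_size: int = 1

cfg = ExperimentConfigSketch(output_dir="/output/grpo_olmo_7b", world_size=8)
print(cfg.output_dir, cfg.world_size)  # /output/grpo_olmo_7b 8
```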

Outputs

Name Type Description
Model files (side effect) Files on disk Model weights (model.safetensors), tokenizer files, and configuration are written to args.output_dir.
Evaluation jobs (side effect) Beaker jobs Optional: Evaluation jobs are launched on the Beaker platform.

Usage Examples

from open_instruct.grpo_fast import save_final_model

# At the end of the training loop:
save_final_model(
    args=experiment_config,
    policy_group=policy_group,
    tokenizer=tokenizer,
    training_step=1000,
    wandb_url="https://wandb.ai/team/project/runs/abc123",
    chat_template_name="tulu",
)

# The model is now saved to experiment_config.output_dir
# e.g., /output/grpo_olmo_7b/
# Contents:
#   model.safetensors
#   config.json
#   tokenizer.json
#   tokenizer_config.json
#   special_tokens_map.json
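A quick sanity check after the save is to confirm the expected files exist in the output directory. The snippet below is self-contained for illustration: it stubs the files into a temporary directory rather than running training, then verifies the file set listed above.

```python
import tempfile
from pathlib import Path

# Files save_final_model() is expected to leave in args.output_dir,
# per the contents listed above.
EXPECTED = {
    "model.safetensors",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
}

with tempfile.TemporaryDirectory() as output_dir:
    for name in EXPECTED:  # stand-in for the trainer's actual save
        (Path(output_dir) / name).touch()
    present = {p.name for p in Path(output_dir).iterdir()}
    missing = EXPECTED - present
    print("missing files:", sorted(missing))  # -> missing files: []
```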
