Implementation:Huggingface Trl GRPOTrainer Save Model

Property	Value
Implementation Name	GRPOTrainer Save Model
Library	Huggingface TRL
Type	External Tool Doc
Source Files	`trl/trainer/grpo_trainer.py` (L2213-2219), `trl/scripts/grpo.py` (L168-172)
Import	`from trl import GRPOTrainer`

Overview

Description

`GRPOTrainer` inherits its model-saving functionality from the Hugging Face `Trainer` class, with one key override: the `_save_checkpoint` method generates a model card before delegating to the parent implementation. After training completes, the GRPO script (`trl/scripts/grpo.py`) calls `save_model` and, when `push_to_hub` is enabled, `push_to_hub`.
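
The override-then-delegate pattern described above can be sketched without TRL installed. The classes below (`BaseTrainer`, `CardWritingTrainer`) are hypothetical stand-ins, not the real `transformers.Trainer` or `GRPOTrainer`; they only illustrate the order of operations, assuming the card is written first and the parent checkpoint logic runs second.

```python
class BaseTrainer:
    """Hypothetical stand-in for the parent trainer class."""

    def __init__(self):
        self.events = []  # records the order of operations for inspection

    def _save_checkpoint(self, model, trial):
        self.events.append("parent_checkpoint")


class CardWritingTrainer(BaseTrainer):
    """Hypothetical stand-in for a trainer that overrides checkpoint saving."""

    def create_model_card(self, model_name):
        self.events.append(f"model_card:{model_name}")

    def _save_checkpoint(self, model, trial):
        # Write the model card first, then delegate to the parent.
        self.create_model_card(model_name="my-grpo-model")
        super()._save_checkpoint(model, trial)


trainer = CardWritingTrainer()
trainer._save_checkpoint(model=None, trial=None)
print(trainer.events)
```

Because the card is produced inside the checkpoint hook, every checkpoint written during training is accompanied by an up-to-date card.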

Usage

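from datasets import load_dataset
from trl import GRPOTrainer, GRPOConfig
from trl.rewards import accuracy_reward

# Dataset and reward function reused by the later examples on this page
dataset = load_dataset("trl-lib/DeepMath-103K", split="train")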

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=accuracy_reward,
    args=GRPOConfig(output_dir="./output", push_to_hub=True, hub_model_id="my-org/my-grpo-model"),
    train_dataset=dataset,
)

# Training
trainer.train()

# Save final model
trainer.save_model("./output")

# Push to Hub (optional)
trainer.push_to_hub(dataset_name="trl-lib/DeepMath-103K")

Code Reference

Source Location

Method	File	Lines
`_save_checkpoint`	`trl/trainer/grpo_trainer.py`	L2213-2219
Save and push in GRPO script	`trl/scripts/grpo.py`	L168-172
`save_model`	Inherited from `transformers.Trainer`	--
`push_to_hub`	Inherited from `transformers.Trainer`	--

Signature

# GRPOTrainer override (relies on `from pathlib import Path` at module level)
def _save_checkpoint(self, model, trial):
    """
    Creates a model card with the GRPO-specific metadata before
    delegating to the parent Trainer._save_checkpoint.

    The model card includes:
    - Model name (from output_dir or hub_model_id)
    - TRL and GRPO tags
    - Paper citation for DeepSeekMath
    """
    if self.args.hub_model_id is None:
        model_name = Path(self.args.output_dir).name
    else:
        model_name = self.args.hub_model_id.split("/")[-1]
    self.create_model_card(model_name=model_name)
    super()._save_checkpoint(model, trial)

# Inherited methods (from transformers.Trainer)
def save_model(self, output_dir: str | None = None) -> None:
    """
    Save model weights, tokenizer, and training arguments to the output directory.
    For PEFT models, only adapter weights are saved.
    """

def push_to_hub(self, **kwargs) -> str:
    """
    Upload the model, tokenizer, and model card to the Hugging Face Hub.
    Returns the URL of the published model.
    """

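The naming rule in the `_save_checkpoint` excerpt above can be tried in isolation. This is a minimal sketch: `derive_model_name` is a hypothetical helper written for this page, not a TRL function, and it only mirrors the two branches shown in the excerpt.

```python
from pathlib import Path


def derive_model_name(output_dir, hub_model_id=None):
    """Mirror the naming rule from the excerpt (hypothetical helper)."""
    if hub_model_id is None:
        # No Hub ID: fall back to the last component of the output directory.
        return Path(output_dir).name
    # Hub ID given: drop the organization or user prefix.
    return hub_model_id.split("/")[-1]


name_from_dir = derive_model_name("./output")
name_from_hub = derive_model_name("./output", "my-org/my-grpo-model")
print(name_from_dir, name_from_hub)
```

When a `hub_model_id` is set, the card title follows the repository name rather than the local directory name.
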
GRPO script save-and-publish workflow:

# trl/scripts/grpo.py L168-172
# Train the model
trainer.train()

# Save and push to Hub
trainer.save_model(training_args.output_dir)

if training_args.push_to_hub:
    trainer.push_to_hub(dataset_name=script_args.dataset_name)
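
The control flow of the script excerpt above (always save locally, push only when requested) can be exercised without a GPU or network access. In this sketch `RecordingTrainer` and `run_workflow` are hypothetical names; the stub records calls instead of training or uploading anything.

```python
class RecordingTrainer:
    """Hypothetical stub that records calls instead of training or uploading."""

    def __init__(self):
        self.calls = []

    def train(self):
        self.calls.append("train")

    def save_model(self, output_dir):
        self.calls.append(f"save_model:{output_dir}")

    def push_to_hub(self, dataset_name=None):
        self.calls.append(f"push_to_hub:{dataset_name}")


def run_workflow(trainer, output_dir, push_to_hub, dataset_name):
    """Train, always save locally, and push only when requested."""
    trainer.train()
    trainer.save_model(output_dir)
    if push_to_hub:
        trainer.push_to_hub(dataset_name=dataset_name)
    return trainer.calls


local_only = run_workflow(RecordingTrainer(), "./output", False, "trl-lib/DeepMath-103K")
with_push = run_workflow(RecordingTrainer(), "./output", True, "trl-lib/DeepMath-103K")
print(local_only)
print(with_push)
```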

Import

from trl import GRPOTrainer

I/O Contract

Inputs (save_model)

Parameter	Type	Description
`output_dir`	`str | None`	Directory to save the model to. If `None`, uses `args.output_dir`.

Outputs (save_model)

Output	Type	Description
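Saved files	Files on disk	Model weights (`model.safetensors` or adapter files), tokenizer files, and `training_args.bin`. The model card (`README.md`) is generated by `create_model_card` (called from `_save_checkpoint` and `push_to_hub`) and written to `args.output_dir`, not by `save_model` itself.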
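
A quick way to confirm what a save produced is to compare the directory against the file names listed above. The sketch below uses only the standard library; `missing_files` and `EXPECTED_FULL_MODEL_FILES` are hypothetical names, and the expected list is an assumption taken from this page (a real save may add other files, such as sharded weights).

```python
import tempfile
from pathlib import Path

# File names taken from the table above; treat this list as an assumption.
EXPECTED_FULL_MODEL_FILES = ("model.safetensors", "training_args.bin", "README.md")


def missing_files(save_dir, expected=EXPECTED_FULL_MODEL_FILES):
    """Return the expected file names that are absent from save_dir."""
    root = Path(save_dir)
    return [name for name in expected if not (root / name).is_file()]


# Demonstration on a temporary directory holding empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("model.safetensors", "training_args.bin"):
        (Path(tmp) / name).touch()
    report = missing_files(tmp)

print(report)
```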

Inputs (push_to_hub)

Parameter	Type	Description
`**kwargs`	Any	Keyword arguments forwarded to `create_model_card` before the upload. Common: `dataset_name` for model card metadata.

Outputs (push_to_hub)

Output	Type	Description
URL	`str`	The URL of the model on the Hugging Face Hub (e.g., `https://huggingface.co/my-org/my-model`).

Usage Examples

Save locally after training:

trainer.train()
trainer.save_model("./my_grpo_model")
# Typically saves: ./my_grpo_model/model.safetensors
#                  ./my_grpo_model/tokenizer.json
#                  ./my_grpo_model/config.json
#                  ./my_grpo_model/training_args.bin
# The model card (README.md) is written to args.output_dir when a checkpoint is saved.

Save and push to Hub:

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    reward_funcs=accuracy_reward,
    args=GRPOConfig(
        output_dir="./output",
        push_to_hub=True,
        hub_model_id="my-org/Qwen2.5-7B-GRPO-Math",
    ),
    train_dataset=dataset,
)
trainer.train()
trainer.save_model("./output")
trainer.push_to_hub(dataset_name="trl-lib/DeepMath-103K")
# Model published at: https://huggingface.co/my-org/Qwen2.5-7B-GRPO-Math

PEFT adapter saving:

from peft import LoraConfig

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    reward_funcs=accuracy_reward,
    args=GRPOConfig(output_dir="./output"),
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, target_modules=["q_proj", "v_proj"]),
)
trainer.train()
trainer.save_model("./output")
# Only adapter weights saved: ./output/adapter_model.safetensors
#                              ./output/adapter_config.json
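
To tell an adapter-only save from a full-model save after the fact, one option is to look for the marker files named on this page. This is a sketch under that assumption: `checkpoint_kind` is a hypothetical helper, and it relies only on `adapter_config.json` and `config.json` being present in the respective cases.

```python
import tempfile
from pathlib import Path


def checkpoint_kind(save_dir):
    """Classify a saved directory by its marker files (hypothetical helper)."""
    root = Path(save_dir)
    if (root / "adapter_config.json").is_file():
        return "adapter"  # PEFT adapter weights only
    if (root / "config.json").is_file():
        return "full"  # full model weights and config
    return "unknown"


# Demonstration with empty placeholder files in temporary directories.
with tempfile.TemporaryDirectory() as adapter_dir, tempfile.TemporaryDirectory() as full_dir:
    (Path(adapter_dir) / "adapter_config.json").touch()
    (Path(full_dir) / "config.json").touch()
    kinds = (checkpoint_kind(adapter_dir), checkpoint_kind(full_dir))

print(kinds)
```

An adapter-only directory has to be loaded on top of its base model, so a check like this can help pick the right loading path.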

Completion logging to Hub:

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    reward_funcs=accuracy_reward,
    args=GRPOConfig(
        output_dir="./output",
        log_completions=True,
        log_completions_hub_repo="my-org/grpo-training-logs",
    ),
    train_dataset=dataset,
)
# During training, completion logs are saved as Parquet files
# and periodically uploaded to the Hub dataset repository

Related Pages

Principle:Huggingface_Trl_GRPO_Model_Saving
