Implementation:Huggingface Trl GRPOTrainer Save Model
| Property | Value |
|---|---|
| Implementation Name | GRPOTrainer Save Model |
| Library | Huggingface TRL |
| Type | External Tool Doc |
| Source Files | trl/trainer/grpo_trainer.py (L2213-2219), trl/scripts/grpo.py (L168-172) |
| Import | from trl import GRPOTrainer |
Overview
Description
GRPOTrainer inherits its model-saving functionality from the Hugging Face Trainer class and overrides one method: _save_checkpoint generates a model card before delegating to the parent implementation. The GRPO script (trl/scripts/grpo.py) runs the full save-and-publish workflow once training completes.
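The model card name used by the override comes from either `hub_model_id` or `output_dir`. The following is a minimal sketch of that selection logic, lifted out of the trainer so it can be run in isolation. `derive_model_name` is a hypothetical helper name used for illustration, not a TRL function.

```python
from pathlib import Path
from typing import Optional


def derive_model_name(output_dir: str, hub_model_id: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the name-selection branch of _save_checkpoint."""
    if hub_model_id is None:
        # No Hub repository configured: use the last component of the output path.
        return Path(output_dir).name
    # Hub repository configured: drop the "namespace/" prefix if there is one.
    return hub_model_id.split("/")[-1]


print(derive_model_name("./output"))                          # output
print(derive_model_name("./output", "my-org/my-grpo-model"))  # my-grpo-model
```

When `hub_model_id` is set, the Hub repository name takes precedence over the directory name, so the card title follows the published repository rather than the local path.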
Usage
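from datasets import load_dataset
from trl import GRPOTrainer, GRPOConfig
from trl.rewards import accuracy_reward  # assumes a TRL release that ships trl.rewards

# Assumption: train on the same dataset that is named in the push_to_hub call below
dataset = load_dataset("trl-lib/DeepMath-103K", split="train")

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=accuracy_reward,
    args=GRPOConfig(output_dir="./output", push_to_hub=True, hub_model_id="my-org/my-grpo-model"),
    train_dataset=dataset,
)
# Training
trainer.train()
# Save final model
trainer.save_model("./output")
# Push to Hub (optional)
trainer.push_to_hub(dataset_name="trl-lib/DeepMath-103K")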
Code Reference
Source Location
| Method | File | Lines |
|---|---|---|
| _save_checkpoint | trl/trainer/grpo_trainer.py | L2213-2219 |
| Save and push in GRPO script | trl/scripts/grpo.py | L168-172 |
| save_model | Inherited from transformers.Trainer | -- |
| push_to_hub | Inherited from transformers.Trainer | -- |
Signature
# GRPOTrainer override
def _save_checkpoint(self, model, trial):
    """
    Creates a model card with the GRPO-specific metadata before
    delegating to the parent Trainer._save_checkpoint.

    The model card includes:
    - Model name (from output_dir or hub_model_id)
    - TRL and GRPO tags
    - Paper citation for DeepSeekMath
    """
    if self.args.hub_model_id is None:
        model_name = Path(self.args.output_dir).name
    else:
        model_name = self.args.hub_model_id.split("/")[-1]
    self.create_model_card(model_name=model_name)
    super()._save_checkpoint(model, trial)

# Inherited methods (from transformers.Trainer)
def save_model(self, output_dir: str | None = None) -> None:
    """
    Save model weights, tokenizer, and training arguments to the output directory.
    For PEFT models, only adapter weights are saved.
    """

def push_to_hub(self, **kwargs) -> str:
    """
    Upload the model, tokenizer, and model card to the Hugging Face Hub.
    Returns the URL of the published model.
    """
GRPO script save-and-publish workflow:
# trl/scripts/grpo.py L168-172
# Train the model
trainer.train()
# Save and push to Hub
trainer.save_model(training_args.output_dir)
if training_args.push_to_hub:
    trainer.push_to_hub(dataset_name=script_args.dataset_name)
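The control flow of this excerpt can be checked without running a real training job. The sketch below uses a hypothetical `RecordingTrainer` stand-in (not a TRL class) that only records which methods were called, and a hypothetical `save_and_publish` wrapper that mirrors the three steps above.

```python
class RecordingTrainer:
    """Hypothetical stand-in for a trainer; records calls instead of doing work."""

    def __init__(self):
        self.calls = []

    def train(self):
        self.calls.append(("train",))

    def save_model(self, output_dir=None):
        self.calls.append(("save_model", output_dir))

    def push_to_hub(self, **kwargs):
        self.calls.append(("push_to_hub", kwargs))


def save_and_publish(trainer, output_dir, push_to_hub, dataset_name):
    """Train, always save locally, and push only when requested."""
    trainer.train()
    trainer.save_model(output_dir)
    if push_to_hub:
        trainer.push_to_hub(dataset_name=dataset_name)


trainer = RecordingTrainer()
save_and_publish(trainer, "./output", push_to_hub=False, dataset_name="trl-lib/DeepMath-103K")
print(trainer.calls)  # [('train',), ('save_model', './output')]
```

The local save is unconditional, so a usable copy of the model exists on disk even when publishing is disabled or the upload fails.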
Import
from trl import GRPOTrainer
I/O Contract
Inputs (save_model)
| Parameter | Type | Description |
|---|---|---|
| output_dir | str or None | Directory to save the model to. If None, uses args.output_dir. |
Outputs (save_model)
| Output | Type | Description |
|---|---|---|
| Saved files | Files on disk | Model weights (model.safetensors, or adapter files for PEFT models), tokenizer files, training_args.bin, and the model card (README.md) when one has been written to that directory. |
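A quick way to sanity-check a save directory is to look for the files listed above. The sketch below is a hypothetical helper (`describe_saved_dir` is not part of TRL or Transformers) and assumes only the file names mentioned on this page. Other configurations, for example sharded weights, may produce different names.

```python
from pathlib import Path


def describe_saved_dir(output_dir: str) -> dict:
    """Hypothetical helper: report which files named on this page are present."""
    root = Path(output_dir)
    names = {p.name for p in root.iterdir() if p.is_file()}
    if "adapter_model.safetensors" in names:
        kind = "adapter"   # PEFT run: only adapter weights were saved
    elif "model.safetensors" in names:
        kind = "full"      # full model weights in a single file
    else:
        kind = "unknown"   # e.g. sharded weights or an incomplete save
    return {
        "kind": kind,
        "has_training_args": "training_args.bin" in names,
        "has_model_card": "README.md" in names,
    }
```

Calling `describe_saved_dir("./output")` after `trainer.save_model("./output")` returns a small dictionary that can be logged or asserted on in a post-training check.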
Inputs (push_to_hub)
| Parameter | Type | Description |
|---|---|---|
| **kwargs | Any | Keyword arguments forwarded to the Hub upload. Common: dataset_name for model card metadata. |
Outputs (push_to_hub)
| Output | Type | Description |
|---|---|---|
| URL | str | The URL of the model on the Hugging Face Hub (e.g., https://huggingface.co/my-org/my-model). |
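If downstream tooling needs the repository ID rather than the full URL, it can be recovered from the returned string. This is a minimal sketch using only the standard library. `repo_id_from_url` is a hypothetical helper, and it assumes the URL has the `https://huggingface.co/<namespace>/<name>` shape shown in the table, possibly followed by extra path segments.

```python
from urllib.parse import urlparse


def repo_id_from_url(url: str) -> str:
    """Hypothetical helper: extract "namespace/name" from a Hub model URL."""
    # Keep only the non-empty path segments, e.g. ["my-org", "my-model"].
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"Cannot find a namespace/name pair in {url!r}")
    return "/".join(parts[:2])


print(repo_id_from_url("https://huggingface.co/my-org/my-model"))  # my-org/my-model
```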
Usage Examples
Save locally after training:
trainer.train()
trainer.save_model("./my_grpo_model")
# Saves: ./my_grpo_model/model.safetensors
#        ./my_grpo_model/tokenizer.json
#        ./my_grpo_model/config.json
#        ./my_grpo_model/README.md (model card)
Save and push to Hub:
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    reward_funcs=accuracy_reward,
    args=GRPOConfig(
        output_dir="./output",
        push_to_hub=True,
        hub_model_id="my-org/Qwen2.5-7B-GRPO-Math",
    ),
    train_dataset=dataset,
)
trainer.train()
trainer.save_model("./output")
trainer.push_to_hub(dataset_name="trl-lib/DeepMath-103K")
# Model published at: https://huggingface.co/my-org/Qwen2.5-7B-GRPO-Math
PEFT adapter saving:
from peft import LoraConfig
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    reward_funcs=accuracy_reward,
    args=GRPOConfig(output_dir="./output"),
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, target_modules=["q_proj", "v_proj"]),
)
trainer.train()
trainer.save_model("./output")
# Only adapter weights saved: ./output/adapter_model.safetensors
#                             ./output/adapter_config.json
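To confirm what was written, the adapter configuration can be inspected with the standard library alone. The sketch assumes `adapter_config.json` is a JSON object containing `r` and `target_modules` keys, matching the `LoraConfig` fields used above. `summarize_adapter` is a hypothetical helper name.

```python
import json
from pathlib import Path


def summarize_adapter(output_dir: str) -> dict:
    """Hypothetical helper: read the LoRA rank and target modules from a saved adapter."""
    config_path = Path(output_dir) / "adapter_config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return {
        "rank": config.get("r"),
        # Sort so the result is stable regardless of how the list was serialized.
        "target_modules": sorted(config.get("target_modules") or []),
    }
```

For the example above, `summarize_adapter("./output")` would be expected to report rank 16 and the `q_proj` and `v_proj` modules, provided the assumed keys are present in the file.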
Completion logging to Hub:
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    reward_funcs=accuracy_reward,
    args=GRPOConfig(
        output_dir="./output",
        log_completions=True,
        log_completions_hub_repo="my-org/grpo-training-logs",
    ),
    train_dataset=dataset,
)
# During training, completion logs are saved as Parquet files
# and periodically uploaded to the Hub dataset repository