Implementation:Huggingface Trl GRPOTrainer Save Model

| Property | Value |
| --- | --- |
| Implementation Name | GRPOTrainer Save Model |
| Library | Huggingface TRL |
| Type | External Tool Doc |
| Source Files | trl/trainer/grpo_trainer.py (L2213-2219), trl/scripts/grpo.py (L168-172) |
| Import | from trl import GRPOTrainer |

Overview

Description

GRPOTrainer inherits its model-saving functionality from the Hugging Face transformers.Trainer class, with one override: _save_checkpoint generates a model card before delegating to the parent implementation. After training completes, the GRPO script (trl/scripts/grpo.py) saves the model to the output directory and, if push_to_hub is set, publishes it to the Hub.
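
The model card is named after the Hub repository when hub_model_id is set and after the output directory otherwise (see the override under Signature). That branch can be exercised in isolation. The following is a minimal sketch, not TRL code: resolve_model_name is a hypothetical helper that mirrors the two branches using only the standard library.

```python
from pathlib import Path
from typing import Optional


def resolve_model_name(output_dir: str, hub_model_id: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the name logic in _save_checkpoint."""
    if hub_model_id is None:
        # No Hub repo configured: use the last component of the output path.
        return Path(output_dir).name
    # Hub repo configured: drop the "org/" prefix and keep the repo name.
    return hub_model_id.split("/")[-1]


print(resolve_model_name("./output"))                          # output
print(resolve_model_name("./output", "my-org/my-grpo-model"))  # my-grpo-model
```

Note that the organization prefix never reaches the model card name; only the final path or repository component does.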

Usage

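from datasets import load_dataset
from trl import GRPOTrainer, GRPOConfig
from trl.rewards import accuracy_reward

# Dataset matching the dataset_name passed to push_to_hub below
dataset = load_dataset("trl-lib/DeepMath-103K", split="train")
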
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=accuracy_reward,
    args=GRPOConfig(output_dir="./output", push_to_hub=True, hub_model_id="my-org/my-grpo-model"),
    train_dataset=dataset,
)

# Training
trainer.train()

# Save final model
trainer.save_model("./output")

# Push to Hub (optional)
trainer.push_to_hub(dataset_name="trl-lib/DeepMath-103K")

Code Reference

Source Location

| Method | File | Lines |
| --- | --- | --- |
| _save_checkpoint | trl/trainer/grpo_trainer.py | L2213-2219 |
| Save and push in GRPO script | trl/scripts/grpo.py | L168-172 |
| save_model | Inherited from transformers.Trainer | -- |
| push_to_hub | Inherited from transformers.Trainer | -- |

Signature

# GRPOTrainer override (Path is pathlib.Path, imported at module level)
def _save_checkpoint(self, model, trial):
    """
    Creates a model card with the GRPO-specific metadata before
    delegating to the parent Trainer._save_checkpoint.

    The model card includes:
    - Model name (from output_dir or hub_model_id)
    - TRL and GRPO tags
    - Paper citation for DeepSeekMath
    """
    if self.args.hub_model_id is None:
        model_name = Path(self.args.output_dir).name
    else:
        model_name = self.args.hub_model_id.split("/")[-1]
    self.create_model_card(model_name=model_name)
    super()._save_checkpoint(model, trial)

# Inherited methods (from transformers.Trainer)
def save_model(self, output_dir: str | None = None) -> None:
    """
    Save model weights, tokenizer, and training arguments to the output directory.
    For PEFT models, only adapter weights are saved.
    """

def push_to_hub(self, **kwargs) -> str:
    """
    Upload the model, tokenizer, and model card to the Hugging Face Hub.
    Returns the URL of the published model.
    """

GRPO script save-and-publish workflow:

# trl/scripts/grpo.py (L168-172)
# Train the model
trainer.train()

# Save and push to Hub
trainer.save_model(training_args.output_dir)

if training_args.push_to_hub:
    trainer.push_to_hub(dataset_name=script_args.dataset_name)
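
The control flow of the script excerpt above can be checked without running a real training job. This sketch assumes only that the trainer exposes train, save_model, and push_to_hub as shown; save_and_publish and RecordingTrainer are hypothetical names introduced for illustration and are not part of TRL.

```python
from typing import Any, List, Optional, Tuple


def save_and_publish(trainer: Any, output_dir: str, push_to_hub: bool,
                     dataset_name: Optional[str] = None) -> None:
    """Train, save locally, then push only when requested (same order as the script)."""
    trainer.train()
    trainer.save_model(output_dir)
    if push_to_hub:
        trainer.push_to_hub(dataset_name=dataset_name)


class RecordingTrainer:
    """Stand-in that records calls instead of training a model (illustration only)."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []

    def train(self) -> None:
        self.calls.append(("train", None))

    def save_model(self, output_dir: Optional[str] = None) -> None:
        self.calls.append(("save_model", output_dir))

    def push_to_hub(self, **kwargs: Any) -> str:
        self.calls.append(("push_to_hub", kwargs))
        return "https://huggingface.co/placeholder"  # made-up return value


fake = RecordingTrainer()
save_and_publish(fake, "./output", push_to_hub=False)
print([name for name, _ in fake.calls])  # ['train', 'save_model']
```

Saving locally before the conditional push means a usable copy of the model exists on disk even if the upload is skipped or fails.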

Import

from trl import GRPOTrainer

I/O Contract

Inputs (save_model)

| Parameter | Type | Description |
| --- | --- | --- |
| output_dir | str or None | Directory to save the model to. If None, uses args.output_dir. |

Outputs (save_model)

| Output | Type | Description |
| --- | --- | --- |
| Saved files | Files on disk | Model weights (model.safetensors or adapter files), tokenizer files, training_args.bin, and model card (README.md). |
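
A quick way to sanity-check a save is to look for the files listed above. This is a sketch under stated assumptions: missing_artifacts is a hypothetical helper, the file names are taken from this page, and real output may differ (for example, large models are typically written as several sharded weight files rather than a single model.safetensors).

```python
from pathlib import Path
from typing import List

# File names as listed on this page; treat them as assumptions, not a guarantee.
FULL_MODEL_FILES = ["model.safetensors", "config.json", "training_args.bin"]
ADAPTER_FILES = ["adapter_model.safetensors", "adapter_config.json", "training_args.bin"]


def missing_artifacts(output_dir: str, peft: bool = False) -> List[str]:
    """Return the expected file names that are not present in output_dir."""
    expected = ADAPTER_FILES if peft else FULL_MODEL_FILES
    root = Path(output_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Calling missing_artifacts("./output") after trainer.save_model("./output") should return an empty list when every expected file was written; pass peft=True when only adapter weights are expected.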

Inputs (push_to_hub)

| Parameter | Type | Description |
| --- | --- | --- |
| **kwargs | Any | Keyword arguments forwarded to model card creation (create_model_card). Common: dataset_name for model card metadata. |

Outputs (push_to_hub)

| Output | Type | Description |
| --- | --- | --- |
| URL | str | The URL of the model on the Hugging Face Hub (e.g., https://huggingface.co/my-org/my-model). |

Usage Examples

Save locally after training:

trainer.train()
trainer.save_model("./my_grpo_model")
# Saves: ./my_grpo_model/model.safetensors
#         ./my_grpo_model/tokenizer.json
#         ./my_grpo_model/config.json
#         ./my_grpo_model/README.md (model card)

Save and push to Hub:

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    reward_funcs=accuracy_reward,
    args=GRPOConfig(
        output_dir="./output",
        push_to_hub=True,
        hub_model_id="my-org/Qwen2.5-7B-GRPO-Math",
    ),
    train_dataset=dataset,
)
trainer.train()
trainer.save_model("./output")
trainer.push_to_hub(dataset_name="trl-lib/DeepMath-103K")
# Model published at: https://huggingface.co/my-org/Qwen2.5-7B-GRPO-Math

PEFT adapter saving:

from peft import LoraConfig

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    reward_funcs=accuracy_reward,
    args=GRPOConfig(output_dir="./output"),
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, target_modules=["q_proj", "v_proj"]),
)
trainer.train()
trainer.save_model("./output")
# Only adapter weights saved: ./output/adapter_model.safetensors
#                              ./output/adapter_config.json

Completion logging to Hub:

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    reward_funcs=accuracy_reward,
    args=GRPOConfig(
        output_dir="./output",
        log_completions=True,
        log_completions_hub_repo="my-org/grpo-training-logs",
    ),
    train_dataset=dataset,
)
# During training, completion logs are saved as Parquet files
# and periodically uploaded to the Hub dataset repository
