Implementation:Princeton nlp SimPO Trainer Save and Evaluate

From Leeroopedia


Knowledge Sources
Domains: MLOps, Model_Management
Last Updated: 2026-02-08 04:30 GMT

Overview

Wrapper documentation for the Hugging Face Trainer methods save_model, evaluate, create_model_card, and push_to_hub as used at the end of the SimPO post-training workflow.

Description

The save-and-evaluate step calls methods that SimPOTrainer inherits from transformers.Trainer, plus one override. trainer.save_model() persists the model weights and config to disk. trainer.evaluate() keeps the inherited signature but runs the evaluation loop that SimPOTrainer overrides to compute preference metrics. trainer.create_model_card() generates a model card. SimPOTrainer also overrides push_to_hub() to add the "simpo" tag to the model card. After saving, the script sets model.config.use_cache = True and writes the config again, so the saved model uses the KV cache for efficient inference.
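The effect of the final use_cache step can be pictured with plain JSON, without loading a model. This is a minimal sketch, not the script's code: restore_use_cache is a hypothetical helper, the stand-in config contents are invented for the demo, and the assumption that training left use_cache set to False is mine rather than something the source states. The actual script calls trainer.model.config.save_pretrained() instead of editing the file by hand.

```python
import json
import tempfile
from pathlib import Path


def restore_use_cache(output_dir: str) -> dict:
    """Set use_cache to True in an already-saved config.json (illustrative only)."""
    config_path = Path(output_dir) / "config.json"
    config = json.loads(config_path.read_text())
    config["use_cache"] = True  # re-enable the KV cache for inference
    config_path.write_text(json.dumps(config, indent=2, sort_keys=True) + "\n")
    return config


# Demo with a stand-in config, as if training had disabled the cache (assumption).
with tempfile.TemporaryDirectory() as tmp:
    stand_in = {"model_type": "llama", "use_cache": False}
    (Path(tmp) / "config.json").write_text(json.dumps(stand_in))
    restored = restore_use_cache(tmp)
    reloaded = json.loads((Path(tmp) / "config.json").read_text())
```

Anyone who later loads the checkpoint from output_dir reads the rewritten config.json, which is why the script saves the config a second time after flipping the flag.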

Usage

Call these methods on the SimPOTrainer instance after trainer.train() completes.

Code Reference

Source Location

  • Repository: SimPO
  • File: scripts/run_simpo.py (Lines 281-312)
  • File: scripts/simpo_trainer.py (Lines 885-893, push_to_hub override)
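The push_to_hub override listed above is described as adding a "simpo" tag to the model card. One way to picture that step is a small helper that merges a tag into the keyword arguments before they are forwarded to the parent method. The sketch below is an assumption about the general shape only: add_tag is a hypothetical name and is not the code found in simpo_trainer.py.

```python
from typing import Any, Dict, List


def add_tag(kwargs: Dict[str, Any], tag: str = "simpo") -> Dict[str, Any]:
    """Return a copy of kwargs whose 'tags' list contains `tag` exactly once."""
    merged = dict(kwargs)  # do not mutate the caller's dict
    existing = merged.get("tags") or []
    if isinstance(existing, str):
        existing = [existing]  # a single tag may be passed as a plain string
    tags: List[str] = list(existing)
    if tag not in tags:
        tags.append(tag)
    merged["tags"] = tags
    return merged


no_tags = add_tag({})
one_tag = add_tag({"tags": "alignment-handbook"})
already_tagged = add_tag({"tags": ["simpo"]})
```

An override built this way would call add_tag(kwargs) and then pass the result to super().push_to_hub(...), so existing user-supplied tags are kept and the "simpo" tag is never duplicated.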

Signature

# Inherited from transformers.Trainer:
def save_model(self, output_dir: Optional[str] = None) -> None:
    """Save model weights, config, and tokenizer to output_dir."""

def evaluate(
    self,
    eval_dataset: Optional[Dataset] = None,
    ignore_keys: Optional[List[str]] = None,
    metric_key_prefix: str = "eval",
) -> Dict[str, float]:
    """Run evaluation and return metrics dict."""

def create_model_card(self, **kwargs) -> None:
    """Create a model card with training metadata."""

# SimPOTrainer override:
def push_to_hub(
    self,
    commit_message: Optional[str] = "End of training",
    blocking: bool = True,
    **kwargs,
) -> str:
    """Push model to Hub with 'simpo' tag."""

Import

# These are methods on the SimPOTrainer instance, not standalone imports
# Access via: trainer.save_model(), trainer.evaluate(), etc.

I/O Contract

Inputs

Name | Type | Required | Description
trainer | SimPOTrainer | Yes | Trained SimPOTrainer instance (after train() completes)
output_dir | str | No | Directory to save model weights; save_model() falls back to training_args.output_dir when omitted, and the script passes it explicitly
eval_dataset | Dataset | No | Evaluation dataset (uses the trainer's eval_dataset if not provided)

Outputs

Name | Type | Description
Saved model | Files | Model weights, config.json, and tokenizer files at output_dir
Training metrics | JSON file | train_results.json with loss and sample count
Eval metrics | Dict[str, float] | eval_loss, eval_rewards/chosen, eval_rewards/rejected, eval_rewards/accuracies
Model card | File | README.md with training provenance
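To make the eval metric names above concrete, here is a minimal sketch of how such a dictionary could be assembled from per-example rewards. It assumes that rewards/accuracies is the fraction of pairs where the chosen response scores higher than the rejected one, which is the usual convention for preference trainers; summarize_rewards is a hypothetical helper, the reward values are made up, and the real trainer computes these quantities from model log-probabilities on tensors.

```python
from statistics import fmean
from typing import Dict, Sequence


def summarize_rewards(
    chosen: Sequence[float],
    rejected: Sequence[float],
    prefix: str = "eval",
) -> Dict[str, float]:
    """Aggregate paired rewards into prefixed preference metrics (illustrative)."""
    if not chosen or len(chosen) != len(rejected):
        raise ValueError("chosen and rejected must be non-empty and the same length")
    # A pair counts as correct when the chosen response gets the higher reward.
    wins = [1.0 if c > r else 0.0 for c, r in zip(chosen, rejected)]
    return {
        f"{prefix}_rewards/chosen": fmean(chosen),
        f"{prefix}_rewards/rejected": fmean(rejected),
        f"{prefix}_rewards/accuracies": fmean(wins),
    }


# Made-up rewards for four preference pairs.
metrics = summarize_rewards(
    chosen=[-0.5, -1.0, -0.25, -2.0],
    rejected=[-1.5, -0.5, -1.25, -3.0],
)
```

In this toy batch three of the four pairs are ranked correctly, so the accuracy entry is 0.75. The prefix argument mirrors metric_key_prefix in evaluate(), which is why the keys start with "eval_".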

Usage Examples

Post-Training Save and Evaluate

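# Run training (optionally resuming from a checkpoint), then record training metrics.
# checkpoint, raw_datasets, model_args, and data_args are defined earlier in the script.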
train_result = trainer.train(resume_from_checkpoint=checkpoint)
metrics = train_result.metrics
metrics["train_samples"] = len(raw_datasets["train"])
trainer.log_metrics("train", metrics)
trainer.save_metrics("train", metrics)
trainer.save_state()

# Save model
trainer.save_model(training_args.output_dir)

# Create model card (main process only)
if trainer.accelerator.is_main_process:
    trainer.create_model_card(
        finetuned_from=model_args.model_name_or_path,
        dataset=list(data_args.dataset_mixer.keys()),
        dataset_tags=list(data_args.dataset_mixer.keys()),
        tags=["alignment-handbook"],
    )
    # Restore KV cache for inference
    trainer.model.config.use_cache = True
    trainer.model.config.save_pretrained(training_args.output_dir)

# Optional evaluation
if training_args.do_eval:
    metrics = trainer.evaluate()
    metrics["eval_samples"] = len(raw_datasets["test"])
    trainer.log_metrics("eval", metrics)
    trainer.save_metrics("eval", metrics)

# Optional push to Hub
if training_args.push_to_hub:
    trainer.push_to_hub()
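The two save_metrics calls above leave JSON files in the output directory. The sketch below imitates that behavior with the standard library so the resulting layout is easy to see: one {split}_results.json per split plus a merged all_results.json, which is how Trainer.save_metrics behaves with its default settings as far as I am aware. save_metrics_sketch is a hypothetical stand-in, not the library function, and the metric values are made up.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def save_metrics_sketch(output_dir: str, split: str, metrics: Dict[str, float]) -> None:
    """Write per-split metrics and fold them into all_results.json (illustrative)."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"{split}_results.json").write_text(json.dumps(metrics, indent=4, sort_keys=True))

    # Merge into the combined file, keeping metrics from earlier splits.
    combined_path = out / "all_results.json"
    combined = json.loads(combined_path.read_text()) if combined_path.exists() else {}
    combined.update(metrics)
    combined_path.write_text(json.dumps(combined, indent=4, sort_keys=True))


with tempfile.TemporaryDirectory() as tmp:
    save_metrics_sketch(tmp, "train", {"train_loss": 1.25, "train_samples": 100})
    save_metrics_sketch(tmp, "eval", {"eval_loss": 1.5, "eval_samples": 20})
    written = sorted(p.name for p in Path(tmp).iterdir())
    all_results = json.loads((Path(tmp) / "all_results.json").read_text())
```

Because the train and eval keys carry different prefixes, merging them into one file does not overwrite anything.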

Related Pages

Implements Principle
