Implementation:Huggingface Trl Get Peft Config Reward
| Property | Value |
|---|---|
| Implementation Name | Get Peft Config Reward |
| Technology | Huggingface TRL, PEFT |
| Type | Wrapper Doc |
| Workflow | Reward Model Training |
| Principle | Principle:Huggingface_Trl_PEFT_LoRA_Configuration_Reward |
Overview
Description
The get_peft_config function creates a PEFT LoraConfig from the fields of a ModelConfig dataclass. For reward model training, the configuration must specify lora_task_type="SEQ_CLS" and include "score" in lora_modules_to_save to ensure the classification head is fully trainable. The resulting config is passed to RewardTrainer which applies it via get_peft_model during initialization.
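For reference, this is a minimal sketch of the LoraConfig that get_peft_config builds for reward training (the hyperparameter values are illustrative; the reward-specific settings are the ones described above):

from peft import LoraConfig

reward_lora = LoraConfig(
    task_type="SEQ_CLS",        # sequence classification; required for reward models
    r=16,                       # LoRA rank
    lora_alpha=32,              # LoRA scaling factor
    lora_dropout=0.05,
    bias="none",
    modules_to_save=["score"],  # keep the classification head fully trainable
)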
Usage
This function is typically called in the reward training script to bridge the ModelConfig parsed from command-line or config-file arguments with the peft_config argument expected by RewardTrainer, as sketched below.
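A sketch of that bridging step, assuming TRL's usual three-dataclass script pattern with TrlParser (the exact dataclasses may differ by script):

from trl import ModelConfig, RewardConfig, ScriptArguments, TrlParser
from trl.trainer.utils import get_peft_config

parser = TrlParser((ScriptArguments, RewardConfig, ModelConfig))
script_args, training_args, model_args = parser.parse_args_and_config()
peft_config = get_peft_config(model_args)  # None unless --use_peft was passed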
Code Reference
Source Location
- get_peft_config: trl/trainer/utils.py, lines 309-332
- PEFT application in RewardTrainer: trl/trainer/reward_trainer.py, lines 350-398
Signature
def get_peft_config(model_args: ModelConfig) -> "PeftConfig | None":
    """
    Create a PEFT LoRA configuration from ModelConfig.
    Returns None if model_args.use_peft is False.
    Raises ValueError if the PEFT library is not installed.
    """
    if model_args.use_peft is False:
        return None

    if not is_peft_available():
        raise ValueError(
            "You need to have the PEFT library installed in your environment, make sure to install `peft`."
        )

    peft_config = LoraConfig(
        task_type=model_args.lora_task_type,
        r=model_args.lora_r,
        target_modules=model_args.lora_target_modules,
        target_parameters=model_args.lora_target_parameters,
        lora_alpha=model_args.lora_alpha,
        lora_dropout=model_args.lora_dropout,
        bias="none",
        use_rslora=model_args.use_rslora,
        use_dora=model_args.use_dora,
        modules_to_save=model_args.lora_modules_to_save,
    )
    return peft_config
# In RewardTrainer.__init__ (reward_trainer.py L382-398):
if peft_config is not None:
    model = get_peft_model(model, peft_config)

# Enable input gradients for gradient checkpointing + PEFT
if is_peft_model(model) and args.gradient_checkpointing:
    model.enable_input_require_grads()

# QLoRA: convert trainable adapter weights to bf16
if getattr(model, "is_loaded_in_4bit", False) or getattr(model, "is_loaded_in_8bit", False):
    for param in model.parameters():
        if param.requires_grad:
            param.data = param.data.to(torch.bfloat16)
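After get_peft_model wraps the model, a quick sanity check (a sketch, run after the wrapping step) confirms that only the adapters and the score head are trainable:

peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()  # prints trainable vs. total parameter counts
# With lora_modules_to_save=["score"], the score head appears among trainable params:
trainable = [n for n, p in peft_model.named_parameters() if p.requires_grad]
assert any("score" in n for n in trainable)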
Import
from trl.trainer.utils import get_peft_config
from trl import ModelConfig
from peft import LoraConfig, PeftConfig, get_peft_model
I/O Contract
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_args | ModelConfig | (required) | Model configuration containing PEFT parameters |
| model_args.use_peft | bool | False | Whether to enable PEFT |
| model_args.lora_task_type | str | "CAUSAL_LM" | Task type; must be set to "SEQ_CLS" for reward models |
| model_args.lora_r | int | 16 | LoRA rank |
| model_args.lora_alpha | float | 16 | LoRA scaling factor |
| model_args.lora_dropout | float | 0.05 | Dropout probability for LoRA layers |
| model_args.lora_target_modules | list or None | None | Modules to apply LoRA adapters to |
| model_args.lora_modules_to_save | list or None | None | Modules to fully train; should include ["score"] for reward models |
| model_args.lora_target_parameters | list or None | None | Individual parameters to apply LoRA to (newer PEFT versions) |
| model_args.use_rslora | bool | False | Use rank-stabilized LoRA scaling |
| model_args.use_dora | bool | False | Use DoRA (weight-decomposed LoRA) |
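Because use_peft defaults to False, calling the function on a default ModelConfig returns None, and RewardTrainer then performs full fine-tuning. A minimal check:

from trl import ModelConfig
from trl.trainer.utils import get_peft_config

assert get_peft_config(ModelConfig()) is None  # PEFT disabled by default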
Outputs
| Output | Type | Description |
|---|---|---|
| peft_config | PeftConfig or None | LoRA configuration object, or None if PEFT is disabled |
Usage Examples
Reward Model with LoRA
from datasets import load_dataset
from trl import ModelConfig, RewardConfig, RewardTrainer
from trl.trainer.utils import get_peft_config

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

model_args = ModelConfig(
    model_name_or_path="Qwen/Qwen2.5-0.5B-Instruct",
    use_peft=True,
    lora_task_type="SEQ_CLS",
    lora_r=16,
    lora_alpha=32,
    lora_modules_to_save=["score"],
)
peft_config = get_peft_config(model_args)

trainer = RewardTrainer(
    model=model_args.model_name_or_path,
    args=RewardConfig(output_dir="reward-lora-output"),
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
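After training, the adapter can be loaded back onto the base model with PEFT's standard loading pattern (a sketch; the paths reuse the example above, and num_labels=1 matches the single-score reward head):

from peft import PeftModel
from transformers import AutoModelForSequenceClassification

base = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct", num_labels=1
)
reward_model = PeftModel.from_pretrained(base, "reward-lora-output")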
Command-Line Configuration
python trl/scripts/reward.py \
--model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
--dataset_name trl-lib/ultrafeedback_binarized \
--output_dir reward-lora-output \
--use_peft \
--lora_task_type SEQ_CLS \
--lora_r 16 \
--lora_alpha 32 \
--lora_modules_to_save score
Related Pages
- Principle:Huggingface_Trl_PEFT_LoRA_Configuration_Reward