Implementation:Huggingface Trl Get Peft Config Reward

Property Value
Implementation Name Get Peft Config Reward
Technology Huggingface TRL, PEFT
Type Wrapper Doc
Workflow Reward Model Training
Principle Principle:Huggingface_Trl_PEFT_LoRA_Configuration_Reward

Overview

Description

The get_peft_config function creates a PEFT LoraConfig from the fields of a ModelConfig dataclass. For reward model training, the configuration must set lora_task_type="SEQ_CLS" and include "score" in lora_modules_to_save so that the classification head remains fully trainable. The resulting config is passed to RewardTrainer, which applies it via get_peft_model during initialization.
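
Concretely, for a reward model the function's output is equivalent to the hand-built LoraConfig below (a minimal sketch; values other than the task type and saved modules are illustrative, not library defaults):

from peft import LoraConfig

# Illustrative equivalent of get_peft_config output for reward training
LoraConfig(
    task_type="SEQ_CLS",        # sequence classification, not causal LM
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    modules_to_save=["score"],  # keep the scalar score head fully trainable
)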

Usage

This function is typically called in a reward training script to convert the ModelConfig parsed from command-line arguments or a config file into the PeftConfig that RewardTrainer expects, as sketched below.
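
A minimal sketch of that bridge using TRL's TrlParser (the two-dataclass tuple is an assumption of this example; real scripts may parse additional dataclasses):

from trl import ModelConfig, RewardConfig, TrlParser
from trl.trainer.utils import get_peft_config

# Parse CLI flags / config file into the two dataclasses
parser = TrlParser((RewardConfig, ModelConfig))
training_args, model_args = parser.parse_args_and_config()

peft_config = get_peft_config(model_args)  # None unless --use_peft was passed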

Code Reference

Source Location

  • get_peft_config: trl/trainer/utils.py lines 309-332
  • PEFT application in RewardTrainer: trl/trainer/reward_trainer.py lines 350-398

Signature

def get_peft_config(model_args: ModelConfig) -> "PeftConfig | None":
    """
    Create a PEFT LoRA configuration from ModelConfig.

    Returns None if model_args.use_peft is False.
    Raises ValueError if the PEFT library is not installed.
    """
    if model_args.use_peft is False:
        return None

    # Guard matching the docstring: fail early when peft is missing
    if not is_peft_available():
        raise ValueError(
            "You need to have PEFT library installed in your environment, make sure to install `peft`. "
            "Make sure to run `pip install -U peft`."
        )

    peft_config = LoraConfig(
        task_type=model_args.lora_task_type,
        r=model_args.lora_r,
        target_modules=model_args.lora_target_modules,
        target_parameters=model_args.lora_target_parameters,
        lora_alpha=model_args.lora_alpha,
        lora_dropout=model_args.lora_dropout,
        bias="none",
        use_rslora=model_args.use_rslora,
        use_dora=model_args.use_dora,
        modules_to_save=model_args.lora_modules_to_save,
    )

    return peft_config

# In RewardTrainer.__init__ (reward_trainer.py L382-398):
if peft_config is not None:
    model = get_peft_model(model, peft_config)

# Enable input gradients for gradient checkpointing + PEFT
if is_peft_model(model) and args.gradient_checkpointing:
    model.enable_input_require_grads()

# QLoRA: Convert adapter weights to bf16
if getattr(model, "is_loaded_in_4bit", False) or getattr(model, "is_loaded_in_8bit", False):
    for param in model.parameters():
        if param.requires_grad:
            param.data = param.data.to(torch.bfloat16)
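
The bf16 upcast above only triggers for quantized base models. A sketch of pairing the config with 4-bit loading (QLoRA); the model name and quantization settings here are illustrative:

import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct",
    num_labels=1,                      # reward models output a single scalar score
    quantization_config=quant_config,
)
# Passing this model plus peft_config to RewardTrainer hits the branch above,
# which upcasts the trainable adapter weights to bf16.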

Import

from trl.trainer.utils import get_peft_config
from trl import ModelConfig
from peft import LoraConfig, PeftConfig, get_peft_model

I/O Contract

Inputs

Parameter Type Default Description
model_args ModelConfig (required) Model configuration containing PEFT parameters
model_args.use_peft bool False Whether to enable PEFT
model_args.lora_task_type str "CAUSAL_LM" Task type; must be set to "SEQ_CLS" for reward models
model_args.lora_r int 16 LoRA rank
model_args.lora_alpha int 32 LoRA scaling factor
model_args.lora_dropout float 0.05 Dropout probability for LoRA layers
model_args.lora_target_modules list or None None Modules to apply LoRA adapters to
model_args.lora_modules_to_save list or None None Modules to fully train; should include ["score"] for reward models

Outputs

Output Type Description
peft_config PeftConfig or None LoRA configuration object, or None if PEFT is disabled
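
When use_peft is left at its default of False, the function short-circuits and RewardTrainer performs full fine-tuning; a quick sketch of the disabled path:

from trl import ModelConfig
from trl.trainer.utils import get_peft_config

model_args = ModelConfig(model_name_or_path="Qwen/Qwen2.5-0.5B-Instruct")  # use_peft defaults to False
assert get_peft_config(model_args) is None  # no adapters; all parameters train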

Usage Examples

Reward Model with LoRA

from datasets import load_dataset
from trl import ModelConfig, RewardTrainer, RewardConfig
from trl.trainer.utils import get_peft_config

model_args = ModelConfig(
    model_name_or_path="Qwen/Qwen2.5-0.5B-Instruct",
    use_peft=True,
    lora_task_type="SEQ_CLS",
    lora_r=16,
    lora_alpha=32,
    lora_modules_to_save=["score"],
)

peft_config = get_peft_config(model_args)

dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = RewardTrainer(
    model=model_args.model_name_or_path,
    args=RewardConfig(output_dir="reward-lora-output"),
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
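
To verify that only the adapters and the score head will update, the PEFT-wrapped model can report its parameter counts (print_trainable_parameters is a standard PeftModel method):

# trainer.model is a PeftModel once peft_config has been applied
trainer.model.print_trainable_parameters()  # prints trainable vs. total parameter counts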

Command-Line Configuration

python trl/scripts/reward.py \
    --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
    --dataset_name trl-lib/ultrafeedback_binarized \
    --output_dir reward-lora-output \
    --use_peft \
    --lora_task_type SEQ_CLS \
    --lora_r 16 \
    --lora_alpha 32 \
    --lora_modules_to_save score

Related Pages

  • Principle:Huggingface_Trl_PEFT_LoRA_Configuration_Reward