Principle:Huggingface Trl PEFT LoRA Configuration Reward
| Property | Value |
|---|---|
| Principle Name | PEFT LoRA Configuration for Reward |
| Technology | Huggingface TRL, PEFT |
| Category | Parameter-Efficient Fine-Tuning |
| Workflow | Reward Model Training |
| Implementation | Implementation:Huggingface_Trl_Get_Peft_Config_Reward |
Overview
Description
When training reward models with limited compute or when preserving the base model's capabilities is important, LoRA (Low-Rank Adaptation) provides a parameter-efficient alternative to full fine-tuning. However, applying LoRA to reward model training requires careful task-specific configuration that differs from standard causal language modeling LoRA setups.
The critical distinction is the task_type parameter: reward models must use SEQ_CLS (sequence classification) rather than CAUSAL_LM (causal language modeling). Additionally, the classification head (typically named "score") must be included in modules_to_save to ensure it remains fully trainable, since LoRA adapters only modify existing weight matrices and cannot train randomly initialized heads.
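A minimal sketch of such a configuration follows; the rank, alpha, and dropout values are illustrative choices, not prescribed ones.

```python
# Sketch of a reward-model LoRA configuration; r, lora_alpha, and
# lora_dropout are illustrative values, not recommendations.
from peft import LoraConfig, TaskType

peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,   # sequence classification, NOT CAUSAL_LM
    r=16,                         # low-rank dimension
    lora_alpha=32,                # scaling factor for the adapter update
    lora_dropout=0.05,
    modules_to_save=["score"],    # keep the reward head fully trainable
)
```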
Usage
The get_peft_config utility function creates a LoraConfig from the ModelConfig dataclass fields. The resulting PeftConfig is passed to the RewardTrainer via the peft_config parameter, where it is applied using get_peft_model during trainer initialization.
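A sketch of that wiring is below; the checkpoint name and dataset are placeholders, and the field names follow TRL's ModelConfig dataclass at the time of writing.

```python
# Sketch of passing get_peft_config output to RewardTrainer; the model
# name and preference_dataset are placeholders (assumed defined).
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import ModelConfig, RewardConfig, RewardTrainer, get_peft_config

model_args = ModelConfig(
    model_name_or_path="base-model",     # placeholder checkpoint
    use_peft=True,
    lora_task_type="SEQ_CLS",            # override the causal-LM default
    lora_modules_to_save=["score"],      # keep the reward head trainable
)

tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path)
model = AutoModelForSequenceClassification.from_pretrained(
    model_args.model_name_or_path, num_labels=1  # single scalar reward
)

trainer = RewardTrainer(
    model=model,
    args=RewardConfig(output_dir="reward-model"),
    train_dataset=preference_dataset,         # chosen/rejected pairs (assumed)
    processing_class=tokenizer,
    peft_config=get_peft_config(model_args),  # LoraConfig built from the fields above
)
trainer.train()
```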
Theoretical Basis
Task-Specific LoRA
LoRA decomposes weight updates into low-rank matrices:
W' = W + BA
where B has shape (d, r) and A has shape (r, k) with rank r much smaller than d and k. The task type determines how PEFT handles the model architecture:
- SEQ_CLS: Configures LoRA for sequence classification models. PEFT expects the model to have a classification head and wraps the sequence-classification forward pass, in which the reward is pooled from the last non-padding token position.
- CAUSAL_LM: Configures LoRA for autoregressive text generation. Using this task type for reward models would result in incorrect behavior because the model's output structure would be treated as next-token prediction logits rather than scalar rewards.
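The parameter savings implied by the decomposition W' = W + BA can be checked with quick arithmetic; the layer dimensions below are illustrative.

```python
# Worked example of the low-rank update's size for one 4096x4096 layer.
d, k, r = 4096, 4096, 16

full_update = d * k          # parameters to fine-tune W directly
lora_update = d * r + r * k  # parameters in B (d x r) plus A (r x k)

print(full_update)                # 16777216
print(lora_update)                # 131072
print(lora_update / full_update)  # 0.0078125 -> under 1% of the full update
```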
Modules to Save
The modules_to_save parameter specifies model components that should be fully fine-tuned (not adapted via LoRA). For reward models, the "score" head must be listed because:
- The score head is randomly initialized and needs full gradient updates to learn meaningful reward predictions.
- LoRA adapters modify existing pretrained weights via low-rank updates, but they cannot serve as a replacement for training a new head from scratch.
- Without including "score" in modules_to_save, the reward head would remain at its random initialization and produce meaningless outputs.
SEQ_CLS vs CAUSAL_LM
| Property | SEQ_CLS | CAUSAL_LM |
|---|---|---|
| Output | Single scalar per sequence | Logits per token |
| Loss | Pairwise preference (Bradley-Terry) | Next-token cross-entropy |
| Head | Linear projection to 1 label | LM head (vocabulary projection) |
| Use case | Reward models, classifiers | Text generation, SFT |
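The output row of the table can be made concrete as tensor shapes; the batch, sequence, and vocabulary sizes below are arbitrary.

```python
# Output shapes for the two task types; sizes are arbitrary placeholders.
batch, seq_len, vocab_size = 4, 512, 32000

seq_cls_shape = (batch, 1)                      # one scalar reward per sequence
causal_lm_shape = (batch, seq_len, vocab_size)  # next-token logits at every position

print(seq_cls_shape)    # (4, 1)
print(causal_lm_shape)  # (4, 512, 32000)
```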
Gradient Checkpointing with PEFT
When combining LoRA with gradient checkpointing (which is enabled by default in RewardConfig), TRL explicitly calls model.enable_input_require_grads() to ensure gradients flow correctly through the PEFT adapter layers. This addresses a known interaction issue between Transformers gradient checkpointing and PEFT adapter training.
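The underlying issue can be illustrated with a minimal PyTorch sketch: a frozen Linear layer stands in for a frozen transformer block, and setting requires_grad on the input stands in for what enable_input_require_grads accomplishes in the real model via an embedding hook.

```python
import torch
from torch.utils.checkpoint import checkpoint

# A frozen layer, as base weights are under LoRA.
layer = torch.nn.Linear(8, 8)
for p in layer.parameters():
    p.requires_grad_(False)

# With neither the input nor the weights requiring grad, the checkpointed
# segment is disconnected from autograd: y.requires_grad is False here,
# so gradients could never reach adapter layers stacked after it.
x = torch.randn(2, 8)
y = checkpoint(layer, x, use_reentrant=True)

# Marking the input as requiring grad (the effect of
# enable_input_require_grads) reconnects the recomputed graph.
x.requires_grad_(True)
y = checkpoint(layer, x, use_reentrant=True)
# y.requires_grad is now True, so backpropagation flows through the segment.
```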