Implementation: Huggingface TRL AutoModelForSequenceClassification from_pretrained
| Property | Value |
|---|---|
| Implementation Name | AutoModelForSequenceClassification From Pretrained |
| Technology | Huggingface TRL, Transformers |
| Type | API Doc |
| Workflow | Reward Model Training |
| Principle | Principle:Huggingface_Trl_Reward_Sequence_Classifier_Loading |
Overview
Description
The reward model is loaded by the create_model_from_path utility function, which uses AutoModelForSequenceClassification as the architecture class. This produces a pretrained language model with a linear classification head configured for single-label regression (num_labels=1), outputting a scalar reward value per sequence. Within RewardTrainer.__init__, when the model argument is a string path, this function is called automatically.
Usage
Model loading is triggered internally when a string model path is passed to RewardTrainer. It can also be used directly via create_model_from_path for custom setups.
Code Reference
Source Location
- create_model_from_path: trl/trainer/utils.py, lines 1133-1169
- RewardTrainer model loading: trl/trainer/reward_trainer.py, lines 308-313
Signature
def create_model_from_path(
    model_id: str,
    architecture: _BaseAutoModelClass | None = None,
    **kwargs,
) -> PreTrainedModel:
    """
    Create a model from a given path using the specified initialization arguments.

    Args:
        model_id: Path to the model (local or Hub identifier).
        architecture: Model architecture class (e.g., AutoModelForSequenceClassification).
        **kwargs: Keyword arguments passed to from_pretrained.

    Returns:
        The instantiated PreTrainedModel.
    """

# In RewardTrainer.__init__ (reward_trainer.py, lines 308-313):
if isinstance(model, str):
    model_init_kwargs = args.model_init_kwargs or {}
    if args.distributed_state.distributed_type in ["MULTI_GPU", "DEEPSPEED"]:
        model_init_kwargs["device_map"] = None
    model = create_model_from_path(model, AutoModelForSequenceClassification, **model_init_kwargs)
Import
from transformers import AutoModelForSequenceClassification
from trl.trainer.utils import create_model_from_path
Key Configuration
The critical configuration is num_labels=1, applied when the model is loaded for reward scoring. It gives the linear classification head a single output unit, so the model produces one scalar reward per sequence rather than multi-class logits.
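Conceptually, num_labels sets the output dimension of the linear score head that AutoModelForSequenceClassification attaches on top of the base model. A minimal sketch of that head in plain PyTorch (the sizes and the name `score` are illustrative, mirroring the usual transformers layout):

```python
import torch
import torch.nn as nn

hidden_size = 16  # illustrative; real models use e.g. 896 for Qwen2.5-0.5B
num_labels = 1    # scalar reward output

# The "score" head attached on top of the base model's hidden states
score = nn.Linear(hidden_size, num_labels, bias=False)

# Final hidden states for a batch of 2 sequences of length 5
hidden_states = torch.randn(2, 5, hidden_size)

# One logit per token position; pooling later selects a single position per sequence
logits = score(hidden_states)
print(logits.shape)  # torch.Size([2, 5, 1])
```

With num_labels=2 or more, the same head would emit a logit vector per position, which is what distinguishes multi-class classification from scalar reward scoring.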
I/O Contract
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_id | str | (required) | Model path (local directory or Huggingface Hub model ID) |
| architecture | _BaseAutoModelClass or None | None | Architecture class; set to AutoModelForSequenceClassification for reward models |
| dtype | str or torch.dtype | "float32" | Data type for model weights; accepts "auto", "bfloat16", "float16", "float32" |
| device_map | str or dict | "auto" | Device placement strategy; set to None for distributed training |
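String dtype values like those in the table map onto torch dtype objects before loading. A hedged sketch of how such a string is typically resolved (`resolve_dtype` is an illustrative helper, not TRL API):

```python
import torch

def resolve_dtype(dtype):
    # Illustrative helper: pass "auto", None, and torch.dtype through unchanged,
    # and look up string names like "bfloat16" on the torch module.
    if dtype is None or dtype == "auto" or isinstance(dtype, torch.dtype):
        return dtype
    return getattr(torch, dtype)

print(resolve_dtype("bfloat16"))  # torch.bfloat16
print(resolve_dtype("auto"))      # auto
```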
Outputs
| Output | Type | Description |
|---|---|---|
| model | PreTrainedModel | Loaded model with classification head (score layer) producing scalar rewards |
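For decoder-only models, transformers sequence-classification heads typically pool by taking the logit at the last non-padding token of each sequence. A self-contained sketch of that pooling (toy tensors; assumes pad_token_id is set):

```python
import torch

pad_token_id = 0
# Per-position logits from the score head: batch of 2, seq len 4, num_labels=1
logits = torch.tensor([[[0.1], [0.2], [0.3], [0.4]],
                       [[1.0], [2.0], [0.0], [0.0]]])
input_ids = torch.tensor([[5, 6, 7, 8],    # no padding
                          [5, 6, 0, 0]])   # last two positions padded

# Index of the last non-pad token in each sequence
last_idx = (input_ids != pad_token_id).sum(dim=1) - 1  # tensor([3, 1])
rewards = logits[torch.arange(2), last_idx].squeeze(-1)
print(rewards)  # tensor([0.4000, 2.0000])
```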
Usage Examples
Direct Loading
from transformers import AutoModelForSequenceClassification
from trl.trainer.utils import create_model_from_path
# Load a pretrained model as a reward model
reward_model = create_model_from_path(
    "Qwen/Qwen2.5-0.5B-Instruct",
    architecture=AutoModelForSequenceClassification,
    num_labels=1,
    dtype="bfloat16",
)
Via RewardTrainer (Automatic Loading)
from trl import RewardTrainer, RewardConfig
# When a string is passed, RewardTrainer loads the model automatically
trainer = RewardTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    args=RewardConfig(output_dir="reward-output"),
    train_dataset=dataset,
)
# The model is now an AutoModelForSequenceClassification with num_labels=1
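The scalar rewards this model produces feed the pairwise preference loss that reward model training optimizes, -log σ(r_chosen - r_rejected). A self-contained sketch of that loss in plain Python:

```python
import math

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    # -log(sigmoid(r_chosen - r_rejected)): small when the chosen
    # completion outscores the rejected one, large when inverted
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

print(round(pairwise_reward_loss(2.0, 0.0), 4))  # 0.1269 (chosen clearly preferred)
print(round(pairwise_reward_loss(0.0, 2.0), 4))  # 2.1269 (ranking inverted)
```

The loss depends only on the reward margin, which is why a single scalar output per sequence (num_labels=1) is sufficient.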