Implementation: Allen AI Open Instruct — AutoModelForSequenceClassification.from_pretrained
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement Learning from Human Feedback, Reward Modeling, Natural Language Processing |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for initializing a reward model from a pre-trained language model checkpoint provided by HuggingFace Transformers.
Description
This implementation uses AutoModelForSequenceClassification.from_pretrained from the HuggingFace Transformers library to load a pre-trained language model and automatically attach a single-output linear classification head (score head) on top of the transformer backbone. By specifying num_labels=1, the model is configured to output a single scalar reward value per sequence rather than multi-class logits.
Within Open Instruct's reward_modeling.py, the initialization also handles:
- Custom model registration: OLMo2 and OLMoE architectures are registered with the AutoModelForSequenceClassification class before loading, enabling support for Allen AI's custom model architectures.
- Token embedding resizing: If the tokenizer vocabulary exceeds the model's embedding size, the embeddings are resized with padding to a multiple of 8 for tensor core efficiency.
- Gradient checkpointing: Optionally enabled to reduce memory usage during training at the cost of additional computation.
- Dropout disabling: All dropout layers are set to `p=0`, following the recommendation in Stiennon et al. (2020) for stable reward model training.
- Score head initialization: The score head weights are initialized with a controlled standard deviation via the `layer_init` function.
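The dropout-disabling and score-head-initialization steps above can be sketched in plain PyTorch. Note these are minimal stand-ins written from the descriptions above, not the actual `disable_dropout_in_model` or `layer_init` implementations from Open Instruct:

```python
import math

import torch
from torch import nn


def disable_dropout(model: nn.Module) -> None:
    """Set every Dropout module's probability to 0 (a minimal sketch of the
    behavior described for open_instruct's disable_dropout_in_model)."""
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0.0


def init_score_head(layer: nn.Linear, std: float) -> nn.Linear:
    """Hypothetical stand-in for layer_init: draw weights from a normal
    distribution with the given std and zero the bias if present."""
    torch.nn.init.normal_(layer.weight, std=std)
    if layer.bias is not None:
        torch.nn.init.constant_(layer.bias, 0.0)
    return layer


hidden_size = 64
backbone = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.Dropout(p=0.1))
score = nn.Linear(hidden_size, 1)  # single-output score head (num_labels=1)

disable_dropout(backbone)
init_score_head(score, std=1 / math.sqrt(hidden_size + 1))
```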
Usage
Import and use this pattern when setting up a reward model training pipeline. This is the entry point for converting any HuggingFace-compatible pre-trained model into a reward model.
Code Reference
Source Location
- Repository: Open Instruct
- File: `open_instruct/reward_modeling.py`, lines 256-310 (within `main()`)
Signature
```python
# Core model loading call (from HuggingFace Transformers)
model: PreTrainedModel = AutoModelForSequenceClassification.from_pretrained(
    model_config.model_name_or_path,
    revision=model_config.model_revision,
    num_labels=1,
)
```
Import
```python
from transformers import AutoModelForSequenceClassification, PreTrainedModel
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_name_or_path | str | Yes | Path to the pre-trained model checkpoint or HuggingFace Hub identifier (e.g., an SFT checkpoint such as allenai/tulu-2-7b). |
| revision | str or None | No | Specific model version to use (branch name, tag, or commit hash). Defaults to the main branch. |
| num_labels | int | Yes | Number of output labels for the classification head. Must be set to 1 for reward modeling to produce a scalar score. |
| gradient_checkpointing | bool | No | Whether to enable gradient checkpointing on the loaded model to reduce memory consumption. Configured via model_config.gradient_checkpointing. |
Outputs
| Name | Type | Description |
|---|---|---|
| model | PreTrainedModel | A sequence classification model with the transformer backbone weights loaded from the checkpoint and a randomly initialized score head (model.score) that projects from the hidden dimension to a single scalar. |
Usage Examples
Basic Usage
```python
from transformers import AutoModelForSequenceClassification

# Load a pre-trained model as a reward model
model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/tulu-2-7b",
    num_labels=1,
)
# model.score is a Linear(hidden_size, 1) layer
```
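To see what the `num_labels=1` head produces shape-wise without downloading a checkpoint, here is a torch-only mimic. The bias-free Linear mirrors how most HF decoder architectures define `model.score`, and taking the last position assumes an unpadded batch:

```python
import torch
from torch import nn

batch, seq_len, hidden_size = 2, 8, 16
hidden_states = torch.randn(batch, seq_len, hidden_size)  # stand-in backbone output

# num_labels=1 sizes the score head as a hidden_size -> 1 projection
score = nn.Linear(hidden_size, 1, bias=False)

# HF sequence-classification models pool the last non-padding token's hidden
# state; with no padding that is simply position -1.
rewards = score(hidden_states[:, -1, :]).squeeze(-1)
print(rewards.shape)  # one scalar reward per sequence: torch.Size([2])
```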
Full Initialization Pattern (from Open Instruct)
```python
import numpy as np
from transformers import AutoModelForSequenceClassification, PreTrainedModel

from open_instruct.model_utils import disable_dropout_in_model
from open_instruct.reward_modeling import layer_init

# `model_config` and `tokenizer` come from the surrounding main() setup.

# Load the model with a single-output score head
model: PreTrainedModel = AutoModelForSequenceClassification.from_pretrained(
    model_config.model_name_or_path,
    revision=model_config.model_revision,
    num_labels=1,
)

# Resize embeddings if the tokenizer has more tokens than the model
if len(tokenizer) > model.get_input_embeddings().weight.shape[0]:
    model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)

# Enable gradient checkpointing for memory efficiency
if model_config.gradient_checkpointing:
    model.gradient_checkpointing_enable()

# Disable dropout for stable reward predictions
disable_dropout_in_model(model)

# Initialize the score head with a small standard deviation
layer_init(model.score, std=1 / np.sqrt(model.config.hidden_size + 1))
```
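The `pad_to_multiple_of=8` resize rounds the new vocabulary size up to the next multiple of 8 so the embedding matrix has tensor-core-friendly dimensions. The arithmetic can be checked without loading a model; the round-up formula below is an assumption based on the documented behavior of `resize_token_embeddings`:

```python
def padded_vocab_size(num_tokens: int, multiple: int = 8) -> int:
    # Round up to the next multiple, matching the documented effect of
    # resize_token_embeddings(..., pad_to_multiple_of=8).
    return ((num_tokens + multiple - 1) // multiple) * multiple


# e.g. a tokenizer with 50257 tokens (GPT-2's vocabulary size)
print(padded_vocab_size(50257))  # 50264
```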
Dependencies
| Package | Module | Purpose |
|---|---|---|
| transformers | AutoModelForSequenceClassification | Auto-detection and loading of sequence classification models from checkpoints |
| transformers | AutoConfig | Automatic configuration detection (used internally by from_pretrained) |
| transformers | PreTrainedModel | Base class for the returned model |
| deepspeed | deepspeed.zero.GatheredParameters | Used for gathering embedding parameters in ZeRO-3 to check embedding size |
| torch | torch.nn | Neural network modules (the score head is an nn.Linear layer) |