Implementation: Allen AI Open Instruct — AutoModelForSequenceClassification.from_pretrained
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement Learning from Human Feedback, Reward Modeling, Natural Language Processing |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for initializing a reward model from a pre-trained language model checkpoint provided by HuggingFace Transformers.
Description
This implementation uses AutoModelForSequenceClassification.from_pretrained from the HuggingFace Transformers library to load a pre-trained language model and automatically attach a single-output linear classification head (score head) on top of the transformer backbone. By specifying num_labels=1, the model is configured to output a single scalar reward value per sequence rather than multi-class logits.
Within Open Instruct's reward_modeling.py, the initialization also handles:
- Custom model registration: OLMo2 and OLMoE architectures are registered with the AutoModelForSequenceClassification class before loading, enabling support for Allen AI's custom model architectures.
- Token embedding resizing: If the tokenizer vocabulary exceeds the model's embedding size, the embeddings are resized with padding to a multiple of 8 for tensor core efficiency.
- Gradient checkpointing: Optionally enabled to reduce memory usage during training at the cost of additional computation.
- Dropout disabling: All dropout layers are set to `p=0`, following the recommendation in Stiennon et al. (2020) for stable reward model training.
- Score head initialization: The score head weights are initialized with a controlled standard deviation via the `layer_init` function.
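The dropout-disabling and score-head-initialization steps above can be sketched in plain PyTorch. Note these are minimal stand-ins written from the descriptions above, not the actual `disable_dropout_in_model` or `layer_init` implementations from Open Instruct:

```python
import math

import torch
from torch import nn


def disable_dropout(model: nn.Module) -> None:
    """Set every Dropout module's probability to 0 (a minimal sketch of the
    behavior described for open_instruct's disable_dropout_in_model)."""
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0.0


def init_score_head(layer: nn.Linear, std: float) -> nn.Linear:
    """Hypothetical stand-in for layer_init: draw weights from a normal
    distribution with the given std and zero the bias if present."""
    torch.nn.init.normal_(layer.weight, std=std)
    if layer.bias is not None:
        torch.nn.init.constant_(layer.bias, 0.0)
    return layer


hidden_size = 64
backbone = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.Dropout(p=0.1))
score = nn.Linear(hidden_size, 1)  # single-output score head (num_labels=1)

disable_dropout(backbone)
init_score_head(score, std=1 / math.sqrt(hidden_size + 1))
```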
Usage
Import and use this pattern when setting up a reward model training pipeline. This is the entry point for converting any HuggingFace-compatible pre-trained model into a reward model.
Code Reference
Source Location
- Repository: Open Instruct
- File: `open_instruct/reward_modeling.py`, lines 256-310 (within `main()`)
Signature
```python
# Core model loading call (from HuggingFace Transformers)
model: PreTrainedModel = AutoModelForSequenceClassification.from_pretrained(
    model_config.model_name_or_path,
    revision=model_config.model_revision,
    num_labels=1,
)
```
Import
```python
from transformers import AutoModelForSequenceClassification, PreTrainedModel
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_name_or_path | str | Yes | Path to the pre-trained model checkpoint or HuggingFace Hub identifier (e.g., an SFT checkpoint such as allenai/tulu-2-7b). |
| revision | str or None | No | Specific model version to use (branch name, tag, or commit hash). Defaults to the main branch. |
| num_labels | int | Yes | Number of output labels for the classification head. Must be set to 1 for reward modeling to produce a scalar score. |
| gradient_checkpointing | bool | No | Whether to enable gradient checkpointing on the loaded model to reduce memory consumption. Configured via model_config.gradient_checkpointing. |
Outputs
| Name | Type | Description |
|---|---|---|
| model | PreTrainedModel | A sequence classification model with the transformer backbone weights loaded from the checkpoint and a randomly initialized score head (model.score) that projects from the hidden dimension to a single scalar. |
Usage Examples
Basic Usage
```python
from transformers import AutoModelForSequenceClassification

# Load a pre-trained model as a reward model
model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/tulu-2-7b",
    num_labels=1,
)
# model.score is a Linear(hidden_size, 1) layer
```
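To see what the `num_labels=1` head produces shape-wise without downloading a checkpoint, here is a torch-only mimic. The bias-free Linear mirrors how most HF decoder architectures define `model.score`, and taking the last position assumes an unpadded batch:

```python
import torch
from torch import nn

batch, seq_len, hidden_size = 2, 8, 16
hidden_states = torch.randn(batch, seq_len, hidden_size)  # stand-in backbone output

# num_labels=1 sizes the score head as a hidden_size -> 1 projection
score = nn.Linear(hidden_size, 1, bias=False)

# HF sequence-classification models pool the last non-padding token's hidden
# state; with no padding that is simply position -1.
rewards = score(hidden_states[:, -1, :]).squeeze(-1)
print(rewards.shape)  # one scalar reward per sequence: torch.Size([2])
```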
Full Initialization Pattern (from Open Instruct)
```python
import numpy as np
from transformers import AutoModelForSequenceClassification, PreTrainedModel

from open_instruct.model_utils import disable_dropout_in_model
from open_instruct.reward_modeling import layer_init

# `model_config` and `tokenizer` come from the surrounding main() setup.

# Load the model with a single-output score head
model: PreTrainedModel = AutoModelForSequenceClassification.from_pretrained(
    model_config.model_name_or_path,
    revision=model_config.model_revision,
    num_labels=1,
)

# Resize embeddings if the tokenizer has more tokens than the model
if len(tokenizer) > model.get_input_embeddings().weight.shape[0]:
    model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)

# Enable gradient checkpointing for memory efficiency
if model_config.gradient_checkpointing:
    model.gradient_checkpointing_enable()

# Disable dropout for stable reward predictions
disable_dropout_in_model(model)

# Initialize the score head with a small standard deviation
layer_init(model.score, std=1 / np.sqrt(model.config.hidden_size + 1))
```
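The `pad_to_multiple_of=8` resize rounds the new vocabulary size up to the next multiple of 8 so the embedding matrix has tensor-core-friendly dimensions. The arithmetic can be checked without loading a model; the round-up formula below is an assumption based on the documented behavior of `resize_token_embeddings`:

```python
def padded_vocab_size(num_tokens: int, multiple: int = 8) -> int:
    # Round up to the next multiple, matching the documented effect of
    # resize_token_embeddings(..., pad_to_multiple_of=8).
    return ((num_tokens + multiple - 1) // multiple) * multiple


# e.g. a tokenizer with 50257 tokens (GPT-2's vocabulary size)
print(padded_vocab_size(50257))  # 50264
```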
Dependencies
| Package | Module | Purpose |
|---|---|---|
| transformers | AutoModelForSequenceClassification | Auto-detection and loading of sequence classification models from checkpoints |
| transformers | AutoConfig | Automatic configuration detection (used internally by from_pretrained) |
| transformers | PreTrainedModel | Base class for the returned model |
| deepspeed | deepspeed.zero.GatheredParameters | Used for gathering embedding parameters in ZeRO-3 to check embedding size |
| torch | torch.nn | Neural network modules (the score head is an nn.Linear layer) |