
Implementation:Allenai Open instruct AutoModelForSequenceClassification From Pretrained

From Leeroopedia


Knowledge Sources

  • Domains: Reinforcement Learning from Human Feedback, Reward Modeling, Natural Language Processing
  • Last Updated: 2026-02-07 00:00 GMT

Overview

A concrete pattern for initializing a reward model from a pre-trained language model checkpoint, using AutoModelForSequenceClassification.from_pretrained from HuggingFace Transformers.

Description

This implementation uses AutoModelForSequenceClassification.from_pretrained from the HuggingFace Transformers library to load a pre-trained language model and automatically attach a single-output linear classification head (score head) on top of the transformer backbone. By specifying num_labels=1, the model is configured to output a single scalar reward value per sequence rather than multi-class logits.
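The effect of num_labels=1 can be illustrated with a minimal sketch in plain PyTorch, where a made-up hidden size and random hidden states stand in for the transformer backbone:

```python
import torch
import torch.nn as nn

hidden_size = 16          # stand-in for model.config.hidden_size
batch, seq_len = 2, 5

# The backbone produces one hidden state per token; a reward model
# scores a sequence from the hidden state of its final token.
hidden_states = torch.randn(batch, seq_len, hidden_size)
last_token = hidden_states[:, -1, :]   # (batch, hidden_size)

# With num_labels=1 the classification head is a single-output projection,
# so each sequence gets one scalar reward rather than per-class logits.
score = nn.Linear(hidden_size, 1)
rewards = score(last_token)            # (batch, 1)
print(rewards.shape)  # torch.Size([2, 1])
```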

Within Open Instruct's reward_modeling.py, the initialization also handles:

  • Custom model registration: OLMo2 and OLMoE architectures are registered with the AutoModelForSequenceClassification class before loading, enabling support for Allen AI's custom model architectures.
  • Token embedding resizing: If the tokenizer vocabulary exceeds the model's embedding size, the embeddings are resized with padding to a multiple of 8 for tensor core efficiency.
  • Gradient checkpointing: Optionally enabled to reduce memory usage during training at the cost of additional computation.
  • Dropout disabling: All dropout layers are set to p=0 following the recommendation in Stiennon et al. (2020) for stable reward model training.
  • Score head initialization: The score head weights are initialized with a controlled standard deviation via the layer_init function.
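The multiple-of-8 rounding mentioned in the embedding-resizing bullet is simple arithmetic; a sketch with illustrative numbers (the helper name here is made up, the real logic lives inside resize_token_embeddings):

```python
import math

def padded_vocab_size(vocab_size: int, multiple: int = 8) -> int:
    # Round the vocabulary size up to the nearest multiple so the
    # embedding matrix dimensions align with tensor-core-friendly shapes.
    return math.ceil(vocab_size / multiple) * multiple

print(padded_vocab_size(50257))  # 50264
```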

Usage

Import and use this pattern when setting up a reward model training pipeline. This is the entry point for converting any HuggingFace-compatible pre-trained model into a reward model.

Code Reference

Source Location

  • Repository: Open Instruct
  • File: open_instruct/reward_modeling.py, lines 256-310 (within main())

Signature

# Core model loading call (from HuggingFace Transformers)
model: PreTrainedModel = AutoModelForSequenceClassification.from_pretrained(
    model_config.model_name_or_path,
    revision=model_config.model_revision,
    num_labels=1,
)

Import

from transformers import AutoModelForSequenceClassification, PreTrainedModel

I/O Contract

Inputs

  • model_name_or_path (str, required): Path to the pre-trained model checkpoint or HuggingFace model hub identifier (e.g., an SFT checkpoint such as allenai/tulu-2-7b).
  • revision (str or None, optional): Specific model version to use (branch name, tag, or commit hash). Defaults to the main branch.
  • num_labels (int, required): Number of output labels for the classification head. Must be set to 1 for reward modeling to produce a scalar score.
  • gradient_checkpointing (bool, optional): Whether to enable gradient checkpointing on the loaded model to reduce memory consumption. Configured via model_config.gradient_checkpointing.

Outputs

  • model (PreTrainedModel): A sequence classification model with the transformer backbone weights loaded from the checkpoint and a randomly initialized score head (model.score) that projects from the hidden dimension to a single scalar.
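Downstream, these scalar outputs are typically trained with a pairwise Bradley-Terry style objective: the reward of the preferred completion is pushed above that of the rejected one. A minimal sketch in plain PyTorch, with made-up reward values and independent of the Open Instruct training loop:

```python
import torch
import torch.nn.functional as F

# Hypothetical scalar rewards from the score head for paired completions.
chosen_rewards = torch.tensor([1.2, 0.3])    # (batch,)
rejected_rewards = torch.tensor([0.4, 0.9])  # (batch,)

# Pairwise loss: -log sigmoid(r_chosen - r_rejected), averaged over the
# batch. Minimizing it widens the margin between chosen and rejected.
loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
print(loss)
```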

Usage Examples

Basic Usage

from transformers import AutoModelForSequenceClassification

# Load a pre-trained model as a reward model
model = AutoModelForSequenceClassification.from_pretrained(
    "allenai/tulu-2-7b",
    num_labels=1,
)
# model.score is a Linear(hidden_size, 1) layer

Full Initialization Pattern (from Open Instruct)

import numpy as np
from transformers import AutoModelForSequenceClassification, PreTrainedModel
from open_instruct.model_utils import disable_dropout_in_model
from open_instruct.reward_modeling import layer_init

# Load the model with a single-output score head
model: PreTrainedModel = AutoModelForSequenceClassification.from_pretrained(
    model_config.model_name_or_path,
    revision=model_config.model_revision,
    num_labels=1,
)

# Resize embeddings if tokenizer has more tokens
if len(tokenizer) > model.get_input_embeddings().weight.shape[0]:
    model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)

# Enable gradient checkpointing for memory efficiency
if model_config.gradient_checkpointing:
    model.gradient_checkpointing_enable()

# Disable dropout for stable reward predictions
disable_dropout_in_model(model)

# Initialize score head with small standard deviation
layer_init(model.score, std=1 / np.sqrt(model.config.hidden_size + 1))
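The two helpers used above are small; the sketches below show plausible behavior (the authoritative versions live in open_instruct, so exact signatures and details here are assumptions):

```python
import torch
import torch.nn as nn

def layer_init(layer: nn.Linear, std: float) -> nn.Linear:
    # Re-initialize the weight with a small, controlled standard deviation
    # so early reward predictions start near zero (sketch of the helper).
    torch.nn.init.normal_(layer.weight, std=std)
    return layer

def disable_dropout_in_model(model: nn.Module) -> None:
    # Set every dropout module's probability to zero (sketch).
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0.0

# Example with an illustrative hidden size of 16.
head = layer_init(nn.Linear(16, 1), std=1 / (16 + 1) ** 0.5)
```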

Dependencies

  • transformers (AutoModelForSequenceClassification): Auto-detection and loading of sequence classification models from checkpoints.
  • transformers (AutoConfig): Automatic configuration detection (used internally by from_pretrained).
  • transformers (PreTrainedModel): Base class for the returned model.
  • deepspeed (deepspeed.zero.GatheredParameters): Gathers embedding parameters under ZeRO-3 to check the embedding size.
  • torch (torch.nn): Neural network modules (the score head is an nn.Linear layer).

Related Pages

Implements Principle

Related Implementations
