Implementation: ContextualAI HALOs AutoModelForBradleyTerry.from_pretrained
| Knowledge Sources | Details |
|---|---|
| Domains | Deep_Learning, NLP, Reinforcement_Learning |
| Last Updated | 2026-02-08 03:00 GMT |
Overview
A concrete tool for initializing a binary classification reward model, provided by the AutoModelForBradleyTerry wrapper class.
Description
AutoModelForBradleyTerry is a wrapper around HuggingFace's AutoModelForSequenceClassification that enforces num_labels=2 for binary classification. It overrides from_pretrained() to force the binary classification head regardless of the config, and save_pretrained() to maintain this configuration when saving. It also ensures the padding token is correctly configured.
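The override pattern described above can be sketched without the transformers dependency. This is an illustrative sketch, not the HALOs source: `BaseAutoModel`, `FakeConfig`, and `BinaryWrapper` are hypothetical stand-ins for AutoModelForSequenceClassification, its config, and the wrapper.

```python
# Hypothetical stand-ins for the HuggingFace classes; names are illustrative.
class FakeConfig:
    def __init__(self, num_labels=1, pad_token_id=None, eos_token_id=2):
        self.num_labels = num_labels
        self.pad_token_id = pad_token_id
        self.eos_token_id = eos_token_id

class BaseAutoModel:
    """Stand-in for AutoModelForSequenceClassification."""
    @classmethod
    def from_pretrained(cls, name_or_path, *args, **kwargs):
        model = cls()
        model.config = FakeConfig(num_labels=kwargs.get("num_labels", 1))
        return model

class BinaryWrapper(BaseAutoModel):
    """Force a 2-label head regardless of what the caller passes."""
    @classmethod
    def from_pretrained(cls, name_or_path, *args, **kwargs):
        kwargs["num_labels"] = 2  # override any user-supplied value
        model = super().from_pretrained(name_or_path, *args, **kwargs)
        if model.config.pad_token_id is None:  # mirror the pad-token fixup
            model.config.pad_token_id = model.config.eos_token_id
        return model

m = BinaryWrapper.from_pretrained("dummy/model", num_labels=5)
print(m.config.num_labels)    # 2 (the requested 5 was overridden)
print(m.config.pad_token_id)  # 2 (falls back to eos_token_id)
```

The design point is that the override lives entirely in `from_pretrained`, so callers can use the wrapper as a drop-in replacement for the base auto class.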
Usage
Used internally by BradleyTerryTrainer as the policy_hf_model_class. The model is loaded via Hydra config with loss=bradley-terry.
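The `loss=bradley-terry` selection above is a Hydra command-line override. A launch might look roughly like the following sketch; only the `loss=bradley-terry` override comes from this document, while the script name and any other overrides are assumptions for illustration.

```shell
# Hypothetical launch; train.py and any additional overrides are illustrative.
python train.py loss=bradley-terry
```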
Code Reference
Source Location
- Repository: ContextualAI/HALOs
- File: train/models.py
- Lines: L553-618
Signature
class AutoModelForBradleyTerry(AutoModelForSequenceClassification):
    """Wrapper ensuring binary classification (num_labels=2)."""

    @classmethod
    def from_pretrained(
        cls,
        pretrained_model_name_or_path: Union[str, PreTrainedModel],
        *model_args,
        **kwargs,
    ) -> PreTrainedModel:
        """Load pretrained model with forced num_labels=2.

        Args:
            pretrained_model_name_or_path: HuggingFace model ID or local path
            *model_args: Additional positional args for __init__
            **kwargs: Additional keyword args (num_labels forced to 2)

        Returns:
            PreTrainedModel with binary classification head
        """

    def save_pretrained(
        self,
        save_directory: str,
        is_main_process: bool = True,
        state_dict: Optional[dict] = None,
        save_function: callable = torch.save,
        **kwargs,
    ):
        """Save with num_labels=2 and pad_token_id preserved."""
Import
from train.models import AutoModelForBradleyTerry
import torch

model = AutoModelForBradleyTerry.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| pretrained_model_name_or_path | str | Yes | HuggingFace model ID or local checkpoint path |
| torch_dtype | torch.dtype | No | Model precision (e.g., torch.bfloat16) |
| attn_implementation | str | No | Attention implementation ('flash_attention_2', 'eager') |
Outputs
| Name | Type | Description |
|---|---|---|
| model | PreTrainedModel | Sequence classification model with 2-label head |
| model.config.num_labels | int | Always 2 |
| model.config.pad_token_id | int | Set to eos_token_id if not configured |
Usage Examples
Loading a Reward Model
from train.models import AutoModelForBradleyTerry
import torch

# Initialize Bradley-Terry model from a pre-trained LLM
model = AutoModelForBradleyTerry.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

# The model now has a 2-class classification head
print(model.config.num_labels)  # 2
Loading from a Trained Checkpoint
# Load a previously trained reward model
model = AutoModelForBradleyTerry.from_pretrained(
    "/models/llama3-8B-bt/FINAL",
    torch_dtype=torch.bfloat16,
)
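A reward model like this is trained with the Bradley-Terry objective, which maximizes the probability that the chosen response outranks the rejected one. The following is a minimal numeric sketch of that objective in pure Python, independent of the HALOs code; the function name is illustrative.

```python
import math

def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that `chosen` beats `rejected`:
    -log sigmoid(r_chosen - r_rejected)."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A larger margin between chosen and rejected rewards gives a lower loss.
print(round(bradley_terry_loss(2.0, 0.0), 4))  # 0.1269
print(round(bradley_terry_loss(0.0, 2.0), 4))  # 2.1269
```

When the two rewards are equal the loss is log 2 (the model assigns a 50/50 preference); training pushes the chosen reward above the rejected one.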