
Implementation:ContextualAI HALOs Label Main

From Leeroopedia


Knowledge Sources
Domains NLP, Reinforcement_Learning, Data_Engineering
Last Updated 2026-02-08 03:00 GMT

Overview

A concrete tool, provided by the train.label module, for scoring model completions and converting the scores into feedback data.

Description

The train/label.py module provides a complete labeling pipeline with three stages:

  1. Scoring — Either process_batch_with_reward_model() for local reward model inference or process_samples_with_api() for async OpenAI API calls
  2. Feedback conversion — convert_to_pairwise_feedback() or convert_to_binary_feedback() transforms scored samples into training-ready formats
  3. Output — Results streamed to JSON via StreamingJSONWriter
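The streaming output of stage 3 can be illustrated with a minimal sketch. The class name and internals below are assumptions modeled on the behavior described (the real StreamingJSONWriter lives in the HALOs repository and may differ): the idea is to emit a JSON array one element at a time so a large feedback set never has to sit in memory.

```python
import io
import json

class StreamingJSONWriterSketch:
    """Hypothetical stand-in for StreamingJSONWriter: writes a JSON
    array incrementally instead of serializing the whole list at once."""

    def __init__(self, fh):
        self.fh = fh
        self.first = True
        self.fh.write("[")

    def write_item(self, item):
        # Comma-separate every element after the first.
        if not self.first:
            self.fh.write(",\n")
        self.fh.write(json.dumps(item))
        self.first = False

    def close(self):
        self.fh.write("]")

buf = io.StringIO()
writer = StreamingJSONWriterSketch(buf)
for record in ({"reward": 1.0}, {"reward": -0.5}):
    writer.write_item(record)
writer.close()
```

The resulting buffer parses back to the original list with `json.loads(buf.getvalue())`.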

The reward model scoring path distributes samples across GPU processes via Accelerate, applies the chat template, runs inference, and gathers results. The API path uses async batch processing for throughput.
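The async batch pattern of the API path can be sketched as follows. The scorer below is a placeholder (a real implementation would call the OpenAI client); `score_one` and `score_in_batches` are illustrative names, not functions from the module:

```python
import asyncio

async def score_one(sample):
    # Stand-in for a single API request; a real scorer would await
    # the LLM client here and parse a score out of the reply.
    await asyncio.sleep(0)
    return {**sample, "reward": float(len(sample["output"]))}

async def score_in_batches(samples, batch_size=10):
    # Requests within a batch run concurrently via asyncio.gather;
    # batches run sequentially, mirroring the batch_size knob of
    # process_samples_with_api().
    results = []
    for i in range(0, len(samples), batch_size):
        batch = samples[i:i + batch_size]
        results.extend(await asyncio.gather(*(score_one(s) for s in batch)))
    return results

scored = asyncio.run(
    score_in_batches([{"output": "hi"}, {"output": "hello"}], batch_size=1)
)
```

Batching caps the number of in-flight requests, which keeps throughput high without tripping API rate limits.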

Usage

Run as accelerate launch -m train.label --reward_model_path /path/to/reward samples.json feedback.json --feedback_type pairwise for reward model labeling, or python -m train.label --api_type openai samples.json feedback.json --feedback_type binary for API labeling.

Code Reference

Source Location

  • Repository: ContextualAI/HALOs
  • File: train/label.py
  • Lines: L40-74 (process_batch_with_reward_model), L77-130 (process_samples_with_api), L133-156 (convert_to_binary_feedback), L177-228 (convert_to_pairwise_feedback), L231-317 (main)

Signature

def process_batch_with_reward_model(
    samples: List,
    reward_model: AutoModelForSequenceClassification,
    tokenizer: AutoTokenizer,
    accelerator: Accelerator
) -> List[Dict]:
    """Score samples using a local reward model, distributed across GPUs."""

async def process_samples_with_api(
    samples: List,
    client: openai.AsyncOpenAI,
    system_prompt: str,
    label_prompt: str,
    model: str,
    batch_size: int = 10
) -> List[Dict]:
    """Score samples using an external LLM API."""

def convert_to_binary_feedback(
    samples: List[Dict],
    threshold: Union[str, float] = 0
) -> List[Dict]:
    """Convert scored samples to binary (desirable/undesirable) feedback.
    threshold: 'mean', 'median', or numeric value.
    """

def convert_to_pairwise_feedback(
    samples: List[Dict],
    seed: int,
    mode: str = 'random',
    threshold: float = 0
) -> List[Dict]:
    """Convert scored samples to pairwise preference feedback.
    mode: 'random', 'max', or 'min'.
    """

async def main(args: argparse.Namespace) -> None:
    """Main labeling pipeline: load samples, score, convert, write."""

Import

from train.label import (
    process_batch_with_reward_model,
    process_samples_with_api,
    convert_to_pairwise_feedback,
    convert_to_binary_feedback,
)
# Or run as CLI:
# accelerate launch -m train.label --reward_model_path /path samples.json output.json

I/O Contract

Inputs

Name Type Required Description
samples_path str Yes Path to JSON file of samples from train.sample
output_path str Yes Path for output feedback JSON
--reward_model_path str Conditional Path to reward model (mutually exclusive with --api_type)
--api_type str Conditional API provider ('openai') (mutually exclusive with --reward_model_path)
--feedback_type str No 'binary', 'pairwise', or None for scalar (default: None)
--feedback_mode str No Pairing mode: 'random', 'max', 'min' (default: 'max')
--threshold str No Feedback threshold: 'median', 'mean', or numeric (default: 'median')
--batch_size int No Batch size for processing (default: 16)

Outputs

Name Type Description
Pairwise feedback JSON List of dicts: prompt_id, prompt, output_A, output_B, label, reward_A, reward_B, reward_difference, type='pairwise_feedback'
Binary feedback JSON List of dicts: prompt_id, prompt, output, label (0/1), reward, type='binary_feedback'
Scalar feedback JSON List of dicts: original sample fields plus reward score, type='scalar_feedback'
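The pairwise schema above can be produced by logic like the following sketch. Only the 'random' mode is shown, and the pairing strategy (all combinations per prompt, skipping near-ties under the threshold) is an assumption about how convert_to_pairwise_feedback behaves, not a copy of it:

```python
import itertools
import random

def pairwise_feedback_sketch(samples, seed=0, mode="random", threshold=0.0):
    """Build preference pairs from samples grouped by prompt_id."""
    rng = random.Random(seed)
    by_prompt = {}
    for s in samples:
        by_prompt.setdefault(s["prompt_id"], []).append(s)
    feedback = []
    for pid, group in by_prompt.items():
        for a, b in itertools.combinations(group, 2):
            diff = a["reward"] - b["reward"]
            if abs(diff) <= threshold:
                continue  # skip near-ties below the threshold
            if mode == "random" and rng.random() < 0.5:
                a, b = b, a  # randomize which side is output_A
                diff = -diff
            feedback.append({
                "prompt_id": pid,
                "prompt": a["prompt"],
                "output_A": a["output"],
                "output_B": b["output"],
                "label": int(diff > 0),  # 1 means output_A is preferred
                "reward_A": a["reward"],
                "reward_B": b["reward"],
                "reward_difference": abs(diff),
                "type": "pairwise_feedback",
            })
    return feedback

demo = [
    {"prompt_id": 1, "prompt": "p", "output": "x", "reward": 1.0},
    {"prompt_id": 1, "prompt": "p", "output": "y", "reward": 3.0},
]
pairs = pairwise_feedback_sketch(demo, seed=0, mode="random")
```

Randomizing which completion lands in output_A prevents a position bias where the preferred response always appears on the same side.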

Usage Examples

Reward Model Labeling (Pairwise)

accelerate launch \
    --config_file accelerate_config/fsdp_4gpu.yaml \
    --main_process_port 29500 \
    -m train.label \
    --reward_model_path /models/llama3-8B-bt/FINAL \
    --feedback_type pairwise \
    --feedback_mode max \
    round1_samples.json round1_feedback.json

API Labeling (Binary)

python -m train.label \
    --api_type openai \
    --api_key $OPENAI_API_KEY \
    --api_model gpt-4.1-mini \
    --label_prompt "Rate this response quality from 0 to 10. End with 'Final Score: X'" \
    --feedback_type binary \
    --threshold median \
    round1_samples.json round1_feedback.json

Related Pages

Implements Principle

Requires Environment
