
Implementation:Allenai Open instruct OLMo2 1124 Dev Scripts

From Leeroopedia


Knowledge Sources
Domains Training, Hyperparameter_Search
Last Updated 2026-02-07 02:00 GMT

Overview

⚠️ ARCHIVED: This script is in docs/archived_dev_scripts/ and is a historical record. Do not execute directly; see Heuristic:Allenai_Open_instruct_Warning_Archived_Dev_Scripts.

Archived development script containing the full hyperparameter sweep commands used to develop OLMo 2 (November 2024) instruction-tuned models at 7B and 13B scales.

Description

The olmo2_1124.sh script is a historical record of the experimental pipeline used to develop the OLMo 2 1124 instruct models. It contains nested bash loops for comprehensive hyperparameter sweeps across four stages: SFT (supervised finetuning) for 7B and 13B models, DPO (direct preference optimization) training, reward model training, and PPO-based reinforcement learning. The sweeps cover learning rates, epochs, seeds, and beta values. Jobs are submitted to AI2's Jupiter Beaker cluster via mason.py and the submit_finetune_job.py / submit_dpo_job.py scripts. The script includes both first-round (1203-prefixed) and second-round (1206/1208/1210-prefixed) experiments, showing iterative refinement.

Usage

Reference this script when reproducing OLMo 2 1124 instruction tuning experiments or when designing new hyperparameter sweeps. It demonstrates the complete SFT -> DPO -> RM -> RL training pipeline with concrete parameter ranges.

Code Reference

Source Location

Signature

# Hyperparameter sweep structure:
# 7B SFT sweep: learning rates, epochs, seeds
for LR in 2e-6 5e-6 1e-5 2e-5; do
  for EPOCH in 1 2; do
    # (the script also sweeps over seeds; values omitted here)
    python mason.py ... --learning_rate $LR --num_train_epochs $EPOCH
  done
done

# 7B DPO sweep: learning rates, beta values
for LR in 1e-6 3e-6 5e-6; do
  python mason.py ... --learning_rate $LR
done

# 13B SFT and DPO sweeps follow similar patterns
# Reward model training and PPO experiments included
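The nested-loop sketch above can be exercised as a harmless dry run. In this sketch the `echo` stands in for the actual mason.py submission, and the seed values are hypothetical (the archived script's actual seeds are not reproduced on this page):

```shell
# Dry run of the nested 7B SFT sweep structure; no jobs are submitted.
# Learning rates and epoch counts match the sketch above; seeds are
# illustrative placeholders.
for LR in 2e-6 5e-6 1e-5 2e-5; do
  for EPOCH in 1 2; do
    for SEED in 1 2 3; do
      # In the real script this line invokes mason.py to submit a Beaker job.
      echo "submit: --learning_rate $LR --num_train_epochs $EPOCH --seed $SEED"
    done
  done
done
```

Replacing `echo` with the real `python mason.py ...` invocation turns the dry run into an actual sweep submission.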

Import

# Not imported. This archived script should not be run end-to-end;
# copy individual commands from it instead:
# docs/archived_dev_scripts/olmo2_1124.sh
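Since the script is meant to be mined for commands rather than executed, filtering it by experiment-round prefix is one way to locate a specific sweep. A minimal sketch, where a tiny inline sample stands in for the real file at docs/archived_dev_scripts/olmo2_1124.sh (the 1206 prefix is one of the second-round prefixes named on this page):

```shell
# Filter sweep entries for one experiment round without executing anything.
# The three sample lines are illustrative stand-ins for the real script.
printf '%s\n' \
  '1203_sft_lr2e-6' \
  '1206_dpo_lr1e-6' \
  '1206_dpo_lr3e-6' \
| grep '^1206'
```

Against the real file, the same `grep '^1206'` (or `grep -n 1206` for line numbers) narrows the output to the second-round commands of interest.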

I/O Contract

Inputs

Name              | Type           | Required | Description
Model checkpoints | HF Model       | Yes      | OLMo 2 7B/13B base model weights
Training datasets | HF Dataset     | Yes      | SFT and preference datasets
Beaker cluster    | Infrastructure | Yes      | AI2 Jupiter cluster access

Outputs

Name               | Type            | Description
Trained models     | Beaker Datasets | Fine-tuned OLMo 2 model checkpoints
Training logs      | WandB           | Training metrics and loss curves
Evaluation results | Beaker Datasets | Benchmark evaluation scores

Usage Examples

Running a Single SFT Experiment

# Example: OLMo 2 7B SFT with specific hyperparameters
python mason.py \
  --cluster ai2/jupiter-cirrascale-2 \
  --budget ai2/oe-adapt \
  --num_nodes 2 --num_gpus 8 \
  -- python open_instruct/finetune.py \
    --model_name_or_path allenai/OLMo-2-1124-7B \
    --learning_rate 5e-6 \
    --num_train_epochs 2 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16
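The archived script sweeps the DPO stage in the same style. A dry-run sketch over the 7B DPO learning rates listed in the Signature section; the beta values and the `--dpo_beta` flag name are illustrative assumptions, not taken from the archived script:

```shell
# Dry-run DPO sweep; echo stands in for the real mason.py submission.
# Learning rates match the 7B DPO range above; betas and the flag name
# --dpo_beta are hypothetical placeholders.
for LR in 1e-6 3e-6 5e-6; do
  for BETA in 0.01 0.03 0.1; do
    echo "submit DPO: --learning_rate $LR --dpo_beta $BETA"
  done
done
```

As with the SFT example, substituting the real submission command for `echo` would launch the full 3 x 3 grid of jobs.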
