Implementation:Allenai_Open_instruct_OLMo2_1124_Dev_Scripts
| Knowledge Sources | Details |
|---|---|
| Domains | Training, Hyperparameter_Search |
| Last Updated | 2026-02-07 02:00 GMT |
Overview
⚠️ ARCHIVED: This script is in docs/archived_dev_scripts/ and is a historical record. Do not execute directly; see Heuristic:Allenai_Open_instruct_Warning_Archived_Dev_Scripts.
Archived development script containing the full hyperparameter sweep commands used to develop OLMo 2 (November 2024) instruction-tuned models at 7B and 13B scales.
Description
The olmo2_1124.sh script is a historical record of the experimental pipeline used to develop the OLMo 2 1124 instruct models. It contains nested bash loops for comprehensive hyperparameter sweeps across: SFT (supervised finetuning) for 7B and 13B models, DPO (direct preference optimization) training, reward model training, and PPO-based reinforcement learning. The sweeps cover learning rates, epochs, seeds, and beta values. Jobs are submitted to AI2's Jupiter Beaker cluster using mason.py and the submit_finetune_job.py / submit_dpo_job.py scripts. The script includes both first-round (1203 prefix) and second-round (1206/1208/1210 prefix) experiments, showing iterative refinement.
Usage
Reference this script when reproducing OLMo 2 1124 instruction tuning experiments or when designing new hyperparameter sweeps. It demonstrates the complete SFT -> DPO -> RM -> RL training pipeline with concrete parameter ranges.
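When designing a new sweep of this kind, it helps to enumerate the job grid before submitting anything. The sketch below is a hypothetical dry run, not copied from the script: it echoes one command per hyperparameter combination (learning rate x epochs x seed; the seed values are illustrative) so the grid size can be checked first.

```shell
# Hypothetical sweep sketch: echo one command per combination instead of
# submitting jobs, so the grid size can be verified before launching.
sweep_commands() {
  for LR in 2e-6 5e-6 1e-5 2e-5; do    # learning rates
    for EPOCH in 1 2; do               # epoch counts
      for SEED in 1 2 3; do            # random seeds (illustrative values)
        echo "python mason.py ... --learning_rate $LR --num_train_epochs $EPOCH --seed $SEED"
      done
    done
  done
}
# 4 learning rates x 2 epoch settings x 3 seeds = 24 jobs
sweep_commands | wc -l
```

Replacing the echo with the real mason.py invocation turns the dry run into the actual sweep.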
Code Reference
Source Location
- Repository: Allenai_Open_instruct
- File: docs/archived_dev_scripts/olmo2_1124.sh
- Lines: 1-894
Signature
# Hyperparameter sweep structure:
# 7B SFT sweep: learning rates, epochs, seeds
for LR in 2e-6 5e-6 1e-5 2e-5; do
  for EPOCH in 1 2; do
    python mason.py ... --learning_rate $LR --num_train_epochs $EPOCH
  done
done

# 7B DPO sweep: learning rates, beta values
for LR in 1e-6 3e-6 5e-6; do
  python mason.py ... --learning_rate $LR
done

# 13B SFT and DPO sweeps follow similar patterns
# Reward model training and PPO experiments included
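The DPO loop in the signature sweeps only the learning rate, although its comment (and the Description) say beta is swept as well. A hedged sketch of the combined grid follows; the `--dpo_beta` flag name and the beta values are assumptions, not copied from the archived script.

```shell
# Hypothetical combined DPO grid: learning rate x beta. Commands are echoed,
# not submitted; --dpo_beta and the beta values are assumptions.
dpo_sweep() {
  for LR in 1e-6 3e-6 5e-6; do
    for BETA in 0.03 0.1 0.3; do
      echo "python mason.py ... --learning_rate $LR --dpo_beta $BETA"
    done
  done
}
dpo_sweep
```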
Import
# Not imported, and not meant to be executed end-to-end (the script is
# archived; see the warning above). Instead, copy individual commands from
# docs/archived_dev_scripts/olmo2_1124.sh for reference or adaptation.
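Since the script should not be run wholesale, a safer workflow is to list its submit commands and copy one out by hand. The grep below is demonstrated on an inline sample; in the real repo, point it at docs/archived_dev_scripts/olmo2_1124.sh (the pattern is an assumption about the script's contents).

```shell
# List submit commands without executing anything. Shown on an inline sample;
# substitute the real file path when working in the repository.
sample='# 1203 first-round sweep
python mason.py --cluster ai2/jupiter-cirrascale-2 -- python open_instruct/finetune.py
python submit_dpo_job.py'
printf '%s\n' "$sample" | grep -nE 'mason\.py|submit_(finetune|dpo)_job\.py'
```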
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| Model checkpoints | HF Model | Yes | OLMo 2 7B/13B base model weights |
| Training datasets | HF Dataset | Yes | SFT and preference datasets |
| Beaker cluster | Infrastructure | Yes | AI2 Jupiter cluster access |
Outputs
| Name | Type | Description |
|---|---|---|
| Trained models | Beaker Datasets | Fine-tuned OLMo 2 model checkpoints |
| Training logs | WandB | Training metrics and loss curves |
| Evaluation results | Beaker Datasets | Benchmark evaluation scores |
Usage Examples
Running a Single SFT Experiment
# Example: OLMo 2 7B SFT with specific hyperparameters
python mason.py \
  --cluster ai2/jupiter-cirrascale-2 \
  --budget ai2/oe-adapt \
  --num_nodes 2 --num_gpus 8 \
  -- python open_instruct/finetune.py \
  --model_name_or_path allenai/OLMo-2-1124-7B \
  --learning_rate 5e-6 \
  --num_train_epochs 2 \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 16
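A hedged DPO counterpart to the SFT example above. The training script name (dpo_tune.py), the `--dpo_beta` flag, the checkpoint name, and the hyperparameter values are assumptions based on the open-instruct layout, not copied from the archived script; verify against olmo2_1124.sh before use. The command is built as a string and echoed as a dry run.

```shell
# Dry-run DPO submission sketch; all names below are assumptions, verify
# against the archived script before submitting for real.
cmd="python mason.py \
  --cluster ai2/jupiter-cirrascale-2 \
  --budget ai2/oe-adapt \
  --num_nodes 2 --num_gpus 8 \
  -- python open_instruct/dpo_tune.py \
  --model_name_or_path allenai/OLMo-2-1124-7B-SFT \
  --learning_rate 5e-7 \
  --dpo_beta 0.1"
echo "$cmd"   # inspect the assembled command, then run manually once verified
```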