Implementation:Allenai_Open_instruct_OLMoE_0125_Dev_Scripts
| Knowledge Sources | |
|---|---|
| Domains | Training, Hyperparameter_Search, Mixture_Of_Experts |
| Last Updated | 2026-02-07 02:00 GMT |
Overview
⚠️ ARCHIVED: This script is in docs/archived_dev_scripts/ and is a historical record. Do not execute directly; see Heuristic:Allenai_Open_instruct_Warning_Archived_Dev_Scripts.
Archived development script containing the full training pipeline commands for OLMoE (Mixture of Experts) model instruction tuning from January 2025.
Description
The olmoe_0125.sh script documents the experimental pipeline for developing OLMoE instruction-tuned models. Similar in structure to the OLMo 2 scripts, it adapts the post-training pipeline for Mixture of Experts architectures. It contains SFT sweeps with DeepSpeed ZeRO-2 across learning rates and batch sizes, DPO sweeps using dpo_tune_cache.py, reward model training, and PPO-based RL with ppo_vllm_thread_ray_gtrl.py. Key differences from the OLMo 2 scripts include OLMoE-specific checkpoint paths from /weka, lower batch sizes suited to MoE memory requirements, and the allenai/2025-1-olmoe-instruct-evals metadata dataset.
Usage
Reference this script when reproducing OLMoE instruction tuning or adapting training pipelines for Mixture of Experts models. It documents specific dataset mixes and hyperparameter choices for MoE architectures.
Code Reference
Source Location
- Repository: Allenai_Open_instruct
- File: docs/archived_dev_scripts/olmoe_0125.sh
- Lines: 1-890
Signature
# OLMoE SFT sweep structure:
for LR in 2e-6 5e-6 1e-5; do
    python mason.py ... \
        --model_name_or_path /weka/oe-training-default/ai2-llm/checkpoints/OLMoE/... \
        --per_device_train_batch_size 2 \
        --gradient_accumulation_steps 2 \
        --learning_rate $LR
done
# DPO sweep using dpo_tune_cache.py
# Reward model training
# PPO with ppo_vllm_thread_ray_gtrl.py
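The DPO stage is only summarized by the comment above. A hypothetical sketch of its sweep structure, mirroring the SFT loop, is shown below; the beta values and the elided mason.py arguments are illustrative, not taken from the archived script.

```shell
# Hypothetical DPO sweep sketch; beta values are illustrative only.
# Collect the generated launch commands rather than executing them.
DPO_CMDS=$(for BETA in 0.01 0.03 0.1; do
    echo "python mason.py ... -- python open_instruct/dpo_tune_cache.py --dpo_beta $BETA"
done)
printf '%s\n' "$DPO_CMDS"
```

In practice, each generated command would be inspected and adapted (checkpoint paths, cluster flags) before submission, consistent with the archived-script warning above.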
Import
# Not imported. This is an archived reference; do not run it end-to-end
# (see the archived-script warning above). Instead, copy individual commands
# from docs/archived_dev_scripts/olmoe_0125.sh and adapt them before use.
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| OLMoE checkpoints | Model | Yes | OLMoE (1B active / 7B total parameters) base model weights from /weka |
| Training datasets | HF Dataset | Yes | SFT and preference datasets |
| Beaker cluster | Infrastructure | Yes | AI2 Jupiter cluster access |
Outputs
| Name | Type | Description |
|---|---|---|
| Trained models | Beaker Datasets | Fine-tuned OLMoE model checkpoints |
| Training logs | WandB | Training metrics and loss curves |
Usage Examples
Running OLMoE SFT
# Example: OLMoE 1B SFT with specific hyperparameters
python mason.py \
--cluster ai2/jupiter-cirrascale-2 \
--budget ai2/oe-adapt \
--num_nodes 1 --num_gpus 8 \
-- python open_instruct/finetune.py \
--model_name_or_path /weka/oe-training-default/ai2-llm/checkpoints/OLMoE/a0125/... \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 2 \
--learning_rate 5e-6
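With these settings, the effective global batch size is the per-device batch size times the gradient accumulation steps times the GPU count; a quick sanity check of the numbers in the example above:

```shell
# Effective global batch size for the example above:
# 2 per device x 2 accumulation steps x 8 GPUs x 1 node = 32
PER_DEVICE_BS=2
GRAD_ACCUM=2
NUM_GPUS=8
NUM_NODES=1
GLOBAL_BS=$((PER_DEVICE_BS * GRAD_ACCUM * NUM_GPUS * NUM_NODES))
echo "$GLOBAL_BS"   # prints 32
```

The resulting global batch of 32 reflects the lower per-device batch sizes the description attributes to MoE memory requirements.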