Implementation:Allenai_Open_instruct_OLMoE_0125_Dev_Scripts
| Knowledge Sources | |
|---|---|
| Domains | Training, Hyperparameter_Search, Mixture_Of_Experts |
| Last Updated | 2026-02-07 02:00 GMT |
Overview
⚠️ ARCHIVED: This script is in docs/archived_dev_scripts/ and is a historical record. Do not execute directly; see Heuristic:Allenai_Open_instruct_Warning_Archived_Dev_Scripts.
Archived development script containing the full training pipeline commands for OLMoE (Mixture of Experts) model instruction tuning from January 2025.
Description
The olmoe_0125.sh script documents the experimental pipeline for developing OLMoE instruction-tuned models. Similar in structure to the OLMo 2 scripts, it adapts the post-training pipeline for Mixture of Experts architectures. It contains SFT sweeps with DeepSpeed ZeRO-2 across learning rates and batch sizes, DPO sweeps using dpo_tune_cache.py, reward model training, and PPO-based RL with ppo_vllm_thread_ray_gtrl.py. Key differences from the OLMo 2 scripts include OLMoE-specific checkpoint paths from /weka, lower batch sizes suited to MoE memory requirements, and the allenai/2025-1-olmoe-instruct-evals metadata dataset.
Usage
Reference this script when reproducing OLMoE instruction tuning or adapting training pipelines for Mixture of Experts models. It documents specific dataset mixes and hyperparameter choices for MoE architectures.
Code Reference
Source Location
- Repository: Allenai_Open_instruct
- File: docs/archived_dev_scripts/olmoe_0125.sh
- Lines: 1-890
Signature
# OLMoE SFT sweep structure:
for LR in 2e-6 5e-6 1e-5; do
    python mason.py ... \
        --model_name_or_path /weka/oe-training-default/ai2-llm/checkpoints/OLMoE/... \
        --per_device_train_batch_size 2 \
        --gradient_accumulation_steps 2 \
        --learning_rate $LR
done
# DPO sweep using dpo_tune_cache.py
# Reward model training
# PPO with ppo_vllm_thread_ray_gtrl.py
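The DPO stage is only summarized by the comment above. A hypothetical sketch of its sweep structure, mirroring the SFT loop, is shown below; the beta values and the elided mason.py arguments are illustrative, not taken from the archived script.

```shell
# Hypothetical DPO sweep sketch; beta values are illustrative only.
# Collect the generated launch commands rather than executing them.
DPO_CMDS=$(for BETA in 0.01 0.03 0.1; do
    echo "python mason.py ... -- python open_instruct/dpo_tune_cache.py --dpo_beta $BETA"
done)
printf '%s\n' "$DPO_CMDS"
```

In practice, each generated command would be inspected and adapted (checkpoint paths, cluster flags) before submission, consistent with the archived-script warning above.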
Import
# Not imported. This is an archived reference; do not run it end-to-end
# (see the archived-script warning above). Instead, copy individual commands
# from docs/archived_dev_scripts/olmoe_0125.sh and adapt them before use.
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| OLMoE checkpoints | Model | Yes | OLMoE (1B active / 7B total parameters) base model weights from /weka |
| Training datasets | HF Dataset | Yes | SFT and preference datasets |
| Beaker cluster | Infrastructure | Yes | AI2 Jupiter cluster access |
Outputs
| Name | Type | Description |
|---|---|---|
| Trained models | Beaker Datasets | Fine-tuned OLMoE model checkpoints |
| Training logs | WandB | Training metrics and loss curves |
Usage Examples
Running OLMoE SFT
# Example: OLMoE 1B SFT with specific hyperparameters
python mason.py \
--cluster ai2/jupiter-cirrascale-2 \
--budget ai2/oe-adapt \
--num_nodes 1 --num_gpus 8 \
-- python open_instruct/finetune.py \
--model_name_or_path /weka/oe-training-default/ai2-llm/checkpoints/OLMoE/a0125/... \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 2 \
--learning_rate 5e-6
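With these settings, the effective global batch size is the per-device batch size times the gradient accumulation steps times the GPU count; a quick sanity check of the numbers in the example above:

```shell
# Effective global batch size for the example above:
# 2 per device x 2 accumulation steps x 8 GPUs x 1 node = 32
PER_DEVICE_BS=2
GRAD_ACCUM=2
NUM_GPUS=8
NUM_NODES=1
GLOBAL_BS=$((PER_DEVICE_BS * GRAD_ACCUM * NUM_GPUS * NUM_NODES))
echo "$GLOBAL_BS"   # prints 32
```

The resulting global batch of 32 reflects the lower per-device batch sizes the description attributes to MoE memory requirements.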