
Implementation:Allenai Open instruct OLMoE 0125 Dev Scripts

From Leeroopedia


Knowledge Sources
Domains: Training, Hyperparameter_Search, Mixture_Of_Experts
Last Updated: 2026-02-07 02:00 GMT

Overview

⚠️ ARCHIVED: This script is in docs/archived_dev_scripts/ and is a historical record. Do not execute directly; see Heuristic:Allenai_Open_instruct_Warning_Archived_Dev_Scripts.

Archived development script containing the full training pipeline commands for OLMoE (Mixture of Experts) model instruction tuning from January 2025.

Description

The olmoe_0125.sh script documents the experimental pipeline for developing OLMoE instruction-tuned models. Similar in structure to the OLMo 2 scripts, it adapts the post-training pipeline for Mixture of Experts architectures. It contains SFT sweeps with DeepSpeed ZeRO-2 across learning rates and batch sizes, DPO sweeps using dpo_tune_cache.py, reward model training, and PPO-based RL with ppo_vllm_thread_ray_gtrl.py. Key differences from the OLMo 2 scripts include OLMoE-specific checkpoint paths from /weka, lower batch sizes suited to MoE memory requirements, and the allenai/2025-1-olmoe-instruct-evals metadata dataset.
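The four-stage ordering described above can be sketched as a simple data structure. This is a hypothetical illustration for orientation only: the SFT, DPO, and PPO script names come from this page, while the reward-model training script is not named here and is left as a placeholder.

```python
# Hypothetical sketch of the post-training stage ordering documented above.
# The reward-model script name is NOT given on this page; it is a placeholder.
PIPELINE = [
    ("sft", "open_instruct/finetune.py"),                   # SFT sweep (DeepSpeed ZeRO-2)
    ("dpo", "open_instruct/dpo_tune_cache.py"),             # DPO sweep on SFT checkpoints
    ("rm",  "<reward-model training script>"),              # reward model for PPO (placeholder)
    ("ppo", "open_instruct/ppo_vllm_thread_ray_gtrl.py"),   # PPO-based RL
]

def stage_order():
    """Return the stage names in the order the archived pipeline runs them."""
    return [name for name, _ in PIPELINE]
```

Each stage consumes the previous stage's checkpoints, which is why the archived script keeps the commands in this order.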

Usage

Reference this script when reproducing OLMoE instruction tuning or adapting training pipelines for Mixture of Experts models. It documents specific dataset mixes and hyperparameter choices for MoE architectures.

Code Reference

Source Location

docs/archived_dev_scripts/olmoe_0125.sh

Signature

# OLMoE SFT sweep structure:
for LR in 2e-6 5e-6 1e-5; do
  python mason.py ... \
    --model_name_or_path /weka/oe-training-default/ai2-llm/checkpoints/OLMoE/... \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 2 \
    --learning_rate $LR
done

# DPO sweep using dpo_tune_cache.py
# Reward model training
# PPO with ppo_vllm_thread_ray_gtrl.py
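The learning-rate loop above can be expanded into explicit per-run configurations. The sketch below is a minimal illustration, not code from the script: the learning rates and batch flags are taken from the loop shown, and the truncated /weka checkpoint path is kept as a placeholder rather than guessed.

```python
# Minimal sketch expanding the SFT sweep loop into per-run configs.
# The checkpoint path is truncated in the archived script; placeholder kept.
LEARNING_RATES = ["2e-6", "5e-6", "1e-5"]

def sweep_configs():
    """One config dict per learning rate in the sweep."""
    return [
        {
            "model_name_or_path": "/weka/oe-training-default/ai2-llm/checkpoints/OLMoE/...",
            "per_device_train_batch_size": 2,
            "gradient_accumulation_steps": 2,
            "learning_rate": lr,
        }
        for lr in LEARNING_RATES
    ]
```

Note that only the learning rate varies across runs; the batch-size flags are held fixed at the lower values the page attributes to MoE memory requirements.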

Import

# Not imported; archived reference only.
# Copy individual commands from docs/archived_dev_scripts/olmoe_0125.sh
# rather than executing the script end-to-end (see the archived warning above).

I/O Contract

Inputs

Name              | Type           | Required | Description
OLMoE checkpoints | Model          | Yes      | OLMoE 8x1B base model weights from /weka
Training datasets | HF Dataset     | Yes      | SFT and preference datasets
Beaker cluster    | Infrastructure | Yes      | AI2 Jupiter cluster access

Outputs

Name           | Type            | Description
Trained models | Beaker Datasets | Fine-tuned OLMoE model checkpoints
Training logs  | WandB           | Training metrics and loss curves

Usage Examples

Running OLMoE SFT

# Example: OLMoE 1B SFT with specific hyperparameters
python mason.py \
  --cluster ai2/jupiter-cirrascale-2 \
  --budget ai2/oe-adapt \
  --num_nodes 1 --num_gpus 8 \
  -- python open_instruct/finetune.py \
    --model_name_or_path /weka/oe-training-default/ai2-llm/checkpoints/OLMoE/a0125/... \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 2 \
    --learning_rate 5e-6
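Assuming standard data-parallel accounting (an assumption on my part: ZeRO-2 shards optimizer state but does not change this arithmetic, and the script itself does not print it), the flags above imply the following effective global batch size:

```python
# Back-of-envelope check of the effective global batch size implied by
# the example flags above (assumes plain data-parallel accumulation).
per_device_train_batch_size = 2
gradient_accumulation_steps = 2
num_gpus = 8  # --num_nodes 1 --num_gpus 8 in the example

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch)  # -> 32
```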
