# Workflow: AllenAI open-instruct SFT Fine-Tuning
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Fine_Tuning, Post_Training |
| Last Updated | 2026-02-07 00:00 GMT |
## Overview
End-to-end process for supervised fine-tuning (SFT) of large language models on instruction-following datasets using Accelerate and DeepSpeed.
## Description
This workflow covers the standard procedure for instruction-tuning base language models (Llama, OLMo, Qwen) to follow instructions. It uses HuggingFace Accelerate with DeepSpeed ZeRO Stage 2/3 for distributed training across multiple GPUs and nodes. The process includes dataset preparation with automatic mixing and caching, tokenization with chat templates, model loading with optional LoRA/QLoRA adapters, multi-node distributed training, and checkpoint saving to HuggingFace Hub.
The primary entry point is finetune.py, which supports full fine-tuning, LoRA, and QLoRA parameter-efficient variants. For OLMo-family models, an alternative OLMo-core SFT implementation offers greater GPU efficiency.
## Usage
Execute this workflow when you have an instruction-following dataset in the messages format (list of role/content dictionaries) and need to adapt a base pretrained model to follow instructions. This is typically the first stage of the Tulu post-training pipeline, producing an SFT checkpoint that feeds into subsequent DPO and RLVR stages.
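As an illustration, a single training example in the expected messages format might look like the following (the field values here are hypothetical, but the role/content structure is what the pipeline requires):

```python
# A hypothetical record in the "messages" format expected by the SFT pipeline:
# each example is one conversation, stored as a list of role/content dicts.
example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize photosynthesis in one sentence."},
        {"role": "assistant", "content": "Plants convert light, water, and CO2 into sugar and oxygen."},
    ]
}

# Basic sanity checks a dataset in this format should pass:
# every turn has a role and content, and the conversation ends with
# an assistant turn (the tokens the SFT loss is computed on).
assert all({"role", "content"} <= m.keys() for m in example["messages"])
assert example["messages"][-1]["role"] == "assistant"
```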
## Execution Steps
### Step 1: Environment_Setup
Prepare the training environment by installing dependencies with uv and configuring the compute infrastructure. For AI2 Beaker-based runs, build a Docker image from the repository and register it. For local runs, ensure Accelerate and DeepSpeed are available.
Key considerations:
- The repository uses uv for package management (install via uv sync)
- Docker images must include the current git commit for reproducibility
- For Beaker launches, the build_image_and_launch.sh script handles Docker build, image upload, and job submission in sequence
### Step 2: Dataset_Preparation
Configure the dataset mixture and prepare it for training. The dataset_transformation.py utility handles mixing multiple datasets with specified proportions, tokenizing with the appropriate chat template, filtering by sequence length, and caching the processed result to avoid redundant computation across nodes.
Key considerations:
- Datasets must have a messages key with role/content dictionaries
- The dataset_mixer_list argument specifies datasets and their proportions
- SHA-based caching prevents re-tokenization across restarts and multi-node setups
- Chat templates (tulu, simple_chat) control how conversations are formatted into tokens
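A minimal sketch of the mixing and caching behavior described above, assuming simplified, hypothetical helper names (the actual logic lives in dataset_transformation.py and operates on HuggingFace datasets rather than plain lists):

```python
import hashlib
import json
import random

def mix_datasets(mixer_list, seed=42):
    """Subsample each dataset to its given fraction and concatenate.

    mixer_list: [(examples, fraction), ...] -- a simplified stand-in for
    the dataset_mixer_list argument of dataset/fraction pairs.
    """
    rng = random.Random(seed)
    mixed = []
    for examples, fraction in mixer_list:
        k = int(len(examples) * fraction)
        mixed.extend(rng.sample(examples, k))
    rng.shuffle(mixed)
    return mixed

def cache_key(mixer_config, tokenizer_name, max_seq_length):
    # Hash the full preprocessing config so that every node (and every
    # restart) maps identical settings to the same cached tokenized dataset,
    # avoiding redundant re-tokenization.
    payload = json.dumps(
        {"mix": mixer_config, "tok": tokenizer_name, "len": max_seq_length},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
```

The key property is determinism: any worker that computes the same configuration hash can reuse the cache instead of redoing the tokenization pass.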
### Step 3: Model_Loading
Load the base pretrained model with the specified configuration. The script supports full-precision models, LoRA adapters, and QLoRA (quantized LoRA) for memory-efficient training. Flash Attention 2 can be enabled for faster computation.
Key considerations:
- Full fine-tuning updates all model weights and requires the most GPU memory
- LoRA injects low-rank adapters, reducing trainable parameters significantly
- QLoRA adds 4-bit quantization on top of LoRA for even lower memory usage
- LigerKernel can be enabled for optimized fused operations
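To make the memory trade-off concrete, here is a back-of-the-envelope comparison of trainable parameters for a fully fine-tuned linear layer versus a LoRA adapter (the layer size and rank below are illustrative, not tied to any specific model or recommended config):

```python
def full_params(d_in, d_out):
    # Full fine-tuning trains the whole dense weight matrix W: d_out x d_in entries.
    return d_out * d_in

def lora_params(d_in, d_out, r):
    # LoRA freezes W and trains two low-rank factors instead:
    # B (d_out x r) and A (r x d_in), so the learned update is B @ A.
    return d_out * r + r * d_in

# Example: one 4096x4096 projection with rank-16 adapters.
full = full_params(4096, 4096)      # 16,777,216 trainable weights
lora = lora_params(4096, 4096, 16)  #    131,072 trainable weights
print(f"LoRA trains {lora / full:.2%} of the full layer's parameters")
```

QLoRA keeps the same adapter math but stores the frozen base weights in 4-bit precision, which is where the additional memory savings come from.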
### Step 4: Distributed_Training
Launch the training loop using Accelerate with DeepSpeed. The training distributes across GPUs using ZeRO stages for memory partitioning. The process trains for the specified number of epochs with configurable learning rate scheduling, gradient accumulation, and checkpointing.
Key considerations:
- DeepSpeed ZeRO Stage 3 shards model, optimizer, and gradient states across GPUs
- Gradient accumulation allows larger effective batch sizes with limited GPU memory
- The effective batch size is num_processes × per_device_batch_size × gradient_accumulation_steps
- Metrics logged include training loss, learning rate, tokens per second, and padding ratio
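The effective-batch-size relationship in the considerations above can be written out directly (the numbers below are illustrative, not recommended settings):

```python
def effective_batch_size(num_processes, per_device_batch_size,
                         gradient_accumulation_steps):
    # Total number of examples contributing to each optimizer step,
    # summed across all GPUs and accumulation micro-steps.
    return num_processes * per_device_batch_size * gradient_accumulation_steps

# e.g. 8 GPUs x 1 example per device x 16 accumulation steps = 128
assert effective_batch_size(8, 1, 16) == 128
```

Holding this product fixed while trading accumulation steps against per-device batch size lets you fit the same effective batch on GPUs with less memory, at the cost of wall-clock time.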
### Step 5: Checkpoint_Saving
Save the trained model checkpoint at specified intervals and at the end of training. The checkpoint can be automatically uploaded to HuggingFace Hub and is compatible with downstream DPO and RLVR training stages.
Key considerations:
- Checkpoints can be saved per epoch or at fixed step intervals
- Auto-upload to HuggingFace Hub is supported for sharing and downstream use
- The saved model is directly usable as model_name_or_path in the DPO training workflow
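A minimal sketch of the save-decision logic implied by the considerations above (hypothetical helper; the real script wires this into the Accelerate training loop and also supports per-epoch saving):

```python
def should_save(step, total_steps, save_every=None):
    """Save at fixed step intervals (if configured) and always at the end."""
    if step == total_steps:
        # The final checkpoint always gets saved, regardless of interval.
        return True
    return save_every is not None and step % save_every == 0

# With save_every=500 over 1200 steps, checkpoints land at 500, 1000, and 1200.
saves = [s for s in range(1, 1201) if should_save(s, 1200, save_every=500)]
assert saves == [500, 1000, 1200]
```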