# Workflow: AllenAI open-instruct SFT Fine-Tuning
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Fine_Tuning, Post_Training |
| Last Updated | 2026-02-07 00:00 GMT |
## Overview
End-to-end process for supervised fine-tuning (SFT) of large language models on instruction-following datasets using Accelerate and DeepSpeed.
## Description
This workflow covers the standard procedure for instruction-tuning base language models (Llama, OLMo, Qwen) to follow instructions. It uses HuggingFace Accelerate with DeepSpeed ZeRO Stage 2/3 for distributed training across multiple GPUs and nodes. The process includes dataset preparation with automatic mixing and caching, tokenization with chat templates, model loading with optional LoRA/QLoRA adapters, multi-node distributed training, and checkpoint saving to HuggingFace Hub.
The primary entry point is finetune.py, which supports full fine-tuning, LoRA, and QLoRA parameter-efficient variants. For OLMo-family models, an alternative OLMo-core SFT implementation offers greater GPU efficiency.
## Usage
Execute this workflow when you have an instruction-following dataset in the messages format (list of role/content dictionaries) and need to adapt a base pretrained model to follow instructions. This is typically the first stage of the Tulu post-training pipeline, producing an SFT checkpoint that feeds into subsequent DPO and RLVR stages.
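As an illustration, a single training example in the expected messages format might look like the following (the field values here are hypothetical, but the role/content structure is what the pipeline requires):

```python
# A hypothetical record in the "messages" format expected by the SFT pipeline:
# each example is one conversation, stored as a list of role/content dicts.
example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize photosynthesis in one sentence."},
        {"role": "assistant", "content": "Plants convert light, water, and CO2 into sugar and oxygen."},
    ]
}

# Basic sanity checks a dataset in this format should pass:
# every turn has a role and content, and the conversation ends with
# an assistant turn (the tokens the SFT loss is computed on).
assert all({"role", "content"} <= m.keys() for m in example["messages"])
assert example["messages"][-1]["role"] == "assistant"
```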
## Execution Steps
### Step 1: Environment_Setup
Prepare the training environment by installing dependencies with uv and configuring the compute infrastructure. For AI2 Beaker-based runs, build a Docker image from the repository and register it. For local runs, ensure Accelerate and DeepSpeed are available.
Key considerations:
- The repository uses uv for package management (install via uv sync)
- Docker images must include the current git commit for reproducibility
- For Beaker launches, the build_image_and_launch.sh script handles Docker build, image upload, and job submission in sequence
### Step 2: Dataset_Preparation
Configure the dataset mixture and prepare it for training. The dataset_transformation.py utility handles mixing multiple datasets with specified proportions, tokenizing with the appropriate chat template, filtering by sequence length, and caching the processed result to avoid redundant computation across nodes.
Key considerations:
- Datasets must have a messages key with role/content dictionaries
- The dataset_mixer_list argument specifies datasets and their proportions
- SHA-based caching prevents re-tokenization across restarts and multi-node setups
- Chat templates (tulu, simple_chat) control how conversations are formatted into tokens
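A minimal sketch of the mixing and caching behavior described above, assuming simplified, hypothetical helper names (the actual logic lives in dataset_transformation.py and operates on HuggingFace datasets rather than plain lists):

```python
import hashlib
import json
import random

def mix_datasets(mixer_list, seed=42):
    """Subsample each dataset to its given fraction and concatenate.

    mixer_list: [(examples, fraction), ...] -- a simplified stand-in for
    the dataset_mixer_list argument of dataset/fraction pairs.
    """
    rng = random.Random(seed)
    mixed = []
    for examples, fraction in mixer_list:
        k = int(len(examples) * fraction)
        mixed.extend(rng.sample(examples, k))
    rng.shuffle(mixed)
    return mixed

def cache_key(mixer_config, tokenizer_name, max_seq_length):
    # Hash the full preprocessing config so that every node (and every
    # restart) maps identical settings to the same cached tokenized dataset,
    # avoiding redundant re-tokenization.
    payload = json.dumps(
        {"mix": mixer_config, "tok": tokenizer_name, "len": max_seq_length},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:16]
```

The key property is determinism: any worker that computes the same configuration hash can reuse the cache instead of redoing the tokenization pass.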
### Step 3: Model_Loading
Load the base pretrained model with the specified configuration. The script supports full-precision models, LoRA adapters, and QLoRA (quantized LoRA) for memory-efficient training. Flash Attention 2 can be enabled for faster computation.
Key considerations:
- Full fine-tuning updates all model weights and requires the most GPU memory
- LoRA injects low-rank adapters, reducing trainable parameters significantly
- QLoRA adds 4-bit quantization on top of LoRA for even lower memory usage
- LigerKernel can be enabled for optimized fused operations
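To make the memory trade-off concrete, here is a back-of-the-envelope comparison of trainable parameters for a fully fine-tuned linear layer versus a LoRA adapter (the layer size and rank below are illustrative, not tied to any specific model or recommended config):

```python
def full_params(d_in, d_out):
    # Full fine-tuning trains the whole dense weight matrix W: d_out x d_in entries.
    return d_out * d_in

def lora_params(d_in, d_out, r):
    # LoRA freezes W and trains two low-rank factors instead:
    # B (d_out x r) and A (r x d_in), so the learned update is B @ A.
    return d_out * r + r * d_in

# Example: one 4096x4096 projection with rank-16 adapters.
full = full_params(4096, 4096)      # 16,777,216 trainable weights
lora = lora_params(4096, 4096, 16)  #    131,072 trainable weights
print(f"LoRA trains {lora / full:.2%} of the full layer's parameters")
```

QLoRA keeps the same adapter math but stores the frozen base weights in 4-bit precision, which is where the additional memory savings come from.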
### Step 4: Distributed_Training
Launch the training loop using Accelerate with DeepSpeed. The training distributes across GPUs using ZeRO stages for memory partitioning. The process trains for the specified number of epochs with configurable learning rate scheduling, gradient accumulation, and checkpointing.
Key considerations:
- DeepSpeed ZeRO Stage 3 shards model, optimizer, and gradient states across GPUs
- Gradient accumulation allows larger effective batch sizes with limited GPU memory
- The effective batch size is num_processes × per_device_batch_size × gradient_accumulation_steps
- Metrics logged include training loss, learning rate, tokens per second, and padding ratio
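The effective-batch-size relationship in the considerations above can be written out directly (the numbers below are illustrative, not recommended settings):

```python
def effective_batch_size(num_processes, per_device_batch_size,
                         gradient_accumulation_steps):
    # Total number of examples contributing to each optimizer step,
    # summed across all GPUs and accumulation micro-steps.
    return num_processes * per_device_batch_size * gradient_accumulation_steps

# e.g. 8 GPUs x 1 example per device x 16 accumulation steps = 128
assert effective_batch_size(8, 1, 16) == 128
```

Holding this product fixed while trading accumulation steps against per-device batch size lets you fit the same effective batch on GPUs with less memory, at the cost of wall-clock time.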
### Step 5: Checkpoint_Saving
Save the trained model checkpoint at specified intervals and at the end of training. The checkpoint can be automatically uploaded to HuggingFace Hub and is compatible with downstream DPO and RLVR training stages.
Key considerations:
- Checkpoints can be saved per epoch or at fixed step intervals
- Auto-upload to HuggingFace Hub is supported for sharing and downstream use
- The saved model is directly usable as model_name_or_path in the DPO training workflow
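A minimal sketch of the save-decision logic implied by the considerations above (hypothetical helper; the real script wires this into the Accelerate training loop and also supports per-epoch saving):

```python
def should_save(step, total_steps, save_every=None):
    """Save at fixed step intervals (if configured) and always at the end."""
    if step == total_steps:
        # The final checkpoint always gets saved, regardless of interval.
        return True
    return save_every is not None and step % save_every == 0

# With save_every=500 over 1200 steps, checkpoints land at 500, 1000, and 1200.
saves = [s for s in range(1, 1201) if should_save(s, 1200, save_every=500)]
assert saves == [500, 1000, 1200]
```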