Implementation: AllenAI Open Instruct Finetune Main
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Deep Learning, Natural Language Processing, MLOps |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete entry point for running the full supervised fine-tuning (SFT) training loop provided by the Open Instruct library.
Description
The main() function in finetune.py is the central entry point for SFT training. It orchestrates the entire pipeline:
- Setup: Initializes HuggingFace Accelerate, configures distributed training, sets random seeds, and optionally initializes W&B experiment tracking.
- Data loading: Calls get_cached_dataset_tulu() to load, mix, tokenize, and cache the training dataset. Shuffles the dataset and sets it to PyTorch tensor format.
- Model loading: Loads the pre-trained model via AutoModelForCausalLM.from_pretrained() with optional QLoRA quantization, Liger Kernel, or standard bfloat16 loading. Resizes token embeddings if needed and optionally wraps the model with LoRA adapters.
- Optimizer and scheduler: Creates the AdamW optimizer (optionally fused or 8-bit), configures the learning rate schedule (linear, cosine, or constant with warmup), and prepares everything with Accelerate.
- Training loop: Iterates over epochs and batches, computing the cross-entropy loss on labeled tokens, performing gradient accumulation, clipping gradients, and stepping the optimizer. Logs metrics (loss, learning rate, throughput) to W&B.
- Checkpointing: Saves model checkpoints at configurable intervals (every N steps or each epoch). Manages checkpoint rotation to keep only the last N checkpoints.
- Finalization: Saves the final model and tokenizer, optionally pushes to HuggingFace Hub, and launches evaluation jobs on Beaker.
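The training-loop step above (loss scaling for gradient accumulation, gradient clipping, then an optimizer step) can be sketched in plain PyTorch. This is a minimal stand-in: it uses a toy linear model in place of the causal LM and omits Accelerate, logging, and checkpointing, so it is not the library's actual code.

```python
# Sketch of one training epoch with gradient accumulation and clipping.
# The tiny linear model and synthetic batches are illustrative stand-ins.
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
gradient_accumulation_steps = 2
clip_grad_norm = 1.0
optimizer_steps = 0

# Four synthetic (inputs, targets) micro-batches of size 8.
batches = [(torch.randn(8, 4), torch.randint(0, 2, (8,))) for _ in range(4)]

for step, (inputs, targets) in enumerate(batches):
    logits = model(inputs)
    # Cross-entropy loss, divided so accumulated gradients average
    # over the micro-batches in one optimizer step.
    loss = torch.nn.functional.cross_entropy(logits, targets)
    (loss / gradient_accumulation_steps).backward()
    if (step + 1) % gradient_accumulation_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_grad_norm)
        optimizer.step()
        optimizer.zero_grad()
        optimizer_steps += 1
```

With 4 micro-batches and an accumulation factor of 2, the loop takes exactly 2 optimizer steps.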
Usage
Run this function via the command line to start SFT training. It is invoked by the script entry point in finetune.py and receives its configuration from FlatArguments and TokenizerConfig parsed from CLI arguments.
Code Reference
Source Location
- Repository: Open Instruct
- File: open_instruct/finetune.py
- Lines: L353-965
Signature
```python
def main(args: FlatArguments, tc: TokenizerConfig) -> None:
    ...
```
Import
```python
from open_instruct.finetune import main, FlatArguments
from open_instruct.dataset_transformation import TokenizerConfig
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args | FlatArguments | Yes | Full training configuration including model path, dataset settings, training hyperparameters, checkpointing, and experiment tracking options. |
| tc | TokenizerConfig | Yes | Tokenizer configuration specifying the tokenizer path, chat template, and related settings. |
Outputs
| Name | Type | Description |
|---|---|---|
| (side effects) | None | The function saves the trained model to args.output_dir, optionally pushes to HuggingFace Hub, logs metrics to W&B, and launches evaluation jobs. No return value. |
Key Training Hyperparameters
| Parameter | Default | Description |
|---|---|---|
| per_device_train_batch_size | 8 | Micro-batch size per GPU. |
| gradient_accumulation_steps | 1 | Number of micro-batches before an optimizer step. |
| learning_rate | 2e-5 | Peak learning rate for AdamW. |
| num_train_epochs | 2 | Total training epochs. |
| warmup_ratio | 0.03 | Fraction of total steps for linear warmup. |
| weight_decay | 0.0 | AdamW weight decay coefficient. |
| lr_scheduler_type | "linear" | Learning rate decay schedule (linear, cosine, constant, etc.). |
| clip_grad_norm | -1 | Maximum gradient norm for clipping (-1 disables). |
| seed | 42 | Random seed for reproducibility. |
| max_seq_length | None | Maximum sequence length after tokenization. |
| packing | False | Whether to use padding-free collation for increased throughput. |
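A quick sanity check on how these hyperparameters combine: the effective (global) batch size and the warmup length are derived quantities. The GPU count and dataloader length below are assumptions for illustration, not fields or defaults of FlatArguments.

```python
# Back-of-the-envelope arithmetic for the hyperparameters above.
import math

per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_gpus = 8                      # assumption: one 8-GPU node
warmup_ratio = 0.03
num_train_epochs = 2
batches_per_epoch_per_gpu = 1000  # assumption: dataloader length

# Sequences consumed per optimizer step across all GPUs.
effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_gpus)

# Optimizer updates per epoch and in total, then the warmup length.
updates_per_epoch = math.ceil(batches_per_epoch_per_gpu
                              / gradient_accumulation_steps)
max_train_steps = num_train_epochs * updates_per_epoch
num_warmup_steps = int(max_train_steps * warmup_ratio)

print(effective_batch_size, max_train_steps, num_warmup_steps)  # 128 250 7
```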
Usage Examples
Basic Usage
Typically invoked via the command line:

```shell
accelerate launch --config_file configs/ds_configs/deepspeed_zero3.yaml \
    open_instruct/finetune.py \
    --model_name_or_path allenai/Llama-3.1-Tulu-3-8B \
    --dataset_mixer_list allenai/tulu-3-sft-mixture 1.0 \
    --max_seq_length 4096 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --learning_rate 2e-5 \
    --num_train_epochs 2 \
    --output_dir output/sft_model
```

Programmatic usage:

```python
from open_instruct.finetune import main, FlatArguments
from open_instruct.dataset_transformation import TokenizerConfig

args = FlatArguments(
    model_name_or_path="allenai/Llama-3.1-Tulu-3-8B",
    dataset_mixer_list=["allenai/tulu-3-sft-personas-algebra", "1.0"],
    max_seq_length=4096,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=2,
    output_dir="output/sft_model",
)
tc = TokenizerConfig(
    tokenizer_name_or_path="allenai/Llama-3.1-Tulu-3-8B",
    chat_template_name="tulu",
)
main(args, tc)
```
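The checkpoint rotation described under Checkpointing (keeping only the last N checkpoints) can be sketched as follows. The helper name and the step_&lt;k&gt; directory layout are illustrative assumptions, not the library's exact scheme.

```python
# Illustrative sketch of checkpoint rotation: delete all but the newest
# N step checkpoints in an output directory, returning the survivors.
# The "step_<k>" naming convention here is an assumption.
import re
import shutil
from pathlib import Path

def rotate_checkpoints(output_dir: str, keep_last_n: int) -> list[str]:
    """Keep only the newest keep_last_n checkpoints; return their names."""
    ckpts = [p for p in Path(output_dir).iterdir()
             if p.is_dir() and re.fullmatch(r"step_\d+", p.name)]
    # Sort by step number so "step_200" orders after "step_30".
    ckpts.sort(key=lambda p: int(p.name.split("_")[1]))
    survivors = ckpts[-keep_last_n:] if keep_last_n > 0 else []
    for stale in ckpts[:len(ckpts) - len(survivors)]:
        shutil.rmtree(stale)
    return [p.name for p in survivors]
```

Sorting numerically (rather than lexicographically) matters once step counts cross a power of ten; a plain string sort would put step_1000 before step_200.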
Dependencies
- accelerate -- distributed training orchestration (multi-GPU, multi-node, DeepSpeed)
- deepspeed -- ZeRO memory optimization for large model training
- transformers -- model and tokenizer loading, configuration
- torch -- core tensor operations and autograd
- wandb -- experiment tracking and logging (optional, via with_tracking)
- datasets -- HuggingFace Datasets for data loading
- peft -- LoRA and QLoRA adapter support (optional)
- bitsandbytes -- 4-bit/8-bit quantization (optional, for QLoRA)