Implementation: AllenAI Open Instruct Finetune Main
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Deep Learning, Natural Language Processing, MLOps |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete entry point for running the full supervised fine-tuning (SFT) training loop provided by the Open Instruct library.
Description
The main() function in finetune.py is the central entry point for SFT training. It orchestrates the entire pipeline:
- Setup: Initializes HuggingFace Accelerate, configures distributed training, sets random seeds, and optionally initializes W&B experiment tracking.
- Data loading: Calls get_cached_dataset_tulu() to load, mix, tokenize, and cache the training dataset. Shuffles the dataset and sets it to PyTorch tensor format.
- Model loading: Loads the pre-trained model via AutoModelForCausalLM.from_pretrained() with optional QLoRA quantization, Liger Kernel, or standard bfloat16 loading. Resizes token embeddings if needed and optionally wraps the model with LoRA adapters.
- Optimizer and scheduler: Creates the AdamW optimizer (optionally fused or 8-bit), configures the learning rate schedule (linear, cosine, or constant with warmup), and prepares everything with Accelerate.
- Training loop: Iterates over epochs and batches, computing the cross-entropy loss on labeled tokens, performing gradient accumulation, clipping gradients, and stepping the optimizer. Logs metrics (loss, learning rate, throughput) to W&B.
- Checkpointing: Saves model checkpoints at configurable intervals (every N steps or each epoch). Manages checkpoint rotation to keep only the last N checkpoints.
- Finalization: Saves the final model and tokenizer, optionally pushes to HuggingFace Hub, and launches evaluation jobs on Beaker.
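The training-loop step above (loss scaling for gradient accumulation, gradient clipping, then an optimizer step) can be sketched in plain PyTorch. This is a minimal stand-in: it uses a toy linear model in place of the causal LM and omits Accelerate, logging, and checkpointing, so it is not the library's actual code.

```python
# Sketch of one training epoch with gradient accumulation and clipping.
# The tiny linear model and synthetic batches are illustrative stand-ins.
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
gradient_accumulation_steps = 2
clip_grad_norm = 1.0
optimizer_steps = 0

# Four synthetic (inputs, targets) micro-batches of size 8.
batches = [(torch.randn(8, 4), torch.randint(0, 2, (8,))) for _ in range(4)]

for step, (inputs, targets) in enumerate(batches):
    logits = model(inputs)
    # Cross-entropy loss, divided so accumulated gradients average
    # over the micro-batches in one optimizer step.
    loss = torch.nn.functional.cross_entropy(logits, targets)
    (loss / gradient_accumulation_steps).backward()
    if (step + 1) % gradient_accumulation_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_grad_norm)
        optimizer.step()
        optimizer.zero_grad()
        optimizer_steps += 1
```

With 4 micro-batches and an accumulation factor of 2, the loop takes exactly 2 optimizer steps.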
Usage
Run this function via the command line to start SFT training. It is invoked by the script entry point in finetune.py and receives its configuration from FlatArguments and TokenizerConfig parsed from CLI arguments.
Code Reference
Source Location
- Repository: Open Instruct
- File: open_instruct/finetune.py
- Lines: L353-965
Signature
```python
def main(args: FlatArguments, tc: TokenizerConfig) -> None:
    ...
```
Import
```python
from open_instruct.finetune import main, FlatArguments
from open_instruct.dataset_transformation import TokenizerConfig
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args | FlatArguments | Yes | Full training configuration including model path, dataset settings, training hyperparameters, checkpointing, and experiment tracking options. |
| tc | TokenizerConfig | Yes | Tokenizer configuration specifying the tokenizer path, chat template, and related settings. |
Outputs
| Name | Type | Description |
|---|---|---|
| (side effects) | None | The function saves the trained model to args.output_dir, optionally pushes to HuggingFace Hub, logs metrics to W&B, and launches evaluation jobs. No return value. |
Key Training Hyperparameters
| Parameter | Default | Description |
|---|---|---|
| per_device_train_batch_size | 8 | Micro-batch size per GPU. |
| gradient_accumulation_steps | 1 | Number of micro-batches before an optimizer step. |
| learning_rate | 2e-5 | Peak learning rate for AdamW. |
| num_train_epochs | 2 | Total training epochs. |
| warmup_ratio | 0.03 | Fraction of total steps for linear warmup. |
| weight_decay | 0.0 | AdamW weight decay coefficient. |
| lr_scheduler_type | "linear" | Learning rate decay schedule (linear, cosine, constant, etc.). |
| clip_grad_norm | -1 | Maximum gradient norm for clipping (-1 disables). |
| seed | 42 | Random seed for reproducibility. |
| max_seq_length | None | Maximum sequence length after tokenization. |
| packing | False | Whether to use padding-free collation for increased throughput. |
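A quick sanity check on how these hyperparameters combine: the effective (global) batch size and the warmup length are derived quantities. The GPU count and dataloader length below are assumptions for illustration, not fields or defaults of FlatArguments.

```python
# Back-of-the-envelope arithmetic for the hyperparameters above.
import math

per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_gpus = 8                      # assumption: one 8-GPU node
warmup_ratio = 0.03
num_train_epochs = 2
batches_per_epoch_per_gpu = 1000  # assumption: dataloader length

# Sequences consumed per optimizer step across all GPUs.
effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_gpus)

# Optimizer updates per epoch and in total, then the warmup length.
updates_per_epoch = math.ceil(batches_per_epoch_per_gpu
                              / gradient_accumulation_steps)
max_train_steps = num_train_epochs * updates_per_epoch
num_warmup_steps = int(max_train_steps * warmup_ratio)

print(effective_batch_size, max_train_steps, num_warmup_steps)  # 128 250 7
```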
Usage Examples
Basic Usage
Typically invoked via the command line:

```shell
accelerate launch --config_file configs/ds_configs/deepspeed_zero3.yaml \
    open_instruct/finetune.py \
    --model_name_or_path allenai/Llama-3.1-Tulu-3-8B \
    --dataset_mixer_list allenai/tulu-3-sft-mixture 1.0 \
    --max_seq_length 4096 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --learning_rate 2e-5 \
    --num_train_epochs 2 \
    --output_dir output/sft_model
```

Programmatic usage:

```python
from open_instruct.finetune import main, FlatArguments
from open_instruct.dataset_transformation import TokenizerConfig

args = FlatArguments(
    model_name_or_path="allenai/Llama-3.1-Tulu-3-8B",
    dataset_mixer_list=["allenai/tulu-3-sft-personas-algebra", "1.0"],
    max_seq_length=4096,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=2,
    output_dir="output/sft_model",
)
tc = TokenizerConfig(
    tokenizer_name_or_path="allenai/Llama-3.1-Tulu-3-8B",
    chat_template_name="tulu",
)
main(args, tc)
```
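The checkpoint rotation described under Checkpointing (keeping only the last N checkpoints) can be sketched as follows. The helper name and the step_&lt;k&gt; directory layout are illustrative assumptions, not the library's exact scheme.

```python
# Illustrative sketch of checkpoint rotation: delete all but the newest
# N step checkpoints in an output directory, returning the survivors.
# The "step_<k>" naming convention here is an assumption.
import re
import shutil
from pathlib import Path

def rotate_checkpoints(output_dir: str, keep_last_n: int) -> list[str]:
    """Keep only the newest keep_last_n checkpoints; return their names."""
    ckpts = [p for p in Path(output_dir).iterdir()
             if p.is_dir() and re.fullmatch(r"step_\d+", p.name)]
    # Sort by step number so "step_200" orders after "step_30".
    ckpts.sort(key=lambda p: int(p.name.split("_")[1]))
    survivors = ckpts[-keep_last_n:] if keep_last_n > 0 else []
    for stale in ckpts[:len(ckpts) - len(survivors)]:
        shutil.rmtree(stale)
    return [p.name for p in survivors]
```

Sorting numerically (rather than lexicographically) matters once step counts cross a power of ten; a plain string sort would put step_1000 before step_200.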
Dependencies
- accelerate -- distributed training orchestration (multi-GPU, multi-node, DeepSpeed)
- deepspeed -- ZeRO memory optimization for large model training
- transformers -- model and tokenizer loading, configuration
- torch -- core tensor operations and autograd
- wandb -- experiment tracking and logging (optional, via with_tracking)
- datasets -- HuggingFace Datasets for data loading
- peft -- LoRA and QLoRA adapter support (optional)
- bitsandbytes -- 4-bit/8-bit quantization (optional, for QLoRA)