
Implementation:Allenai Open instruct Finetune Main

From Leeroopedia


Knowledge Sources
Domains Machine Learning, Deep Learning, Natural Language Processing, MLOps
Last Updated 2026-02-07 00:00 GMT

Overview

A concrete entry point for running the full supervised fine-tuning (SFT) training loop provided by the Open Instruct library.

Description

The main() function in finetune.py is the central entry point for SFT training. It orchestrates the entire pipeline:

  1. Setup: Initializes HuggingFace Accelerate, configures distributed training, sets random seeds, and optionally initializes W&B experiment tracking.
  2. Data loading: Calls get_cached_dataset_tulu() to load, mix, tokenize, and cache the training dataset. Shuffles the dataset and sets it to PyTorch tensor format.
  3. Model loading: Loads the pre-trained model via AutoModelForCausalLM.from_pretrained() with optional QLoRA quantization, Liger Kernel, or standard bfloat16 loading. Resizes token embeddings if needed and optionally wraps the model with LoRA adapters.
  4. Optimizer and scheduler: Creates AdamW optimizer (optionally fused or 8-bit), configures the learning rate schedule (linear, cosine, or constant with warmup), and prepares everything with Accelerate.
  5. Training loop: Iterates over epochs and batches, computing the cross-entropy loss on labeled tokens, performing gradient accumulation, clipping gradients, and stepping the optimizer. Logs metrics (loss, learning rate, throughput) to W&B.
  6. Checkpointing: Saves model checkpoints at configurable intervals (every N steps or each epoch). Manages checkpoint rotation to keep only the last N checkpoints.
  7. Finalization: Saves the final model and tokenizer, optionally pushes to HuggingFace Hub, and launches evaluation jobs on Beaker.
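
Steps 4-6 above can be sketched in miniature. This is an illustrative toy (the tiny model, sizes, and step counts are invented for the example, not taken from finetune.py), but it shows the same mechanics: masked cross-entropy over labeled tokens, gradient accumulation, gradient clipping, and a linear-warmup schedule.

```python
# Hypothetical miniature of steps 4-6: AdamW + warmup schedule, gradient
# accumulation, clipping, and cross-entropy that ignores -100-masked tokens.
import torch
import torch.nn.functional as F

torch.manual_seed(42)

vocab, hidden = 32, 16
model = torch.nn.Sequential(torch.nn.Embedding(vocab, hidden),
                            torch.nn.Linear(hidden, vocab))
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.0)

grad_accum, clip_norm, total_steps, warmup = 8, 1.0, 20, 3
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lambda step: min(1.0, (step + 1) / warmup))  # linear warmup, then constant

step = 0
for micro_batch in range(total_steps * grad_accum):
    input_ids = torch.randint(0, vocab, (2, 8))
    labels = input_ids.clone()
    labels[:, :4] = -100          # prompt tokens are masked out of the loss
    logits = model(input_ids)
    # Cross-entropy only on labeled (non -100) tokens, as in step 5.
    loss = F.cross_entropy(logits.view(-1, vocab), labels.view(-1),
                           ignore_index=-100)
    (loss / grad_accum).backward()   # accumulate scaled gradients
    if (micro_batch + 1) % grad_accum == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
        step += 1
```

In the real pipeline these pieces are prepared and wrapped by Accelerate, so the backward pass, clipping, and scheduler stepping are distributed-training aware.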

Usage

Run this function via the command line to start SFT training. It is invoked by the script entry point in finetune.py and receives its configuration from FlatArguments and TokenizerConfig parsed from CLI arguments.
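
One common pattern for this kind of dataclass-driven CLI entry point can be sketched with HfArgumentParser from transformers. FlatArgumentsSketch below is a stand-in for illustration only; open_instruct's real FlatArguments has many more fields, and the actual script's parser may differ.

```python
# Hypothetical sketch: parsing CLI flags into a config dataclass.
# FlatArgumentsSketch is a stand-in, not open_instruct's real class.
from dataclasses import dataclass

from transformers import HfArgumentParser


@dataclass
class FlatArgumentsSketch:
    model_name_or_path: str = "allenai/Llama-3.1-Tulu-3-8B"
    learning_rate: float = 2e-5
    num_train_epochs: int = 2


# Explicit arg list stands in for sys.argv from an actual CLI invocation.
(cfg,) = HfArgumentParser(FlatArgumentsSketch).parse_args_into_dataclasses(
    args=["--learning_rate", "1e-5"])
print(cfg.learning_rate)
```

The real entry point builds both FlatArguments and TokenizerConfig this way from one command line, then calls main(args, tc).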

Code Reference

Source Location

  • Repository: Open Instruct
  • File: open_instruct/finetune.py
  • Lines: L353-965

Signature

def main(args: FlatArguments, tc: TokenizerConfig) -> None:
    ...

Import

from open_instruct.finetune import main, FlatArguments
from open_instruct.dataset_transformation import TokenizerConfig

I/O Contract

Inputs

  • args (FlatArguments, required) -- Full training configuration including model path, dataset settings, training hyperparameters, checkpointing, and experiment tracking options.
  • tc (TokenizerConfig, required) -- Tokenizer configuration specifying the tokenizer path, chat template, and related settings.

Outputs

  • (side effects) -- Saves the trained model to args.output_dir, optionally pushes it to the HuggingFace Hub, logs metrics to W&B, and launches evaluation jobs. Returns None.

Key Training Hyperparameters

  • per_device_train_batch_size (default: 8) -- Micro-batch size per GPU.
  • gradient_accumulation_steps (default: 1) -- Number of micro-batches accumulated before each optimizer step.
  • learning_rate (default: 2e-5) -- Peak learning rate for AdamW.
  • num_train_epochs (default: 2) -- Total number of training epochs.
  • warmup_ratio (default: 0.03) -- Fraction of total steps used for linear warmup.
  • weight_decay (default: 0.0) -- AdamW weight decay coefficient.
  • lr_scheduler_type (default: "linear") -- Learning rate decay schedule (linear, cosine, constant, etc.).
  • clip_grad_norm (default: -1) -- Maximum gradient norm for clipping; -1 disables clipping.
  • seed (default: 42) -- Random seed for reproducibility.
  • max_seq_length (default: None) -- Maximum sequence length after tokenization.
  • packing (default: False) -- Whether to use padding-free collation for increased throughput.
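
These hyperparameters interact: the effective batch size is the per-device batch size times the accumulation steps times the number of GPUs, and the warmup length is warmup_ratio of the total optimizer steps. The arithmetic below illustrates this; the GPU count and dataset size are example values, not defaults from the library.

```python
# Illustrative arithmetic for the hyperparameters above.
# num_gpus and dataset_size are assumed example values.
per_device_train_batch_size = 8
gradient_accumulation_steps = 1
num_gpus = 8                      # example cluster size
warmup_ratio = 0.03
num_train_epochs = 2
dataset_size = 100_000            # example number of training sequences

effective_batch = (per_device_train_batch_size
                   * gradient_accumulation_steps
                   * num_gpus)
steps_per_epoch = dataset_size // effective_batch
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = int(total_steps * warmup_ratio)
print(effective_batch, total_steps, warmup_steps)  # → 64 3124 93
```

Raising gradient_accumulation_steps lets you keep the same effective batch size on fewer or smaller GPUs at the cost of more micro-batches per optimizer step.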

Usage Examples

Basic Usage

# Typically invoked via command line:
# accelerate launch --config_file configs/ds_configs/deepspeed_zero3.yaml \
#   open_instruct/finetune.py \
#   --model_name_or_path allenai/Llama-3.1-Tulu-3-8B \
#   --dataset_mixer_list allenai/tulu-3-sft-mixture 1.0 \
#   --max_seq_length 4096 \
#   --per_device_train_batch_size 2 \
#   --gradient_accumulation_steps 8 \
#   --learning_rate 2e-5 \
#   --num_train_epochs 2 \
#   --output_dir output/sft_model

# Programmatic usage:
from open_instruct.finetune import main, FlatArguments
from open_instruct.dataset_transformation import TokenizerConfig

args = FlatArguments(
    model_name_or_path="allenai/Llama-3.1-Tulu-3-8B",
    dataset_mixer_list=["allenai/tulu-3-sft-personas-algebra", "1.0"],
    max_seq_length=4096,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=2,
    output_dir="output/sft_model",
)

tc = TokenizerConfig(
    tokenizer_name_or_path="allenai/Llama-3.1-Tulu-3-8B",
    chat_template_name="tulu",
)

main(args, tc)

Dependencies

  • accelerate -- distributed training orchestration (multi-GPU, multi-node, DeepSpeed)
  • deepspeed -- ZeRO memory optimization for large model training
  • transformers -- model and tokenizer loading, configuration
  • torch -- core tensor operations and autograd
  • wandb -- experiment tracking and logging (optional, via with_tracking)
  • datasets -- HuggingFace Datasets for data loading
  • peft -- LoRA and QLoRA adapter support (optional)
  • bitsandbytes -- 4-bit/8-bit quantization (optional, for QLoRA)

Related Pages

Implements Principle
