Implementation: Allenai Open Instruct FlatArguments
| Knowledge Sources | Details |
|---|---|
| Domains | Machine Learning, Software Engineering, MLOps |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A single flat dataclass, provided by the Open Instruct library, that specifies all SFT training hyperparameters and settings.
Description
The FlatArguments dataclass is the complete configuration object for the SFT training pipeline in finetune.py. It consolidates model settings, dataset configuration, optimization hyperparameters, checkpointing options, experiment tracking, and AI2-specific infrastructure settings into a single class. The "flat" design (as opposed to nested configs) makes it directly compatible with HuggingFace's CLI argument parser.
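To see why a flat layout maps cleanly onto CLI flags, here is a minimal stdlib sketch of the idea: every dataclass field becomes exactly one `--flag`. This is an illustration with hypothetical field names, not the parser Open Instruct actually uses.

```python
import argparse
from dataclasses import dataclass, fields

# Hypothetical mini-config: a flat dataclass, like FlatArguments but tiny.
@dataclass
class MiniArgs:
    model_name_or_path: str = None
    learning_rate: float = 2e-5
    num_train_epochs: int = 2

def parse_flat(cls, argv):
    """Turn each dataclass field into one CLI flag, then rebuild the dataclass."""
    parser = argparse.ArgumentParser()
    for f in fields(cls):
        # The field's annotation doubles as the argparse type converter.
        parser.add_argument(f"--{f.name}", type=f.type, default=f.default)
    ns = parser.parse_args(argv)
    return cls(**vars(ns))

args = parse_flat(MiniArgs, ["--learning_rate", "1e-5", "--num_train_epochs", "3"])
```

A nested config would instead need dotted flags (e.g. `--optimizer.lr`) and a custom resolver; the flat design sidesteps that entirely.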
The class includes a __post_init__ validator that enforces mutual exclusivity among dataset sources, checks that launching evaluation jobs is only requested when Hub access is configured, validates that the final learning rate ratio is sensible, and parses dict-valued arguments that arrive from the CLI as strings.
Usage
FlatArguments is parsed from command-line arguments at the script entry point and passed to the main() function. It can also be constructed programmatically for testing or notebook usage.
Code Reference
Source Location
- Repository: Open Instruct
- File: open_instruct/finetune.py (lines 75-351)
Signature
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class FlatArguments:
"""Full arguments class for all fine-tuning jobs."""
# Model settings
exp_name: str = "finetune"
model_name_or_path: str | None = None
config_name: str | None = None
use_flash_attn: bool = True
model_revision: str | None = None
additional_model_arguments: dict | str | None = field(default_factory=dict)
low_cpu_mem_usage: bool = False
# Dataset settings
dataset_name: str | None = None
dataset_mixer: dict | None = None
dataset_mixer_list: list[str] = field(default_factory=lambda: ["allenai/tulu-3-sft-personas-algebra", "1.0"])
dataset_mixer_list_splits: list[str] = field(default_factory=lambda: ["train"])
dataset_transform_fn: list[str] = field(default_factory=lambda: ["sft_tulu_tokenize_and_truncate_v1", "sft_tulu_filter_v1"])
dataset_target_columns: list[str] = field(default_factory=lambda: TOKENIZED_SFT_DATASET_KEYS)
dataset_cache_mode: Literal["hf", "local"] = "local"
dataset_local_cache_dir: str = "local_dataset_cache"
dataset_config_hash: str | None = None
dataset_skip_cache: bool = False
# Training hyperparameters
max_seq_length: int | None = None
max_train_samples: int | None = None
per_device_train_batch_size: int = 8
gradient_accumulation_steps: int = 1
learning_rate: float = 2e-5
num_train_epochs: int = 2
max_train_steps: int | None = None
warmup_ratio: float = 0.03
final_lr_ratio: float | None = None
weight_decay: float = 0.0
lr_scheduler_type: str = "linear"
clip_grad_norm: float = -1
seed: int = 42
# LoRA settings
use_lora: bool = False
use_qlora: bool = False
lora_rank: int = 64
lora_alpha: float = 16
lora_dropout: float = 0.1
# Checkpointing
output_dir: str = "output/"
checkpointing_steps: str | None = None
keep_last_n_checkpoints: int = 3
resume_from_checkpoint: str | None = None
gradient_checkpointing: bool = False
# Experiment tracking
with_tracking: bool = False
wandb_project_name: str = "open_instruct_internal"
wandb_entity: str | None = None
report_to: str | list[str] = "all"
# Hub settings
push_to_hub: bool = True
hf_entity: str | None = None
hf_repo_id: str | None = None
hf_repo_revision: str | None = None
save_to_hub: str | None = None
# Advanced
packing: bool = False
use_liger_kernel: bool = False
sync_each_batch: bool = False
fused_optimizer: bool = True
use_8bit_optimizer: bool = False
load_balancing_loss: bool = False
load_balancing_weight: float = 0.5
...
Import
from open_instruct.finetune import FlatArguments
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_name_or_path | str or None | Yes | HuggingFace model ID or local path to the pre-trained model. |
| dataset_mixer_list | list[str] | Yes | Alternating list of dataset names and mixing ratios. |
| max_seq_length | int or None | Recommended | Maximum token sequence length. Sequences longer than this are truncated. |
| per_device_train_batch_size | int | No | Micro-batch size per GPU. Defaults to 8. |
| gradient_accumulation_steps | int | No | Steps to accumulate before optimizer update. Defaults to 1. |
| learning_rate | float | No | Peak learning rate. Defaults to 2e-5. |
| num_train_epochs | int | No | Training epochs. Defaults to 2. |
| output_dir | str | No | Directory for checkpoints and final model. Defaults to "output/". |
| use_lora | bool | No | Enable LoRA training. Defaults to False. |
| use_flash_attn | bool | No | Enable Flash Attention 2. Defaults to True. |
| gradient_checkpointing | bool | No | Enable gradient checkpointing. Defaults to False. |
| with_tracking | bool | No | Enable W&B experiment tracking. Defaults to False. |
| packing | bool | No | Enable padding-free collation. Defaults to False. |
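For intuition about how the batch-size fields above interact: the effective (global) batch size per optimizer step is the product of the micro-batch size, the accumulation steps, and the number of devices. A back-of-the-envelope helper (not part of FlatArguments):

```python
def effective_batch_size(per_device_train_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int) -> int:
    # One optimizer step consumes this many training examples in total.
    return per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# With the defaults above on one 8-GPU node: 8 * 1 * 8 = 64 examples per step.
print(effective_batch_size(8, 1, 8))
```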
Outputs
| Name | Type | Description |
|---|---|---|
| (dataclass instance) | FlatArguments | A validated configuration object passed to main() for training. |
Usage Examples
Basic Usage
from open_instruct.finetune import FlatArguments
args = FlatArguments(
model_name_or_path="allenai/Llama-3.1-Tulu-3-8B",
dataset_mixer_list=["allenai/tulu-3-sft-personas-algebra", "1.0"],
max_seq_length=4096,
per_device_train_batch_size=4,
gradient_accumulation_steps=4,
learning_rate=2e-5,
num_train_epochs=2,
output_dir="output/my_sft_run",
with_tracking=True,
wandb_project_name="my_project",
)
CLI Usage
accelerate launch open_instruct/finetune.py \
--model_name_or_path allenai/Llama-3.1-Tulu-3-8B \
--tokenizer_name_or_path allenai/Llama-3.1-Tulu-3-8B \
--dataset_mixer_list allenai/tulu-3-sft-personas-algebra 1.0 \
--max_seq_length 4096 \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--learning_rate 2e-5 \
--num_train_epochs 2 \
--output_dir output/my_sft_run \
--with_tracking \
--wandb_project_name my_project