Implementation:Microsoft DeepSpeedExamples ZenFlow Finetune Llama
| Knowledge Sources | |
|---|---|
| Domains | Deep Learning, Fine Tuning, Large Language Models |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
A LLaMA fine-tuning script using DeepSpeed ZenFlow that preprocesses Alpaca instruction data and trains a causal language model with distributed training support.
Description
This module implements an end-to-end fine-tuning pipeline for LLaMA-family models using the DeepSpeed ZenFlow optimization framework. The main function orchestrates the full workflow: setting a reproducible seed, loading a tokenizer and model from HuggingFace with bfloat16 precision, tokenizing the tatsu-lab/alpaca instruction-following dataset, initializing DeepSpeed, and running the training loop.
The preprocess_alpaca function formats each Alpaca example into a prompt with ### Instruction:, optional ### Input:, and ### Response: sections, then tokenizes it with truncation and padding to a configurable max_length (default 512). Labels are set to a copy of the input IDs for causal language model training. The set_seed function seeds the random, numpy, and torch (both CPU and CUDA) random number generators so that runs are repeatable.
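The preprocessing described above can be sketched as follows. This is a minimal illustration, not the script's code: the helper names `build_prompt` and `preprocess_sketch` are hypothetical, and the exact whitespace between sections is an assumption. Only the three section headers, the truncation and padding to `max_length`, and the copied labels come from the description.

```python
def build_prompt(example):
    """Format one Alpaca record into a prompt (section layout is an assumption)."""
    parts = [f"### Instruction:\n{example['instruction']}"]
    if example.get("input"):  # the Input section is optional
        parts.append(f"### Input:\n{example['input']}")
    parts.append(f"### Response:\n{example['output']}")
    return "\n\n".join(parts)


def preprocess_sketch(example, tokenizer, max_length=512):
    """Tokenize the prompt and copy input_ids into labels.

    `tokenizer` is any callable with a HuggingFace-style call signature.
    """
    tokenized = tokenizer(
        build_prompt(example),
        truncation=True,
        padding="max_length",
        max_length=max_length,
    )
    # Labels are an independent copy of the input IDs.
    tokenized["labels"] = list(tokenized["input_ids"])
    return tokenized
```

Because labels are a plain copy, padded positions also carry label values. A common refinement in HuggingFace-style training is to set those positions to -100 so the loss ignores them, but the description above does not say the script does this.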
The training loop uses deepspeed.initialize to wrap the model with a DeepSpeed engine, which automatically manages the optimizer, learning rate scheduler, gradient accumulation, and distributed communication. Each step logs loss and wall-clock time on rank 0, and the final checkpoint is saved using model_engine.save_checkpoint alongside the tokenizer.
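A rough sketch of the loop described above, under several assumptions: the helper names `build_engine` and `train_one_epoch` are hypothetical, batches are assumed to be dicts of tensors that include `labels`, and the arguments passed to `deepspeed.initialize` are one plausible choice rather than the script's actual call. `deepspeed` is imported lazily so the loop itself can be read and exercised without the library installed.

```python
import time


def build_engine(model, train_dataset, ds_config):
    """Wrap a model in a DeepSpeed engine (requires deepspeed to be installed)."""
    import deepspeed  # lazy import: only needed when an engine is actually built

    # initialize returns (engine, optimizer, dataloader, lr_scheduler).
    engine, _optimizer, train_loader, _scheduler = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        training_data=train_dataset,
        config=ds_config,  # dict or path to the JSON config
    )
    return engine, train_loader


def train_one_epoch(engine, train_loader, log=print):
    """Run one pass over the data, logging loss and step time on rank 0."""
    losses = []
    for step, batch in enumerate(train_loader):
        start = time.time()
        batch = {k: v.to(engine.device) for k, v in batch.items()}
        loss = engine(**batch).loss  # forward pass; the model computes the LM loss
        engine.backward(loss)        # engine handles scaling and accumulation
        engine.step()                # optimizer and scheduler step
        value = loss.item()
        losses.append(value)
        if engine.global_rank == 0:
            log(f"step {step} loss {value:.4f} time {time.time() - start:.2f}s")
    return losses
```

Calling `engine.backward` and `engine.step` instead of the usual `loss.backward()` and `optimizer.step()` is what lets the engine apply gradient accumulation and distributed communication according to the JSON configuration.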
Usage
Use this script to fine-tune LLaMA or similar causal language models on instruction-following data with DeepSpeed ZenFlow. Launch via the DeepSpeed distributed launcher with a JSON configuration file specifying ZeRO stage, batch size, learning rate, and other DeepSpeed settings.
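As an illustration of what such a configuration might contain, the snippet below builds a config as a Python dict and serializes it to JSON. All values are placeholders chosen for the example, the keys shown are general DeepSpeed settings, and ZenFlow-specific options are intentionally left out; take those from the DeepSpeed documentation or the config shipped with the example. Whether the script reads the learning rate from this file or from --lr is not stated here.

```python
import json

# Illustrative values only (assumptions); tune them for your hardware and model.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 2e-5, "weight_decay": 0.01},
    },
}

# Serialize to the JSON text that would be written to the config file.
config_text = json.dumps(ds_config, indent=2)
```

Writing `config_text` to a file such as ds_config.json gives a file that can be passed to the launcher.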
Code Reference
Source Location
- Repository: Microsoft_DeepSpeedExamples
- File: training/DeepSpeed-ZenFlow/finetuning/finetune_llama.py
- Lines: 1-112
Signature
def set_seed(seed) -> None:
def preprocess_alpaca(example, tokenizer, max_length=512) -> dict:
def main(args) -> None:
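For orientation, a seeding helper matching the set_seed signature could look like the sketch below. The name `set_seed_sketch` is hypothetical, and the guarded imports are an addition so that the snippet runs even where numpy or torch is absent; the script itself presumably imports both unconditionally.

```python
import random


def set_seed_sketch(seed):
    """Seed Python, NumPy, and PyTorch RNGs (guarded imports are an assumption)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy not installed; skip
    try:
        import torch
        torch.manual_seed(seed)           # CPU generator
        torch.cuda.manual_seed_all(seed)  # all CUDA devices; ignored without CUDA
    except ImportError:
        pass  # torch not installed; skip
```

Seeding alone does not guarantee bit-identical runs on GPU, since some CUDA kernels are nondeterministic, but it does make data shuffling and weight initialization repeatable.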
Import
from finetune_llama import main, preprocess_alpaca, set_seed
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args.model_name | str | Yes | HuggingFace model identifier for the LLaMA model to fine-tune |
| args.lr | float | Yes | Learning rate for the optimizer |
| args.batch_size | int | Yes | Training batch size per device |
| args.weight_decay | float | No | Weight decay coefficient (default: 0.01) |
| args.warmup | float | No | Warmup proportion (default: 0.01) |
| args.num_train_epochs | int | No | Number of training epochs (default: 3) |
| args.output_dir | str | Yes | Directory for saving checkpoints and tokenizer |
| args.seed | int | No | Random seed for reproducibility (default: 42) |
| args.local_rank | int | No | Local rank for distributed training (default: -1) |
| example | dict | Yes (for preprocess_alpaca) | Alpaca dataset example with 'instruction', 'input', 'output' keys |
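| tokenizer | AutoTokenizer | Yes (for preprocess_alpaca) | Tokenizer instance for encoding text |
| max_length | int | No (for preprocess_alpaca) | Maximum sequence length for truncation and padding (default: 512) |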
Outputs
| Name | Type | Description |
|---|---|---|
| tokenized | dict | Dictionary with 'input_ids', 'attention_mask', and 'labels' keys from preprocess_alpaca |
| checkpoint | directory | DeepSpeed checkpoint saved to args.output_dir on rank 0 |
| tokenizer files | directory | Saved tokenizer files in args.output_dir on rank 0 |
Usage Examples
# Command-line launch with DeepSpeed
# deepspeed finetune_llama.py \
# --model_name meta-llama/Llama-2-7b-hf \
# --lr 2e-5 \
# --batch_size 4 \
# --num_train_epochs 3 \
# --output_dir ./output \
# --deepspeed ds_config.json
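# Programmatic usage of preprocessing
from transformers import AutoTokenizer
from finetune_llama import preprocess_alpaca

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# LLaMA tokenizers ship without a pad token, and padding to max_length needs one.
# Reusing EOS is a common choice (assumption: main may already do this itself).
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

example = {
    "instruction": "Summarize the following text.",
    "input": "DeepSpeed is a deep learning optimization library.",
    "output": "DeepSpeed optimizes deep learning training.",
}
# Returns a dict with 'input_ids', 'attention_mask', and 'labels'
tokenized = preprocess_alpaca(example, tokenizer, max_length=512)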