Implementation:Microsoft DeepSpeedExamples ZenFlow Finetune Llama

From Leeroopedia


Knowledge Sources
Domains: Deep Learning, Fine Tuning, Large Language Models
Last Updated: 2026-02-07 12:00 GMT

Overview

A LLaMA fine-tuning script using DeepSpeed ZenFlow that preprocesses Alpaca instruction data and trains a causal language model with distributed training support.

Description

This module implements an end-to-end fine-tuning pipeline for LLaMA-family models using the DeepSpeed ZenFlow optimization framework. The main function orchestrates the full workflow: setting a reproducible seed, loading a tokenizer and model from HuggingFace with bfloat16 precision, tokenizing the tatsu-lab/alpaca instruction-following dataset, initializing DeepSpeed, and running the training loop.

The preprocess_alpaca function formats each Alpaca example into a structured prompt template with ### Instruction:, optional ### Input:, and ### Response: sections, then tokenizes with truncation and padding to a configurable max_length (default 512). Labels are set to a copy of the input IDs for causal language model training. The set_seed function ensures reproducibility across random, numpy, and torch (both CPU and CUDA) random number generators.
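The formatting and labeling steps described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the names `build_prompt` and `preprocess_alpaca_sketch` are hypothetical, the exact template wording and whitespace are assumptions, and the tokenizer is assumed to behave like a HuggingFace tokenizer that returns `input_ids` and `attention_mask` as Python lists.

```python
def build_prompt(example: dict) -> str:
    """Format one Alpaca record. The exact template text is an assumption."""
    parts = [f"### Instruction:\n{example['instruction']}"]
    # The Input section is optional and only emitted when it is non-empty.
    if (example.get("input") or "").strip():
        parts.append(f"### Input:\n{example['input']}")
    parts.append(f"### Response:\n{example['output']}")
    return "\n\n".join(parts)


def preprocess_alpaca_sketch(example: dict, tokenizer, max_length: int = 512) -> dict:
    """Tokenize the formatted prompt and copy input IDs into labels."""
    tokenized = tokenizer(
        build_prompt(example),
        truncation=True,          # cut sequences longer than max_length
        padding="max_length",     # pad shorter sequences up to max_length
        max_length=max_length,
    )
    # Causal LM training: the labels are a copy of the input IDs.
    tokenized["labels"] = list(tokenized["input_ids"])
    return tokenized
```

In this sketch the labels also cover padded positions, because they are a plain copy of the input IDs. Whether the original script masks those positions is not stated here.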

The training loop uses deepspeed.initialize to wrap the model with a DeepSpeed engine, which automatically manages the optimizer, learning rate scheduler, gradient accumulation, and distributed communication. Each step logs loss and wall-clock time on rank 0, and the final checkpoint is saved using model_engine.save_checkpoint alongside the tokenizer.
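A rough sketch of the loop described above is shown below. It is an illustration under stated assumptions rather than the script's code: `build_engine` and `train_one_epoch` are hypothetical names, the batches are assumed to be dictionaries of tensors, and the model is assumed to return an object with a `.loss` attribute, as HuggingFace causal language models do when labels are supplied. DeepSpeed is imported lazily so the loop itself can be read and exercised without it.

```python
import time


def build_engine(model, ds_config):
    """Wrap a model with a DeepSpeed engine (hypothetical helper)."""
    import deepspeed  # imported lazily; requires DeepSpeed to be installed

    # deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler).
    engine, _optimizer, _loader, _scheduler = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,  # a dict or a path to the JSON configuration
    )
    return engine


def train_one_epoch(model_engine, dataloader, is_rank0=True, log=print):
    """Run one pass over the data, logging loss and step time on rank 0."""
    losses = []
    for step, batch in enumerate(dataloader):
        start = time.time()
        # Move every tensor in the batch to the engine's device.
        batch = {k: v.to(model_engine.device) for k, v in batch.items()}
        loss = model_engine(**batch).loss
        model_engine.backward(loss)  # the engine handles gradient accumulation
        model_engine.step()          # optimizer and scheduler step
        losses.append(loss.item())
        if is_rank0:
            log(f"step {step} loss {losses[-1]:.4f} time {time.time() - start:.3f}s")
    return losses
```

After the last epoch, the final save would follow with `model_engine.save_checkpoint(output_dir)` and `tokenizer.save_pretrained(output_dir)`.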

Usage

Use this script to fine-tune LLaMA or similar causal language models on instruction-following data with DeepSpeed ZenFlow. Launch via the DeepSpeed distributed launcher with a JSON configuration file specifying ZeRO stage, batch size, learning rate, and other DeepSpeed settings.
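As a hedged illustration of what such a configuration might contain, the snippet below builds one in Python and serializes it to JSON. The helper name `make_ds_config` and the chosen values are assumptions; the keys shown are standard DeepSpeed configuration keys, and any ZenFlow-specific settings are deliberately left out and should be taken from the DeepSpeed documentation.

```python
import json


def make_ds_config(lr=2e-5, batch_size=4, weight_decay=0.01, zero_stage=2):
    """Build a minimal DeepSpeed config dict (illustrative values only)."""
    return {
        "train_micro_batch_size_per_gpu": batch_size,
        "gradient_accumulation_steps": 1,
        "bf16": {"enabled": True},  # matches the bfloat16 model loading
        "zero_optimization": {"stage": zero_stage},
        "optimizer": {
            "type": "AdamW",
            "params": {"lr": lr, "weight_decay": weight_decay},
        },
        # ZenFlow-specific options are intentionally omitted from this sketch.
    }


# JSON text that could be written to a file such as ds_config.json.
config_text = json.dumps(make_ds_config(), indent=2)
```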

Code Reference

Source Location

Signature

def set_seed(seed) -> None:
def preprocess_alpaca(example, tokenizer, max_length=512) -> dict:
def main(args) -> None:
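For orientation, a seeding helper with the `set_seed` signature above might look like the sketch below. The name `set_seed_sketch` is hypothetical, and the guarded imports are an addition for this illustration so that it runs even where NumPy or PyTorch is not installed; the original script presumably imports them unconditionally.

```python
import random


def set_seed_sketch(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch (CPU and CUDA) generators."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; skipped in this sketch only
    try:
        import torch
        torch.manual_seed(seed)            # CPU generator
        torch.cuda.manual_seed_all(seed)   # all CUDA devices; safe without a GPU
    except ImportError:
        pass  # PyTorch not installed; skipped in this sketch only
```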

Import

from finetune_llama import main, preprocess_alpaca, set_seed

I/O Contract

Inputs

Name | Type | Required | Description
args.model_name | str | Yes | HuggingFace model identifier for the LLaMA model to fine-tune
args.lr | float | Yes | Learning rate for the optimizer
args.batch_size | int | Yes | Training batch size per device
args.weight_decay | float | No | Weight decay coefficient (default: 0.01)
args.warmup | float | No | Warmup proportion (default: 0.01)
args.num_train_epochs | int | No | Number of training epochs (default: 3)
args.output_dir | str | Yes | Directory for saving checkpoints and tokenizer
args.seed | int | No | Random seed for reproducibility (default: 42)
args.local_rank | int | No | Local rank for distributed training (default: -1)
example | dict | Yes (for preprocess_alpaca) | Alpaca dataset example with 'instruction', 'input', 'output' keys
tokenizer | AutoTokenizer | Yes (for preprocess_alpaca) | Tokenizer instance for encoding text
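The `args` fields listed above could be produced by an argument parser along these lines. This is a sketch reconstructed from the table, not the script's own parser: `build_parser` is a hypothetical name, and DeepSpeed-related flags such as `--deepspeed` are left out because their definition is not shown on this page.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented inputs and defaults (illustrative)."""
    parser = argparse.ArgumentParser(description="Fine-tune a LLaMA model on Alpaca data")
    # Required arguments
    parser.add_argument("--model_name", type=str, required=True)
    parser.add_argument("--lr", type=float, required=True)
    parser.add_argument("--batch_size", type=int, required=True)
    parser.add_argument("--output_dir", type=str, required=True)
    # Optional arguments with the defaults listed in the table
    parser.add_argument("--weight_decay", type=float, default=0.01)
    parser.add_argument("--warmup", type=float, default=0.01)
    parser.add_argument("--num_train_epochs", type=int, default=3)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--local_rank", type=int, default=-1)
    return parser
```

For example, `build_parser().parse_args(["--model_name", "meta-llama/Llama-2-7b-hf", "--lr", "2e-5", "--batch_size", "4", "--output_dir", "./output"])` yields a namespace whose optional fields carry the defaults above.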

Outputs

Name | Type | Description
tokenized | dict | Dictionary with 'input_ids', 'attention_mask', and 'labels' keys from preprocess_alpaca
checkpoint | directory | DeepSpeed checkpoint saved to args.output_dir on rank 0
tokenizer files | directory | Saved tokenizer files in args.output_dir on rank 0

Usage Examples

# Command-line launch with DeepSpeed
# deepspeed finetune_llama.py \
#     --model_name meta-llama/Llama-2-7b-hf \
#     --lr 2e-5 \
#     --batch_size 4 \
#     --num_train_epochs 3 \
#     --output_dir ./output \
#     --deepspeed ds_config.json

# Programmatic usage of preprocessing
from transformers import AutoTokenizer
from finetune_llama import preprocess_alpaca

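tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# LLaMA tokenizers typically ship without a pad token, and padding fails without one.
# Reusing EOS as the pad token is a common workaround (an assumption, not necessarily what the script does).
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token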
example = {
    "instruction": "Summarize the following text.",
    "input": "DeepSpeed is a deep learning optimization library.",
    "output": "DeepSpeed optimizes deep learning training."
}
tokenized = preprocess_alpaca(example, tokenizer, max_length=512)

Related Pages
