
Environment:PacktPublishing LLM Engineers Handbook Unsloth Finetuning Environment

From Leeroopedia


Knowledge Sources
Domains: Deep_Learning, LLMs, Finetuning
Last Updated: 2026-02-08 08:00 GMT

Overview

GPU-accelerated fine-tuning environment with Unsloth, Flash Attention, LoRA/QLoRA, and TRL for SFT and DPO training of Llama 3.1 8B.

Description

This environment provides the complete fine-tuning stack running inside a SageMaker training container. It uses Unsloth for optimized model loading and patching, Flash Attention 2 for efficient attention computation, PEFT for LoRA adapter injection, and TRL for Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). The environment targets PyTorch 2.4.0 (different from the local development PyTorch 2.2.2) and includes bitsandbytes for quantization support.
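The stack described above can be sketched end to end. Everything below is illustrative: the model id, LoRA rank, dataset field, and batch size are assumptions for the sketch, not the handbook's exact configuration, and running it requires a CUDA GPU with the pinned packages installed.

```python
# Sketch of the fine-tuning stack: Unsloth loading, LoRA injection, TRL SFT.
# Illustrative values throughout; requires a CUDA GPU.
from unsloth import FastLanguageModel, is_bfloat16_supported
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the base model with Unsloth's optimized loader (4-bit via bitsandbytes).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Meta-Llama-3.1-8B",  # assumed model id
    max_seq_length=2048,
    load_in_4bit=True,  # QLoRA-style quantized base weights
)

# Inject LoRA adapters (PEFT under the hood).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,            # illustrative rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Supervised fine-tuning with TRL; bf16/fp16 chosen by capability detection.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,      # a datasets.Dataset prepared elsewhere
    dataset_text_field="text",        # assumed column name
    max_seq_length=2048,
    args=TrainingArguments(
        output_dir="/opt/ml/model",
        per_device_train_batch_size=2,
        bf16=is_bfloat16_supported(),
        fp16=not is_bfloat16_supported(),
    ),
)
trainer.train()
```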

Usage

Use this environment exclusively for the LLM Finetuning workflow. It runs inside a SageMaker `ml.g5.2xlarge` training job and is installed via a separate `requirements.txt` uploaded with the training script. The environment handles loading the base Llama 3.1 8B model, injecting LoRA adapters, running SFT or DPO training, and merging/saving the final model.
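Launching such a job typically goes through the SageMaker Python SDK's HuggingFace estimator; the sketch below shows the shape of that call, with the IAM role, container versions, and secret values as placeholders rather than the repository's actual settings:

```python
# Sketch of launching the fine-tuning job on SageMaker (placeholder values).
from sagemaker.huggingface import HuggingFace

estimator = HuggingFace(
    entry_point="finetune.py",
    source_dir="llm_engineering/model/finetuning",  # requirements.txt here is auto-installed
    instance_type="ml.g5.2xlarge",
    instance_count=1,
    role="<sagemaker-execution-role-arn>",  # placeholder IAM role
    transformers_version="4.36",  # assumed base container; requirements.txt pins the rest
    pytorch_version="2.1",
    py_version="py310",
    environment={
        "HUGGING_FACE_HUB_TOKEN": "<hf-token>",
        "COMET_API_KEY": "<comet-key>",
        "COMET_PROJECT_NAME": "<comet-project>",
    },
)
estimator.fit()
```

SageMaker installs the `requirements.txt` found in `source_dir` before invoking the entry point, which is how the container's base PyTorch gets upgraded to the pinned 2.4.0.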

System Requirements

  • Hardware: NVIDIA GPU with CUDA support; minimum 24GB VRAM (A10G via `ml.g5.2xlarge`)
  • CUDA: version compatible with PyTorch 2.4.0 (CUDA 11.8 or 12.x)
  • Runtime: SageMaker training container; PyTorch 2.1 base image (torch upgraded to 2.4.0 via `requirements.txt`), Python 3.10
  • Disk: ~30GB for model weights and training artifacts

Dependencies

Python Packages (requirements.txt)

  • `accelerate==0.33.0`
  • `torch==2.4.0`
  • `transformers==4.43.3`
  • `datasets==2.20.0`
  • `peft==0.12.0`
  • `trl==0.9.6`
  • `bitsandbytes==0.43.3`
  • `comet-ml==3.44.3`
  • `flash-attn==2.3.6`
  • `unsloth==2024.9.post2`

Credentials

The following environment variables are injected into the SageMaker training container:

  • `HUGGING_FACE_HUB_TOKEN`: HuggingFace token for model downloads
  • `COMET_API_KEY`: Comet ML key for experiment tracking
  • `COMET_PROJECT_NAME`: Comet ML project name
  • `SM_OUTPUT_DATA_DIR`: SageMaker output directory (auto-set)
  • `SM_MODEL_DIR`: SageMaker model directory (auto-set)
  • `SM_NUM_GPUS`: Number of GPUs (auto-set)
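A training script that reads these lazily can fail deep into a run; a small guard at startup makes a missing variable explicit. `missing_env_vars` is a hypothetical helper, not a function in the repository:

```python
import os

REQUIRED_VARS = (
    "HUGGING_FACE_HUB_TOKEN",
    "COMET_API_KEY",
    "COMET_PROJECT_NAME",
    "SM_OUTPUT_DATA_DIR",
    "SM_MODEL_DIR",
    "SM_NUM_GPUS",
)

def missing_env_vars(env=os.environ):
    """Return the names of required variables absent from `env`."""
    return [name for name in REQUIRED_VARS if name not in env]

# Example against a fake environment with one variable unset:
fake_env = {name: "x" for name in REQUIRED_VARS if name != "COMET_API_KEY"}
print(missing_env_vars(fake_env))  # ['COMET_API_KEY']
```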

Quick Install

# These packages are installed automatically inside the SageMaker container.
# For local testing (requires a CUDA GPU). Note: flash-attn compiles against
# torch, so if the combined command fails, install torch first, then the rest:
pip install accelerate==0.33.0 torch==2.4.0 transformers==4.43.3 \
    datasets==2.20.0 peft==0.12.0 trl==0.9.6 bitsandbytes==0.43.3 \
    comet-ml==3.44.3 flash-attn==2.3.6 unsloth==2024.9.post2

Code Evidence

Unsloth imports from `llm_engineering/model/finetuning/finetune.py:5,17-18`:

from unsloth import PatchDPOTrainer
from unsloth import FastLanguageModel, is_bfloat16_supported
from unsloth.chat_templates import get_chat_template

SageMaker environment variable access from `llm_engineering/model/finetuning/finetune.py:257-259`:

parser.add_argument("--output_data_dir", type=str, default=os.environ["SM_OUTPUT_DATA_DIR"])
parser.add_argument("--model_dir", type=str, default=os.environ["SM_MODEL_DIR"])
parser.add_argument("--n_gpus", type=str, default=os.environ["SM_NUM_GPUS"])

CUDA device usage from `llm_engineering/model/finetuning/finetune.py:212`:

inputs = tokenizer([message], return_tensors="pt").to("cuda")
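In context, that line prepares a post-training sanity-check generation; a sketch of the surrounding step, continuing from a model and tokenizer already loaded via Unsloth (the prompt and generation settings are illustrative, and a CUDA GPU is required):

```python
# Illustrative inference check after training (model/tokenizer from the
# training step above; requires a CUDA GPU).
from unsloth import FastLanguageModel

FastLanguageModel.for_inference(model)  # switch Unsloth model to inference mode
message = "Summarize LoRA fine-tuning in one paragraph."  # illustrative prompt
inputs = tokenizer([message], return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```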

Requirements file from `llm_engineering/model/finetuning/requirements.txt`:

accelerate==0.33.0
torch==2.4.0
transformers==4.43.3
datasets==2.20.0
peft==0.12.0
trl==0.9.6
bitsandbytes==0.43.3
comet-ml==3.44.3
flash-attn==2.3.6
unsloth==2024.9.post2

Common Errors

  • `CUDA out of memory`: model weights plus training activations exceed VRAM. Fix: reduce `per_device_train_batch_size` or enable gradient checkpointing.
  • `flash-attn installation failed`: missing CUDA toolkit headers. Fix: ensure the CUDA development toolkit is installed on the host.
  • `FileNotFoundError: requirements.txt`: incorrect requirements file path. Fix: verify `finetuning_requirements_path` in sagemaker.py.
  • `ImportError: unsloth`: Unsloth not installed in the container. Fix: check that requirements.txt is correctly uploaded with the training job.
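When lowering `per_device_train_batch_size` to escape a CUDA out-of-memory error, raising `gradient_accumulation_steps` proportionally preserves the effective batch size the optimizer sees; the relationship is simple arithmetic (`effective_batch_size` is a hypothetical helper for illustration):

```python
def effective_batch_size(per_device, grad_accum_steps, n_gpus=1):
    """Effective global batch size seen by the optimizer per update."""
    return per_device * grad_accum_steps * n_gpus

# Halving the per-device batch while doubling accumulation keeps it constant:
print(effective_batch_size(4, 4))  # 16
print(effective_batch_size(2, 8))  # 16
```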

Compatibility Notes

  • PyTorch Version Mismatch: Local development uses PyTorch 2.2.2, but the SageMaker training container uses PyTorch 2.4.0. This is intentional: Unsloth and flash-attn require the newer version.
  • flash-attn: Requires compilation from source, needs CUDA toolkit headers. Pre-built wheels are available for common CUDA versions.
  • bfloat16: The training code auto-detects bfloat16 support via `is_bfloat16_supported()` and falls back to fp16 if unavailable.
  • Unsloth: Provides optimized forward/backward passes for Llama models, reducing memory usage and training time.
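The bfloat16 fallback in the notes above reduces to a one-line selection rule. Sketched here as pure logic with the capability flag passed in (`choose_dtype` is a hypothetical helper; in the real script Unsloth's `is_bfloat16_supported()` supplies the flag):

```python
def choose_dtype(bf16_supported: bool) -> str:
    """Mirror the fallback: prefer bfloat16, else fp16 mixed precision."""
    return "bfloat16" if bf16_supported else "float16"

print(choose_dtype(True))   # bfloat16
print(choose_dtype(False))  # float16
```

bfloat16 is preferred because it keeps float32's exponent range, which avoids the loss-scaling machinery fp16 training needs.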
