Environment:PacktPublishing LLM Engineers Handbook Unsloth Finetuning Environment
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, LLMs, Finetuning |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
GPU-accelerated fine-tuning environment with Unsloth, Flash Attention, LoRA/QLoRA, and TRL for SFT and DPO training of Llama 3.1 8B.
Description
This environment provides the complete fine-tuning stack running inside a SageMaker training container. It uses Unsloth for optimized model loading and patching, Flash Attention 2 for efficient attention computation, PEFT for LoRA adapter injection, and TRL for Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). The environment targets PyTorch 2.4.0 (different from the local development PyTorch 2.2.2) and includes bitsandbytes for quantization support.
Usage
Use this environment exclusively for the LLM Finetuning workflow. It runs inside a SageMaker `ml.g5.2xlarge` training job and is installed via a separate `requirements.txt` uploaded with the training script. The environment handles loading the base Llama 3.1 8B model, injecting LoRA adapters, running SFT or DPO training, and merging/saving the final model.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Hardware | NVIDIA GPU with CUDA support | Minimum 24GB VRAM (A10G via ml.g5.2xlarge) |
| CUDA | Compatible with PyTorch 2.4.0 | CUDA 11.8 or 12.x |
| Runtime | SageMaker Training Container | PyTorch 2.1 base image, Python 3.10; `requirements.txt` upgrades torch to 2.4.0 |
| Disk | ~30GB | Model weights + training artifacts |
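A rough back-of-envelope calculation shows why 24GB of VRAM is enough for QLoRA fine-tuning of an 8B model. All figures below are assumptions for illustration (4-bit NF4 base weights, an assumed adapter size, and a lump-sum overhead for activations and CUDA buffers), not measured values:

```python
# Back-of-envelope VRAM estimate for QLoRA fine-tuning of an 8B model.
# Every figure here is a rough assumption, not a measured value.

def qlora_vram_estimate_gb(
    n_params: float = 8e9,      # Llama 3.1 8B base weights
    weight_bits: int = 4,       # 4-bit quantized base model
    lora_params: float = 42e6,  # assumed adapter size (depends on rank/targets)
    overhead_gb: float = 6.0,   # assumed activations, CUDA context, buffers
) -> float:
    weights_gb = n_params * weight_bits / 8 / 1e9
    # LoRA adapters train in 16-bit; AdamW keeps two 32-bit states per param.
    adapters_gb = lora_params * (2 + 4 + 4) / 1e9
    return weights_gb + adapters_gb + overhead_gb

print(round(qlora_vram_estimate_gb(), 1))  # ~10.4 GB, well under the A10G's 24GB
```

The overhead term dominates the uncertainty: longer sequences or larger batches inflate activation memory quickly, which is why the OOM mitigations below target batch size first.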
Dependencies
Python Packages (requirements.txt)
- `accelerate==0.33.0`
- `torch==2.4.0`
- `transformers==4.43.3`
- `datasets==2.20.0`
- `peft==0.12.0`
- `trl==0.9.6`
- `bitsandbytes==0.43.3`
- `comet-ml==3.44.3`
- `flash-attn==2.3.6`
- `unsloth==2024.9.post2`
Credentials
The following environment variables are injected into the SageMaker training container:
- `HUGGING_FACE_HUB_TOKEN`: HuggingFace token for model downloads
- `COMET_API_KEY`: Comet ML key for experiment tracking
- `COMET_PROJECT_NAME`: Comet ML project name
- `SM_OUTPUT_DATA_DIR`: SageMaker output directory (auto-set)
- `SM_MODEL_DIR`: SageMaker model directory (auto-set)
- `SM_NUM_GPUS`: Number of GPUs (auto-set)
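The auto-set `SM_*` variables are typically consumed as argparse defaults so the training script needs no hard-coded paths. This sketch uses `os.environ.get` with illustrative local fallbacks (a variant of the hard-required `os.environ[...]` lookup shown in the code evidence) so the same script can also run outside SageMaker; the fallback paths are assumptions:

```python
import argparse
import os

def build_parser() -> argparse.ArgumentParser:
    # SageMaker injects SM_OUTPUT_DATA_DIR, SM_MODEL_DIR, and SM_NUM_GPUS.
    # The .get() fallbacks are illustrative defaults for local runs.
    parser = argparse.ArgumentParser()
    parser.add_argument("--output_data_dir", type=str,
                        default=os.environ.get("SM_OUTPUT_DATA_DIR", "./output"))
    parser.add_argument("--model_dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "./model"))
    parser.add_argument("--n_gpus", type=int,
                        default=int(os.environ.get("SM_NUM_GPUS", "1")))
    return parser

args = build_parser().parse_args([])
print(args.output_data_dir, args.model_dir, args.n_gpus)
```

Inside the container the environment variables always win, so the command line stays empty and SageMaker controls all three paths.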
Quick Install
# These packages are installed automatically inside the SageMaker container.
# For local testing (requires CUDA GPU):
pip install accelerate==0.33.0 torch==2.4.0 transformers==4.43.3 \
datasets==2.20.0 peft==0.12.0 trl==0.9.6 bitsandbytes==0.43.3 \
comet-ml==3.44.3 flash-attn==2.3.6 unsloth==2024.9.post2
Code Evidence
Unsloth imports from `llm_engineering/model/finetuning/finetune.py:5,17-18`:
from unsloth import PatchDPOTrainer
from unsloth import FastLanguageModel, is_bfloat16_supported
from unsloth.chat_templates import get_chat_template
SageMaker environment variable access from `llm_engineering/model/finetuning/finetune.py:257-259`:
parser.add_argument("--output_data_dir", type=str, default=os.environ["SM_OUTPUT_DATA_DIR"])
parser.add_argument("--model_dir", type=str, default=os.environ["SM_MODEL_DIR"])
parser.add_argument("--n_gpus", type=str, default=os.environ["SM_NUM_GPUS"])
CUDA device usage from `llm_engineering/model/finetuning/finetune.py:212`:
inputs = tokenizer([message], return_tensors="pt").to("cuda")
Requirements file from `llm_engineering/model/finetuning/requirements.txt`:
accelerate==0.33.0
torch==2.4.0
transformers==4.43.3
datasets==2.20.0
peft==0.12.0
trl==0.9.6
bitsandbytes==0.43.3
comet-ml==3.44.3
flash-attn==2.3.6
unsloth==2024.9.post2
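Because the whole stack rides on exact pins (e.g. torch 2.4.0 vs. the local 2.2.2), it can be worth validating that every line in the uploaded requirements file is pinned with `==` before launching a training job. A minimal sketch, inlining the file contents shown above:

```python
# Minimal check that a requirements list is fully pinned with '==',
# e.g. before uploading it with a SageMaker training job (sketch).

REQUIREMENTS = """\
accelerate==0.33.0
torch==2.4.0
transformers==4.43.3
datasets==2.20.0
peft==0.12.0
trl==0.9.6
bitsandbytes==0.43.3
comet-ml==3.44.3
flash-attn==2.3.6
unsloth==2024.9.post2
"""

def parse_pins(text: str) -> dict:
    """Map package name -> pinned version; reject unpinned lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        if not version:
            raise ValueError(f"unpinned requirement: {line}")
        pins[name] = version
    return pins

print(parse_pins(REQUIREMENTS)["torch"])  # 2.4.0
```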
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `CUDA out of memory` | Model + training data exceeds VRAM | Reduce `per_device_train_batch_size` or use gradient checkpointing |
| `flash-attn installation failed` | Missing CUDA toolkit headers | Ensure CUDA development toolkit is installed on host |
| `FileNotFoundError: requirements.txt` | Requirements file path incorrect | Verify `finetuning_requirements_path` in sagemaker.py |
| `ImportError: unsloth` | Unsloth not installed in container | Check that requirements.txt is correctly uploaded with training job |
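For the OOM case, reducing `per_device_train_batch_size` usually goes hand in hand with raising `gradient_accumulation_steps`, so the optimizer still sees the same number of samples per update. A small sketch of that trade-off with hypothetical numbers (the real values live in the training configuration):

```python
# Keep the effective batch size constant while shrinking the per-device
# batch to fit in VRAM. Numbers are hypothetical.

def rebalance(per_device: int, grad_accum: int, shrink: int = 2) -> tuple:
    """Divide the per-device batch by `shrink` and multiply accumulation
    steps by the same factor, preserving per_device * grad_accum."""
    assert per_device % shrink == 0, "per-device batch must divide evenly"
    return per_device // shrink, grad_accum * shrink

pd, ga = rebalance(per_device=4, grad_accum=2)
print(pd, ga, pd * ga)  # 2 4 8 -> same effective batch of 8, less peak VRAM
```

Gradient checkpointing is the complementary lever: it trades recomputation time for activation memory without touching the batch math at all.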
Compatibility Notes
- PyTorch Version Mismatch: Local development uses PyTorch 2.2.2, but the SageMaker training container uses PyTorch 2.4.0. This is intentional: Unsloth and flash-attn require the newer version.
- flash-attn: Compiles from source when no pre-built wheel matches the installed CUDA/PyTorch combination, which requires the CUDA toolkit headers on the host. Pre-built wheels are available for common CUDA versions.
- bfloat16: The training code auto-detects bfloat16 support via `is_bfloat16_supported()` and falls back to fp16 if unavailable.
- Unsloth: Provides optimized forward/backward passes for Llama models, reducing memory usage and training time.
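The bfloat16 fallback described above reduces to a simple precision switch. In the real code the decision comes from Unsloth's `is_bfloat16_supported()`; here it is a plain boolean so the logic is visible without a GPU (a sketch, not the actual implementation):

```python
# Sketch of the bf16 -> fp16 fallback. The real check is unsloth's
# is_bfloat16_supported(); a plain boolean stands in for it here.

def pick_mixed_precision(bf16_supported: bool) -> dict:
    """Exactly one of bf16/fp16 is enabled, matching trainer-style flags."""
    return {"bf16": bf16_supported, "fp16": not bf16_supported}

print(pick_mixed_precision(True))   # {'bf16': True, 'fp16': False}
print(pick_mixed_precision(False))  # {'bf16': False, 'fp16': True}
```

bfloat16 is preferred on Ampere-class GPUs such as the A10G because its wider exponent range avoids the loss-scaling machinery fp16 needs.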
Related Pages
- Implementation:PacktPublishing_LLM_Engineers_Handbook_FastLanguageModel_From_Pretrained
- Implementation:PacktPublishing_LLM_Engineers_Handbook_FastLanguageModel_Get_Peft_Model
- Implementation:PacktPublishing_LLM_Engineers_Handbook_SFTTrainer_Train
- Implementation:PacktPublishing_LLM_Engineers_Handbook_FastLanguageModel_For_Inference
- Implementation:PacktPublishing_LLM_Engineers_Handbook_Save_Pretrained_Merged
- Implementation:PacktPublishing_LLM_Engineers_Handbook_HuggingFace_Load_Dataset