Implementation:PacktPublishing LLM Engineers Handbook Run Finetuning On Sagemaker

Field	Value
Implementation Name	Run Finetuning On Sagemaker
Type	API Doc
Source File	llm_engineering/model/finetuning/sagemaker.py:L17-69
Workflow	LLM_Finetuning
Repo	PacktPublishing/LLM-Engineers-Handbook
Implements	Principle:PacktPublishing_LLM_Engineers_Handbook_SageMaker_Training_Orchestration

Function Signature

def run_finetuning_on_sagemaker(
    finetuning_type: str,
    num_train_epochs: int,
    per_device_train_batch_size: int,
    learning_rate: float,
    dataset_huggingface_workspace: str,
    is_dummy: bool,
) -> None

Import

from llm_engineering.model.finetuning.sagemaker import run_finetuning_on_sagemaker

Description

This function orchestrates the submission of an LLM fine-tuning job to AWS SageMaker. It constructs a SageMaker HuggingFace estimator with all the necessary configuration (instance type, hyperparameters, dependencies, and entry point) and then calls .fit() to launch the managed training job.

The function does not perform any training itself; it delegates execution to SageMaker, which provisions a GPU instance, sets up the container, and runs the finetune.py entry point script.

Parameters

Parameter	Type	Default	Description
`finetuning_type`	`str`	`"sft"`	Type of fine-tuning to perform. Either `"sft"` (Supervised Fine-Tuning) or `"dpo"` (Direct Preference Optimization).
`num_train_epochs`	`int`	`3`	Number of training epochs.
`per_device_train_batch_size`	`int`	`2`	Batch size per GPU device.
`learning_rate`	`float`	`3e-4`	Learning rate for the optimizer.
`dataset_huggingface_workspace`	`str`	—	HuggingFace workspace containing the training dataset.
`is_dummy`	`bool`	`False`	If `True`, runs a minimal training job for testing purposes.
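These parameters end up in the estimator's hyperparameters dictionary. The following is a minimal sketch of a validation step that could run before a job is submitted. `build_hyperparameters` and `SUPPORTED_FINETUNING_TYPES` are hypothetical names that are not part of the repository, and the checks are assumptions derived from the table above. The sketch also includes `is_dummy` only when it is `True`, on the assumption that SageMaker hands hyperparameter values to the entry point as strings, where a literal "False" can be misread as truthy.

```python
from typing import Any, Dict

# Hypothetical constant: the two modes documented for finetuning_type.
SUPPORTED_FINETUNING_TYPES = ("sft", "dpo")


def build_hyperparameters(
    finetuning_type: str,
    num_train_epochs: int,
    per_device_train_batch_size: int,
    learning_rate: float,
    dataset_huggingface_workspace: str,
    is_dummy: bool = False,
) -> Dict[str, Any]:
    """Validate the documented arguments and return a SageMaker hyperparameters dict.

    Hypothetical helper for illustration; it is not part of llm_engineering.
    """
    if finetuning_type not in SUPPORTED_FINETUNING_TYPES:
        raise ValueError(
            f"finetuning_type must be one of {SUPPORTED_FINETUNING_TYPES}, got {finetuning_type!r}"
        )
    if num_train_epochs < 1 or per_device_train_batch_size < 1:
        raise ValueError("num_train_epochs and per_device_train_batch_size must be >= 1")
    if learning_rate <= 0:
        raise ValueError("learning_rate must be positive")
    if not dataset_huggingface_workspace:
        raise ValueError("dataset_huggingface_workspace must be a non-empty workspace name")

    hyperparameters: Dict[str, Any] = {
        "finetuning_type": finetuning_type,
        "num_train_epochs": num_train_epochs,
        "per_device_train_batch_size": per_device_train_batch_size,
        "learning_rate": learning_rate,
        "dataset_huggingface_workspace": dataset_huggingface_workspace,
    }
    # Assumption: values reach the entry point as strings, so the flag is only
    # sent when enabled rather than as a literal "False".
    if is_dummy:
        hyperparameters["is_dummy"] = True
    return hyperparameters
```

For example, `build_hyperparameters("sft", 3, 2, 3e-4, "my-hf-workspace")` returns the five documented keys without `is_dummy`, while passing `finetuning_type="ppo"` raises `ValueError`.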

Returns

None. The function submits the job and, because .fit() waits for the job by default, blocks until training completes. SageMaker saves the resulting model artifacts to S3.
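Because .fit() waits for the job by default, the caller is tied up for the whole training run. A minimal sketch of a non-blocking alternative follows, assuming a SageMaker Python SDK estimator such as the HuggingFace one this function builds: `fit(wait=False)` returns once the job has been created, `latest_training_job.wait()` blocks later when the result is needed, and `model_data` holds the S3 URI of the model artifacts. The helper names `submit_without_blocking` and `wait_for_artifacts` are hypothetical and not part of the repository.

```python
from typing import Any


def submit_without_blocking(estimator: Any) -> str:
    """Start the training job and return its name without waiting for it.

    `estimator` is assumed to be a SageMaker Estimator (for example the
    HuggingFace estimator built by run_finetuning_on_sagemaker).
    """
    # wait=False returns as soon as SageMaker has created the job
    estimator.fit(wait=False)
    return estimator.latest_training_job.name


def wait_for_artifacts(estimator: Any) -> str:
    """Block until the most recent training job finishes, then return its artifact URI."""
    estimator.latest_training_job.wait()
    # S3 URI of the model archive produced by the finished job
    return estimator.model_data
```

Splitting submission from waiting lets a pipeline record the job name immediately and poll or wait for it in a later step.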

Key Implementation Details

SageMaker Estimator Configuration

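from pathlib import Path

from sagemaker.huggingface import HuggingFace

from llm_engineering.settings import settings

# Excerpt from the body of run_finetuning_on_sagemaker; finetuning_type,
# num_train_epochs, etc. are the function's arguments.

# Forward the training arguments to finetune.py. SageMaker hands hyperparameter
# values to the entry point as strings, so the boolean flag is only added when
# it is enabled (a literal "False" could be misread as truthy by a type=bool parser).
hyperparameters = {
    "finetuning_type": finetuning_type,
    "num_train_epochs": num_train_epochs,
    "per_device_train_batch_size": per_device_train_batch_size,
    "learning_rate": learning_rate,
    "dataset_huggingface_workspace": dataset_huggingface_workspace,
}
if is_dummy:
    hyperparameters["is_dummy"] = True

huggingface_estimator = HuggingFace(
    entry_point="finetune.py",  # script executed inside the training container
    source_dir=str(Path(__file__).resolve().parent),  # upload the whole finetuning/ directory
    instance_type="ml.g5.2xlarge",  # one NVIDIA A10G GPU (24 GB)
    instance_count=1,
    transformers_version="4.36",
    pytorch_version="2.1",
    py_version="py310",
    hyperparameters=hyperparameters,
    role=settings.AWS_ARN_ROLE,  # IAM role that SageMaker assumes for the job
    environment={
        "HUGGING_FACE_HUB_TOKEN": settings.HUGGINGFACE_ACCESS_TOKEN,
        "COMET_API_KEY": settings.COMET_API_KEY,
        "COMET_PROJECT": settings.COMET_PROJECT,
        "COMET_WORKSPACE": settings.COMET_WORKSPACE,
    },
)
# Launch the managed training job; by default this blocks and streams logs until it finishes
huggingface_estimator.fit()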

Key Aspects

Instance type: ml.g5.2xlarge provides one NVIDIA A10G GPU with 24 GB of VRAM.
Entry point: finetune.py is the script that runs inside the SageMaker container.
Source directory: The entire finetuning/ directory (the directory containing sagemaker.py) is packaged and uploaded to the container.
Environment variables: The Hugging Face access token and the Comet ML API key, project, and workspace are passed to the container as environment variables.
Hyperparameters: Passed as a dictionary; SageMaker forwards each entry to the entry point as a `--name value` command-line argument.
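On the container side, the entry point has to turn those command-line arguments back into typed values. The sketch below is a hypothetical argparse parser for finetune.py, not the repository's actual code. It assumes the argument names match the hyperparameter keys used by the estimator and that values arrive as strings such as "3" or "False", which is why `is_dummy` uses an explicit string-to-bool converter (`str_to_bool`, also hypothetical) instead of `type=bool`.

```python
import argparse
from typing import Optional, Sequence


def str_to_bool(value: str) -> bool:
    """Convert a string such as "True" or "false" into a bool."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_hyperparameters(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse --name value pairs into typed training arguments (reads sys.argv if argv is None)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--finetuning_type", type=str, default="sft")
    parser.add_argument("--num_train_epochs", type=int, default=3)
    parser.add_argument("--per_device_train_batch_size", type=int, default=2)
    parser.add_argument("--learning_rate", type=float, default=3e-4)
    parser.add_argument("--dataset_huggingface_workspace", type=str, required=True)
    # type=bool would turn the string "False" into True, so parse it explicitly
    parser.add_argument("--is_dummy", type=str_to_bool, default=False)
    # parse_known_args tolerates any extra arguments instead of exiting with an error
    args, _unknown = parser.parse_known_args(argv)
    return args
```

With this design, omitting `--is_dummy` and passing `--is_dummy False` both yield `False`, so the launcher can use either convention.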

External Dependencies

Package	Purpose
`sagemaker`	AWS SageMaker Python SDK for job submission
`huggingface_hub`	Model/dataset access tokens
`loguru`	Structured logging

Usage Example

from llm_engineering.model.finetuning.sagemaker import run_finetuning_on_sagemaker

# Launch an SFT fine-tuning job on SageMaker
run_finetuning_on_sagemaker(
    finetuning_type="sft",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    learning_rate=3e-4,
    dataset_huggingface_workspace="my-hf-workspace",
    is_dummy=False,
)
