Environment:PacktPublishing LLM Engineers Handbook AWS SageMaker GPU Environment

Knowledge Sources	LLM Engineers Handbook, AWS SageMaker Documentation
Domains	Infrastructure, AWS, Deep_Learning
Last Updated	2026-02-08 08:00 GMT

Overview

AWS SageMaker environment with `ml.g5.2xlarge` GPU instances for model fine-tuning, evaluation, and inference endpoint deployment.

Description

This environment provides the cloud GPU compute layer for all model-related workflows. It uses AWS SageMaker managed infrastructure with ml.g5.2xlarge instances (NVIDIA A10G GPU, 24GB VRAM). The environment supports three distinct use cases: (1) HuggingFace Estimator training jobs for fine-tuning, (2) HuggingFace Processor jobs for evaluation, and (3) HuggingFace LLM inference endpoints for model serving. SageMaker containers use PyTorch 2.1 with Transformers 4.36 on Python 3.10.

Usage

Use this environment for LLM Finetuning, Model Evaluation, and RAG Inference (model deployment) workflows. It is required whenever GPU compute is needed beyond local development. The environment requires AWS IAM roles, access keys, and HuggingFace tokens to be configured.

System Requirements

Category	Requirement	Notes
Cloud	AWS Account	With SageMaker permissions
Instance	ml.g5.2xlarge	NVIDIA A10G, 24GB VRAM, 8 vCPUs, 32GB RAM
IAM	SageMaker Execution Role	Created via `create_execution_role.py`
CLI	AWS CLI >= 2.15.42	For local AWS operations
Memory	5GB minimum per replica	Configured in `ResourceRequirements`

Dependencies

SageMaker Container Stack

`pytorch` = 2.1
`transformers` = 4.36
`python` = 3.10 (SageMaker container Python version)
HuggingFace LLM image version = 2.2.0

Local Python Packages (AWS group)

`sagemaker` >= 2.232.2
`s3fs` > 2022.3.0
`aws-profile-manager` >= 0.7.3
`kubernetes` >= 30.1.0
`sagemaker-huggingface-inference-toolkit` >= 2.4.0
`boto3` (transitive dependency)

Credentials

The following environment variables must be set in `.env`:

`AWS_REGION`: AWS region (default: `eu-central-1`)
`AWS_ACCESS_KEY`: AWS IAM access key ID
`AWS_SECRET_KEY`: AWS IAM secret access key
`AWS_ARN_ROLE`: SageMaker execution role ARN (mandatory for all SageMaker operations)
`HUGGINGFACE_ACCESS_TOKEN`: HuggingFace token (passed to SageMaker containers)
`COMET_API_KEY`: Comet ML key (passed to training containers)
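
A missing variable otherwise surfaces late, as an assertion inside a SageMaker script, so a small pre-flight check can be useful. The following is a minimal sketch, not the project's own settings loader: `load_credentials`, `REQUIRED_VARS`, and `DEFAULTS` are hypothetical names, and the sketch assumes the `.env` values have already been exported into the process environment.

```python
import os

# Variable names come from the list above; the helper itself is hypothetical.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY",
    "AWS_SECRET_KEY",
    "AWS_ARN_ROLE",
    "HUGGINGFACE_ACCESS_TOKEN",
    "COMET_API_KEY",
)
# AWS_REGION is the only variable the page documents a default for.
DEFAULTS = {"AWS_REGION": "eu-central-1"}


def load_credentials(env=None):
    """Return the settings as a dict, raising if a required variable is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    values = {name: env[name] for name in REQUIRED_VARS}
    for name, default in DEFAULTS.items():
        values[name] = env.get(name) or default
    return values
```

Calling `load_credentials()` before submitting a job reports every missing variable at once rather than one assertion at a time.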

Quick Install

# Install AWS dependencies
poetry install --with aws

# Create SageMaker IAM role (one-time setup)
python llm_engineering/infrastructure/aws/roles/create_execution_role.py

# Verify AWS credentials
aws sts get-caller-identity

Code Evidence

Credential assertions (Hugging Face token and AWS ARN role) from `llm_engineering/model/finetuning/sagemaker.py:25-26`:

assert settings.HUGGINGFACE_ACCESS_TOKEN, "Hugging Face access token is required."
assert settings.AWS_ARN_ROLE, "AWS ARN role is required."

Training job instance configuration from `llm_engineering/model/finetuning/sagemaker.py:50-58`:

huggingface_estimator = HuggingFace(
    entry_point="finetune.py",
    source_dir=str(finetuning_dir),
    instance_type="ml.g5.2xlarge",
    instance_count=1,
    role=settings.AWS_ARN_ROLE,
    transformers_version="4.36",
    pytorch_version="2.1",
    py_version="py310",
)

Resource requirements from `llm_engineering/infrastructure/aws/deploy/huggingface/config.py:24-31`:

model_resource_config = ResourceRequirements(
    requests={
        "copies": settings.COPIES,
        "num_accelerators": settings.GPUS,
        "num_cpus": settings.CPUS,
        "memory": 5 * 1024,  # Minimum memory required in Mb (5GB)
    },
)

Boto3 client initialization from `llm_engineering/infrastructure/aws/deploy/huggingface/sagemaker_huggingface.py:84-88`:

self.sagemaker_client = boto3.client(
    "sagemaker",
    region_name=settings.AWS_REGION,
    aws_access_key_id=settings.AWS_ACCESS_KEY,
    aws_secret_access_key=settings.AWS_SECRET_KEY,
)
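
The excerpt above always passes explicit keys. As a hedged variation, the sketch below builds the same keyword arguments but omits the key pair when it is not set, in which case boto3 resolves credentials through its default chain (environment, shared config, or instance role). The function names are hypothetical and the fallback behaviour is an assumption, not something the project does.

```python
def sagemaker_client_kwargs(region, access_key=None, secret_key=None):
    """Build keyword arguments for boto3.client; explicit keys are optional."""
    kwargs = {"service_name": "sagemaker", "region_name": region}
    # Only pass explicit credentials when both halves of the key pair are present.
    if access_key and secret_key:
        kwargs["aws_access_key_id"] = access_key
        kwargs["aws_secret_access_key"] = secret_key
    return kwargs


def make_sagemaker_client(region, access_key=None, secret_key=None):
    """Create the client; boto3 is imported lazily so the helper above stays dependency-free."""
    import boto3

    return boto3.client(**sagemaker_client_kwargs(region, access_key, secret_key))
```

Keeping the argument construction separate from the `boto3.client` call makes the credential logic easy to check without network access.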

Common Errors

Error Message	Cause	Solution
`AssertionError: AWS ARN role is required.`	`AWS_ARN_ROLE` not set in `.env`	Run `create_execution_role.py` and add ARN to `.env`
`AssertionError: Hugging Face access token is required.`	`HUGGINGFACE_ACCESS_TOKEN` not set	Add HuggingFace token to `.env`
`Couldn't load SageMaker imports`	AWS poetry group not installed	Run `poetry install --with aws`
`ResourceLimitExceeded`	AWS account quota too low	Request quota increase for ml.g5.2xlarge in AWS console

Compatibility Notes

Instance Types: The project defaults to `ml.g5.2xlarge` for all SageMaker operations. Larger models may require `ml.g5.4xlarge` or `ml.g5.12xlarge`.
Regions: Default region is `eu-central-1`. Not all regions have ml.g5 instances available.
Cost: Running the full project costs approximately $25, primarily from SageMaker compute.
Container Startup: Inference endpoints have a 15-minute health check timeout (`container_startup_health_check_timeout=900`).
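
The 900-second figure governs the container health check on the SageMaker side. A caller that wants to block until the endpoint is usable might reuse the same budget for a client-side wait. The sketch below is one way to do that under stated assumptions: `wait_until_in_service` and its `get_status` callable are hypothetical, and `get_status` is expected to return the `EndpointStatus` string reported by SageMaker (for example `Creating`, `InService`, or `Failed`).

```python
import time


def wait_until_in_service(get_status, timeout_s=900, poll_s=30,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "InService", fails, or the budget runs out."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError("Endpoint creation failed")
        if clock() >= deadline:
            raise TimeoutError(
                f"Endpoint not in service after {timeout_s} s (last status: {status})"
            )
        sleep(poll_s)
```

With a real client, `get_status` could be something like `lambda: client.describe_endpoint(EndpointName=name)["EndpointStatus"]`, where `client` and `name` are supplied by the caller. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.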
