Environment:PacktPublishing LLM Engineers Handbook AWS SageMaker GPU Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, AWS, Deep_Learning |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
AWS SageMaker environment with `ml.g5.2xlarge` GPU instances for model fine-tuning, evaluation, and inference endpoint deployment.
Description
This environment provides the cloud GPU compute layer for all model-related workflows. It uses AWS SageMaker managed infrastructure with ml.g5.2xlarge instances (NVIDIA A10G GPU, 24GB VRAM). The environment supports three distinct use cases: (1) HuggingFace Estimator training jobs for fine-tuning, (2) HuggingFace Processor jobs for evaluation, and (3) HuggingFace LLM inference endpoints for model serving. SageMaker containers use PyTorch 2.1 with Transformers 4.36 on Python 3.10.
Usage
Use this environment for LLM Finetuning, Model Evaluation, and RAG Inference (model deployment) workflows. It is required whenever GPU compute is needed beyond local development. The environment requires AWS IAM roles, access keys, and HuggingFace tokens to be configured.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Cloud | AWS Account | With SageMaker permissions |
| Instance | ml.g5.2xlarge | NVIDIA A10G, 24GB VRAM, 8 vCPUs, 32GB RAM |
| IAM | SageMaker Execution Role | Created via `create_execution_role.py` |
| CLI | AWS CLI >= 2.15.42 | For local AWS operations |
| Memory | 5GB minimum per replica | Configured in `ResourceRequirements` |
Dependencies
SageMaker Container Stack
- `pytorch` = 2.1
- `transformers` = 4.36
- `python` = 3.10 (SageMaker container Python version)
- HuggingFace LLM image version = 2.2.0
Local Python Packages (AWS group)
- `sagemaker` >= 2.232.2
- `s3fs` > 2022.3.0
- `aws-profile-manager` >= 0.7.3
- `kubernetes` >= 30.1.0
- `sagemaker-huggingface-inference-toolkit` >= 2.4.0
- `boto3` (transitive dependency)
Credentials
The following environment variables must be set in `.env`:
- `AWS_REGION`: AWS region (default: `eu-central-1`)
- `AWS_ACCESS_KEY`: AWS IAM access key ID
- `AWS_SECRET_KEY`: AWS IAM secret access key
- `AWS_ARN_ROLE`: SageMaker execution role ARN (mandatory for all SageMaker operations)
- `HUGGINGFACE_ACCESS_TOKEN`: HuggingFace token (passed to SageMaker containers)
- `COMET_API_KEY`: Comet ML key (passed to training containers)
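
A minimal sketch of a pre-flight check for these variables, assuming the `.env` values have already been loaded into the process environment. The helper `missing_credentials` and the `REQUIRED_VARS` tuple are hypothetical names for illustration; the project itself reads these values through its own `settings` object.

```python
import os
from typing import Mapping

# Variable names taken from the Credentials list above. AWS_REGION is left out
# because the document gives it a default (eu-central-1).
REQUIRED_VARS = (
    "AWS_ACCESS_KEY",
    "AWS_SECRET_KEY",
    "AWS_ARN_ROLE",
    "HUGGINGFACE_ACCESS_TOKEN",
    "COMET_API_KEY",
)


def missing_credentials(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing in environment:", ", ".join(missing))
    else:
        print("All required credentials are set.")
```

Running a check like this before submitting a job surfaces a missing value locally, rather than as an assertion failure or a container error later.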
Quick Install
# Install AWS dependencies
poetry install --with aws
# Create SageMaker IAM role (one-time setup)
python llm_engineering/infrastructure/aws/roles/create_execution_role.py
# Verify AWS credentials
aws sts get-caller-identity
Code Evidence
Hugging Face token and AWS ARN role assertions from `llm_engineering/model/finetuning/sagemaker.py:25-26`:
assert settings.HUGGINGFACE_ACCESS_TOKEN, "Hugging Face access token is required."
assert settings.AWS_ARN_ROLE, "AWS ARN role is required."
Training job instance configuration from `llm_engineering/model/finetuning/sagemaker.py:50-58`:
huggingface_estimator = HuggingFace(
    entry_point="finetune.py",
    source_dir=str(finetuning_dir),
    instance_type="ml.g5.2xlarge",
    instance_count=1,
    role=settings.AWS_ARN_ROLE,
    transformers_version="4.36",
    pytorch_version="2.1",
    py_version="py310",
)
Resource requirements from `llm_engineering/infrastructure/aws/deploy/huggingface/config.py:24-31`:
model_resource_config = ResourceRequirements(
    requests={
        "copies": settings.COPIES,
        "num_accelerators": settings.GPUS,
        "num_cpus": settings.CPUS,
        "memory": 5 * 1024,  # Minimum memory required in Mb (5GB)
    },
)
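
The `memory` request is given in megabytes, so the 5 GB minimum becomes `5 * 1024 = 5120`. The sketch below shows that conversion with a hypothetical helper, `build_requests`, which only assembles a plain dictionary of the same shape and does not call the SageMaker SDK. The argument values in the usage line are illustrative, not the project's `settings` values.

```python
def build_requests(copies: int, gpus: int, cpus: int, memory_gb: int = 5) -> dict:
    """Assemble a `requests` mapping with memory converted from GB to MB."""
    if memory_gb < 5:
        # 5 GB is the per-replica minimum listed under System Requirements.
        raise ValueError("memory_gb must be at least 5")
    return {
        "copies": copies,
        "num_accelerators": gpus,
        "num_cpus": cpus,
        "memory": memory_gb * 1024,  # 1 GB counted as 1024 MB, as in the excerpt
    }


# Illustrative values only.
print(build_requests(copies=1, gpus=1, cpus=2))
```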
Boto3 client initialization from `llm_engineering/infrastructure/aws/deploy/huggingface/sagemaker_huggingface.py:84-88`:
self.sagemaker_client = boto3.client(
    "sagemaker",
    region_name=settings.AWS_REGION,
    aws_access_key_id=settings.AWS_ACCESS_KEY,
    aws_secret_access_key=settings.AWS_SECRET_KEY,
)
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `AssertionError: AWS ARN role is required.` | `AWS_ARN_ROLE` not set in `.env` | Run `create_execution_role.py` and add the resulting ARN to `.env` as `AWS_ARN_ROLE` |
| `AssertionError: Hugging Face access token is required.` | `HUGGINGFACE_ACCESS_TOKEN` not set | Add HuggingFace token to `.env` |
| `Couldn't load SageMaker imports` | AWS poetry group not installed | Run `poetry install --with aws` |
| `ResourceLimitExceeded` | AWS account quota for the instance type is too low | Request a quota increase for `ml.g5.2xlarge` through Service Quotas in the AWS console |
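
The "Couldn't load SageMaker imports" case can be checked before running anything. The sketch below tests whether the AWS-group packages are importable. The helper name `missing_aws_packages` and the default package list are assumptions for illustration, and the project's own import guard may be written differently.

```python
import importlib.util


def missing_aws_packages(names=("sagemaker", "boto3", "s3fs")) -> list[str]:
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_aws_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
        print("Try: poetry install --with aws")
    else:
        print("AWS packages are importable.")
```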
Compatibility Notes
- Instance Types: The project defaults to `ml.g5.2xlarge` for all SageMaker operations. Larger models may require `ml.g5.12xlarge` (four A10G GPUs) for more GPU memory; `ml.g5.4xlarge` keeps the same single A10G and adds only CPU and system RAM.
- Regions: Default region is `eu-central-1`. Not all regions have ml.g5 instances available.
- Cost: Running the full project costs approximately $25, primarily from SageMaker compute.
- Container Startup: Inference endpoints have a 15-minute health check timeout (`container_startup_health_check_timeout=900`).
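
As a rough way to judge when the default instance is too small, the sketch below estimates the memory taken by model weights alone and compares it with one A10G. The function names, the 2-bytes-per-parameter default (fp16/bf16), and the 0.8 headroom factor are illustrative assumptions, not project settings. Activations, KV cache, and optimizer state during fine-tuning need additional memory on top of this estimate.

```python
A10G_VRAM_GIB = 24  # per-GPU memory stated for ml.g5.2xlarge in this document


def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


def fits_single_a10g(
    num_params: float, bytes_per_param: int = 2, headroom: float = 0.8
) -> bool:
    """True if the weights fit within `headroom` of one A10G's memory."""
    return weight_memory_gib(num_params, bytes_per_param) <= A10G_VRAM_GIB * headroom


if __name__ == "__main__":
    for params in (8e9, 13e9):
        size = weight_memory_gib(params)
        print(f"{params:.0e} params: {size:.1f} GiB, fits: {fits_single_a10g(params)}")
```

Under these assumptions an 8-billion-parameter model in half precision fits on one A10G for the weights, while a 13-billion-parameter model does not unless it is quantized or spread over several GPUs.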
Related Pages
- Implementation:PacktPublishing_LLM_Engineers_Handbook_Run_Finetuning_On_Sagemaker
- Implementation:PacktPublishing_LLM_Engineers_Handbook_HuggingFaceProcessor_Run
- Implementation:PacktPublishing_LLM_Engineers_Handbook_SagemakerHuggingfaceStrategy_Deploy
- Implementation:PacktPublishing_LLM_Engineers_Handbook_InferenceExecutor_Execute