Environment:PacktPublishing LLM Engineers Handbook AWS SageMaker GPU Environment

From Leeroopedia

Domains: Infrastructure, AWS, Deep_Learning
Last Updated: 2026-02-08 08:00 GMT

Overview

AWS SageMaker environment with `ml.g5.2xlarge` GPU instances for model fine-tuning, evaluation, and inference endpoint deployment.

Description

This environment provides the cloud GPU compute layer for all model-related workflows. It uses AWS SageMaker managed infrastructure with ml.g5.2xlarge instances (NVIDIA A10G GPU, 24GB VRAM). The environment supports three distinct use cases: (1) HuggingFace Estimator training jobs for fine-tuning, (2) HuggingFace Processor jobs for evaluation, and (3) HuggingFace LLM inference endpoints for model serving. SageMaker containers use PyTorch 2.1 with Transformers 4.36 on Python 3.10.

Usage

Use this environment for LLM Finetuning, Model Evaluation, and RAG Inference (model deployment) workflows. It is required whenever GPU compute is needed beyond local development. The environment requires AWS IAM roles, access keys, and HuggingFace tokens to be configured.

System Requirements

| Category | Requirement | Notes |
|----------|-------------|-------|
| Cloud | AWS Account | With SageMaker permissions |
| Instance | ml.g5.2xlarge | NVIDIA A10G, 24GB VRAM, 8 vCPUs, 32GB RAM |
| IAM | SageMaker Execution Role | Created via `create_execution_role.py` |
| CLI | AWS CLI >= 2.15.42 | For local AWS operations |
| Memory | 5GB minimum per replica | Configured in `ResourceRequirements` |
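
The AWS CLI version requirement above can be checked from Python. The following is a minimal sketch, assuming the `aws --version` output starts with a token of the form `aws-cli/X.Y.Z`; the helper names `parse_aws_cli_version` and `aws_cli_is_recent_enough` are hypothetical and not part of the project.

```python
import re
import subprocess

# Minimum version taken from the System Requirements table.
MIN_AWS_CLI = (2, 15, 42)


def parse_aws_cli_version(output: str) -> tuple[int, ...]:
    """Extract (major, minor, patch) from `aws --version` output."""
    match = re.search(r"aws-cli/(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized version string: {output!r}")
    return tuple(int(part) for part in match.groups())


def aws_cli_is_recent_enough(
    output: str, minimum: tuple[int, ...] = MIN_AWS_CLI
) -> bool:
    """Compare versions as integer tuples, so 2.9.0 sorts below 2.15.42."""
    return parse_aws_cli_version(output) >= minimum


if __name__ == "__main__":
    result = subprocess.run(
        ["aws", "--version"], capture_output=True, text=True, check=True
    )
    # Check both streams in case the CLI writes the version to stderr.
    print(aws_cli_is_recent_enough(result.stdout or result.stderr))
```

Comparing integer tuples rather than strings avoids the lexicographic trap where `"2.9.0"` would sort above `"2.15.42"`.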

Dependencies

SageMaker Container Stack

  • `pytorch` = 2.1
  • `transformers` = 4.36
  • `python` = 3.10 (SageMaker container Python version)
  • HuggingFace LLM image version = 2.2.0

Local Python Packages (AWS group)

  • `sagemaker` >= 2.232.2
  • `s3fs` > 2022.3.0
  • `aws-profile-manager` >= 0.7.3
  • `kubernetes` >= 30.1.0
  • `sagemaker-huggingface-inference-toolkit` >= 2.4.0
  • `boto3` (transitive dependency)

Credentials

The following environment variables must be set in `.env`:

  • `AWS_REGION`: AWS region (default: `eu-central-1`)
  • `AWS_ACCESS_KEY`: AWS IAM access key ID
  • `AWS_SECRET_KEY`: AWS IAM secret access key
  • `AWS_ARN_ROLE`: SageMaker execution role ARN (mandatory for all SageMaker operations)
  • `HUGGINGFACE_ACCESS_TOKEN`: HuggingFace token (passed to SageMaker containers)
  • `COMET_API_KEY`: Comet ML key (passed to training containers)
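
A quick pre-flight check can catch missing variables before a SageMaker job is submitted. This is a minimal sketch, assuming the values from `.env` have already been loaded into the process environment; `missing_credentials` and `resolve_region` are hypothetical helpers, not functions from the project.

```python
import os
from collections.abc import Mapping

# Variables listed above that have no documented default.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY",
    "AWS_SECRET_KEY",
    "AWS_ARN_ROLE",
    "HUGGINGFACE_ACCESS_TOKEN",
    "COMET_API_KEY",
)
DEFAULT_REGION = "eu-central-1"


def missing_credentials(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def resolve_region(env: Mapping[str, str]) -> str:
    """Fall back to the documented default when AWS_REGION is not set."""
    return env.get("AWS_REGION") or DEFAULT_REGION


if __name__ == "__main__":
    missing = missing_credentials(os.environ)
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    print(f"Credentials present; region = {resolve_region(os.environ)}")
```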

Quick Install

# Install AWS dependencies
poetry install --with aws

# Create SageMaker IAM role (one-time setup)
python llm_engineering/infrastructure/aws/roles/create_execution_role.py

# Verify AWS credentials
aws sts get-caller-identity

Code Evidence

Hugging Face token and AWS ARN role assertions from `llm_engineering/model/finetuning/sagemaker.py:25-26`:

assert settings.HUGGINGFACE_ACCESS_TOKEN, "Hugging Face access token is required."
assert settings.AWS_ARN_ROLE, "AWS ARN role is required."

Training job instance configuration from `llm_engineering/model/finetuning/sagemaker.py:50-58`:

huggingface_estimator = HuggingFace(
    entry_point="finetune.py",
    source_dir=str(finetuning_dir),
    instance_type="ml.g5.2xlarge",
    instance_count=1,
    role=settings.AWS_ARN_ROLE,
    transformers_version="4.36",
    pytorch_version="2.1",
    py_version="py310",
)
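
The excerpt above only constructs the estimator. As a rough sketch of how such a configuration might be assembled and launched, the snippet below separates the keyword arguments from the SageMaker call so the configuration can be inspected without the `aws` dependency group installed. `build_estimator_kwargs` and `launch_training` are hypothetical names; the project's actual launch code may pass further arguments (for example hyperparameters or environment variables) that are not shown here.

```python
from pathlib import Path


def build_estimator_kwargs(role_arn: str, source_dir: Path) -> dict:
    """Collect the estimator settings shown in the excerpt above."""
    if not role_arn:
        raise ValueError("AWS ARN role is required.")
    return {
        "entry_point": "finetune.py",
        "source_dir": str(source_dir),
        "instance_type": "ml.g5.2xlarge",
        "instance_count": 1,
        "role": role_arn,
        "transformers_version": "4.36",
        "pytorch_version": "2.1",
        "py_version": "py310",
    }


def launch_training(role_arn: str, source_dir: Path) -> None:
    """Start a training job; requires the `sagemaker` package and AWS access."""
    # Imported lazily so build_estimator_kwargs works without sagemaker installed.
    from sagemaker.huggingface import HuggingFace

    estimator = HuggingFace(**build_estimator_kwargs(role_arn, source_dir))
    estimator.fit()  # by default this waits for the job to finish
```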

Resource requirements from `llm_engineering/infrastructure/aws/deploy/huggingface/config.py:24-31`:

model_resource_config = ResourceRequirements(
    requests={
        "copies": settings.COPIES,
        "num_accelerators": settings.GPUS,
        "num_cpus": settings.CPUS,
        "memory": 5 * 1024,  # Minimum memory required in Mb (5GB)
    },
)
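
Before deploying, it can help to sanity-check that the requested resources fit on the instance. The sketch below is illustrative only: it assumes the `requests` values are per copy and that all copies land on a single `ml.g5.2xlarge`, and it ignores memory reserved by the host. `InstanceCapacity` and `fits_on_instance` are hypothetical names, not part of the SageMaker SDK or the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceCapacity:
    gpus: int
    vcpus: int
    memory_mb: int


# ml.g5.2xlarge figures from the System Requirements table (one A10G GPU).
G5_2XLARGE = InstanceCapacity(gpus=1, vcpus=8, memory_mb=32 * 1024)


def fits_on_instance(
    requests: dict, capacity: InstanceCapacity = G5_2XLARGE
) -> bool:
    """Check copies x per-copy resources against the instance totals."""
    copies = requests.get("copies", 1)
    return (
        copies * requests.get("num_accelerators", 0) <= capacity.gpus
        and copies * requests.get("num_cpus", 0) <= capacity.vcpus
        and copies * requests.get("memory", 0) <= capacity.memory_mb
    )


if __name__ == "__main__":
    single = {"copies": 1, "num_accelerators": 1, "num_cpus": 2, "memory": 5 * 1024}
    print(fits_on_instance(single))
```

Under these assumptions the GPU count is the binding limit: a second copy that also requests one accelerator would not fit on a single-GPU instance, even though memory and CPU would.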

Boto3 client initialization from `llm_engineering/infrastructure/aws/deploy/huggingface/sagemaker_huggingface.py:84-88`:

self.sagemaker_client = boto3.client(
    "sagemaker",
    region_name=settings.AWS_REGION,
    aws_access_key_id=settings.AWS_ACCESS_KEY,
    aws_secret_access_key=settings.AWS_SECRET_KEY,
)
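
Once a client like the one above exists, an endpoint's state can be read with the boto3 `describe_endpoint` call. The following is a small sketch, not the project's code: `endpoint_status` is a hypothetical helper, and `StubSageMakerClient` is a stand-in that exists only so the example runs without AWS access.

```python
def endpoint_status(sagemaker_client, endpoint_name: str) -> str:
    """Return the EndpointStatus field, e.g. 'Creating', 'InService' or 'Failed'."""
    response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return response["EndpointStatus"]


class StubSageMakerClient:
    """Offline stand-in mimicking the one method used above (illustration only)."""

    def __init__(self, statuses: dict):
        self._statuses = statuses

    def describe_endpoint(self, EndpointName: str) -> dict:
        return {
            "EndpointName": EndpointName,
            "EndpointStatus": self._statuses[EndpointName],
        }


if __name__ == "__main__":
    # "example-endpoint" is a placeholder name, not one used by the project.
    client = StubSageMakerClient({"example-endpoint": "InService"})
    print(endpoint_status(client, "example-endpoint"))
```

With real credentials, the stub would be replaced by the `boto3.client("sagemaker", ...)` instance shown in the excerpt.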

Common Errors

| Error Message | Cause | Solution |
|---------------|-------|----------|
| `AssertionError: AWS ARN role is required.` | `AWS_ARN_ROLE` not set in `.env` | Run `create_execution_role.py` and add ARN to `.env` |
| `AssertionError: Hugging Face access token is required.` | `HUGGINGFACE_ACCESS_TOKEN` not set | Add HuggingFace token to `.env` |
| `Couldn't load SageMaker imports` | AWS poetry group not installed | Run `poetry install --with aws` |
| `ResourceLimitExceeded` | AWS account quota too low | Request quota increase for ml.g5.2xlarge in AWS console |

Compatibility Notes

  • Instance Types: The project defaults to `ml.g5.2xlarge` for all SageMaker operations. Larger models may require `ml.g5.4xlarge` or `ml.g5.12xlarge`.
  • Regions: Default region is `eu-central-1`. Not all regions have ml.g5 instances available.
  • Cost: Running the full project costs approximately $25, primarily from SageMaker compute.
  • Container Startup: Inference endpoints have a 15-minute health check timeout (`container_startup_health_check_timeout=900`).
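
Because an endpoint can take many minutes to become healthy, client code that waits for it usually needs its own bounded polling loop. The sketch below shows one way to do that. The 900-second default is chosen here only to mirror the health check timeout noted above; the container timeout itself is enforced by SageMaker, and this client-side budget is an assumption. `wait_until_in_service` is a hypothetical helper, and `get_status` is any callable returning the endpoint status string.

```python
import time
from typing import Callable


def wait_until_in_service(
    get_status: Callable[[], str],
    timeout_s: float = 900.0,
    poll_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the status is 'InService'; give up on 'Failed' or timeout."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "InService":
            return True
        if status == "Failed":
            return False
        if clock() >= deadline:
            return False
        sleep(poll_s)


if __name__ == "__main__":
    # Simulated status sequence; no AWS calls are made.
    statuses = iter(["Creating", "Creating", "InService"])
    print(wait_until_in_service(lambda: next(statuses), sleep=lambda _: None))
```

Injecting `sleep` and `clock` keeps the loop testable without real delays.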
