Implementation:PacktPublishing LLM Engineers Handbook SagemakerHuggingfaceStrategy Deploy

Field	Value
Type	API Doc
Workflow	RAG_Inference
Repository	PacktPublishing/LLM-Engineers-Handbook
Source	sagemaker_huggingface.py:L27-71, run.py:L16-35
Implements	Principle:PacktPublishing_LLM_Engineers_Handbook_SageMaker_Model_Deployment

API Signature

SagemakerHuggingfaceStrategy(deployment_service).deploy(
    role_arn,
    llm_image,
    config,
    endpoint_name,
    endpoint_config_name,
    gpu_instance_type,
    resources,
    endpoint_type
)

Import

from llm_engineering.infrastructure.aws.deploy.huggingface.sagemaker_huggingface import (
    SagemakerHuggingfaceStrategy,
    DeploymentService,
)

Key Code

From run.py (the entry point for creating an endpoint):

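# SageMaker SDK imports this excerpt relies on. The remaining names
# (settings, ResourceManager, hugging_face_deploy_config, model_resource_config)
# are project-level objects defined elsewhere in the repository.
from sagemaker.enums import EndpointType
from sagemaker.huggingface import get_huggingface_llm_image_uri


def create_endpoint(endpoint_type=EndpointType.INFERENCE_COMPONENT_BASED) -> None:
    # Resolve the HuggingFace LLM container image URI for the pinned version.
    llm_image = get_huggingface_llm_image_uri("huggingface", version="2.2.0")
    # Build the deployment service, then hand it to the strategy.
    resource_manager = ResourceManager()
    deployment_service = DeploymentService(resource_manager=resource_manager)
    SagemakerHuggingfaceStrategy(deployment_service).deploy(
        role_arn=settings.AWS_ARN_ROLE,
        llm_image=llm_image,
        config=hugging_face_deploy_config,
        endpoint_name=settings.SAGEMAKER_ENDPOINT_INFERENCE,
        endpoint_config_name=settings.SAGEMAKER_ENDPOINT_CONFIG_INFERENCE,
        gpu_instance_type=settings.GPU_INSTANCE_TYPE,
        resources=model_resource_config,
        endpoint_type=endpoint_type,
    )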
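The call above passes a service object into the strategy, which then receives all deployment arguments through deploy(). The following is a minimal sketch of that delegation shape, not the repository's implementation: RecordingDeploymentService and DeployStrategySketch are hypothetical names, the EndpointType enum is a local stand-in for sagemaker.enums.EndpointType so the sketch runs without the SDK, and every argument value is a placeholder.

```python
from enum import Enum
from typing import Any, Dict, List


class EndpointType(Enum):
    """Local stand-in for sagemaker.enums.EndpointType (assumption, for illustration)."""

    MODEL_BASED = "ModelBased"
    INFERENCE_COMPONENT_BASED = "InferenceComponentBased"


class RecordingDeploymentService:
    """Hypothetical test double: records deploy requests instead of calling AWS."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def deploy(self, **kwargs: Any) -> None:
        self.calls.append(kwargs)


class DeployStrategySketch:
    """Simplified illustration of a strategy that delegates to an injected service."""

    def __init__(self, deployment_service: Any) -> None:
        self.deployment_service = deployment_service

    def deploy(
        self,
        role_arn: str,
        llm_image: str,
        config: Dict[str, str],
        endpoint_name: str,
        endpoint_config_name: str,
        gpu_instance_type: str,
        resources: Dict[str, Any],
        endpoint_type: EndpointType,
    ) -> None:
        # Forward everything to the service, which owns the actual deployment work.
        self.deployment_service.deploy(
            role_arn=role_arn,
            llm_image=llm_image,
            config=config,
            endpoint_name=endpoint_name,
            endpoint_config_name=endpoint_config_name,
            gpu_instance_type=gpu_instance_type,
            resources=resources,
            endpoint_type=endpoint_type,
        )


# Placeholder values only; none of these refer to real AWS resources.
service = RecordingDeploymentService()
DeployStrategySketch(service).deploy(
    role_arn="arn:aws:iam::123456789012:role/example-role",
    llm_image="example-image-uri",
    config={"HF_MODEL_ID": "my-org/my-model"},
    endpoint_name="example-endpoint",
    endpoint_config_name="example-endpoint-config",
    gpu_instance_type="ml.g5.2xlarge",
    resources={"copies": 1},
    endpoint_type=EndpointType.INFERENCE_COMPONENT_BASED,
)
```

Injecting the service this way lets the argument wiring be exercised with a test double before any AWS resources are created.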

Parameters

Parameter	Type	Description
role_arn	str	AWS IAM role ARN with SageMaker permissions
llm_image	str	URI of the HuggingFace TGI Docker image
config	dict	HuggingFace deployment configuration (model ID, quantization, etc.)
endpoint_name	str	Name for the SageMaker endpoint
endpoint_config_name	str	Name for the SageMaker endpoint configuration
gpu_instance_type	str	SageMaker GPU instance type (e.g., ml.g5.2xlarge)
resources	dict	Resource configuration for the model (CPU, memory, GPU allocations)
endpoint_type	EndpointType	Deployment type (INFERENCE_COMPONENT_BASED or MODEL_BASED)
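The config parameter is described only as a dictionary holding the model ID, quantization and similar settings. As a rough sketch of what such a dictionary might look like, the helper below builds one using environment-variable names commonly seen with the HuggingFace TGI container on SageMaker (HF_MODEL_ID, SM_NUM_GPUS, MAX_INPUT_LENGTH, MAX_TOTAL_TOKENS, HF_MODEL_QUANTIZE). The key set, the build_tgi_config helper and all values are assumptions for illustration; the repository's own hugging_face_deploy_config may differ.

```python
import json
from typing import Dict


def build_tgi_config(
    model_id: str,
    num_gpus: int,
    max_input_length: int,
    max_total_tokens: int,
    quantize: str = "",
) -> Dict[str, str]:
    """Build a container environment dict for a TGI-style deployment.

    Key names are assumptions based on common TGI container settings.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if max_input_length >= max_total_tokens:
        # The total budget covers prompt plus generated tokens, so it must be larger.
        raise ValueError("max_input_length must be smaller than max_total_tokens")

    config = {
        "HF_MODEL_ID": model_id,
        # Container environment values are strings, so numbers are serialized.
        "SM_NUM_GPUS": json.dumps(num_gpus),
        "MAX_INPUT_LENGTH": json.dumps(max_input_length),
        "MAX_TOTAL_TOKENS": json.dumps(max_total_tokens),
    }
    if quantize:
        config["HF_MODEL_QUANTIZE"] = quantize
    return config


# Hypothetical values for illustration only.
example_config = build_tgi_config(
    model_id="my-org/my-model",
    num_gpus=1,
    max_input_length=2048,
    max_total_tokens=4096,
    quantize="bitsandbytes",
)
```

Validating the token limits up front surfaces a bad configuration locally instead of during container startup on the endpoint.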

Inputs and Outputs

Inputs:

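role_arn - AWS IAM role ARN for SageMaker execution
llm_image - HuggingFace TGI container image URI
config - Model deployment configuration dictionary
endpoint_name - Target SageMaker endpoint name
endpoint_config_name - Name of the SageMaker endpoint configuration
gpu_instance_type - GPU instance type for serving
resources - Resource configuration for the model (CPU, memory, GPU allocations)
endpoint_type - Deployment type (INFERENCE_COMPONENT_BASED or MODEL_BASED)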

Outputs:

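A SageMaker endpoint, named by endpoint_name, deployed and ready for real-time inference. The create_endpoint entry point returns None, so the deployed endpoint is a side effect rather than a return value.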
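Whether an endpoint is actually ready can be confirmed by polling its status. The sketch below assumes a client object exposing describe_endpoint(EndpointName=...) that returns a dictionary with an EndpointStatus field, as the boto3 SageMaker client does. The wait_until_in_service helper, its parameters and the endpoint name are hypothetical and not part of the repository.

```python
import time
from typing import Any, Callable


def wait_until_in_service(
    client: Any,
    endpoint_name: str,
    poll_seconds: float = 30.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the endpoint status until it is InService, fails, or polls run out."""
    for _ in range(max_polls):
        response = client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        # Any other status (for example Creating or Updating) means keep waiting.
        sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} not InService after {max_polls} polls")


# With AWS credentials configured, a real client could be created like this:
#   import boto3
#   client = boto3.client("sagemaker")
#   wait_until_in_service(client, "example-endpoint")
```

Passing the client and the sleep function as arguments keeps the helper testable without network access or real waiting.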

External Dependencies

sagemaker - AWS SageMaker Python SDK for endpoint management
boto3 - AWS SDK for low-level SageMaker API calls
loguru - Structured logging

Source File

llm_engineering/infrastructure/aws/deploy/huggingface/sagemaker_huggingface.py (lines 27-71)
llm_engineering/infrastructure/aws/deploy/huggingface/run.py (lines 16-35)
