Implementation:PacktPublishing LLM Engineers Handbook SagemakerHuggingfaceStrategy Deploy
| Field | Value |
|---|---|
| Type | API Doc |
| Workflow | RAG_Inference |
| Repository | PacktPublishing/LLM-Engineers-Handbook |
| Source | sagemaker_huggingface.py:L27-71, run.py:L16-35 |
| Implements | Principle:PacktPublishing_LLM_Engineers_Handbook_SageMaker_Model_Deployment |
API Signature
```python
SagemakerHuggingfaceStrategy(deployment_service).deploy(
    role_arn,
    llm_image,
    config,
    endpoint_name,
    endpoint_config_name,
    gpu_instance_type,
    resources,
    endpoint_type,
)
```
Import
```python
from llm_engineering.infrastructure.aws.deploy.huggingface.sagemaker_huggingface import (
    SagemakerHuggingfaceStrategy,
    DeploymentService,
)
```
Key Code
From run.py (the entry point for creating an endpoint):
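```python
from sagemaker.enums import EndpointType
from sagemaker.huggingface import get_huggingface_llm_image_uri
from llm_engineering.infrastructure.aws.deploy.huggingface.sagemaker_huggingface import (
    DeploymentService,
    SagemakerHuggingfaceStrategy,
)
# settings, ResourceManager, hugging_face_deploy_config and model_resource_config
# come from the repository's own modules (import paths omitted here).


def create_endpoint(endpoint_type=EndpointType.INFERENCE_COMPONENT_BASED) -> None:
    # Resolve the URI of the Hugging Face TGI container image, version 2.2.0.
    llm_image = get_huggingface_llm_image_uri("huggingface", version="2.2.0")
    # Inject the resource manager into the service, and the service into the strategy.
    resource_manager = ResourceManager()
    deployment_service = DeploymentService(resource_manager=resource_manager)
    SagemakerHuggingfaceStrategy(deployment_service).deploy(
        role_arn=settings.AWS_ARN_ROLE,
        llm_image=llm_image,
        config=hugging_face_deploy_config,
        endpoint_name=settings.SAGEMAKER_ENDPOINT_INFERENCE,
        endpoint_config_name=settings.SAGEMAKER_ENDPOINT_CONFIG_INFERENCE,
        gpu_instance_type=settings.GPU_INSTANCE_TYPE,
        resources=model_resource_config,
        endpoint_type=endpoint_type,
    )
```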
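In the entry point, the strategy object receives an already-built `DeploymentService` and forwards the deployment work to it. The minimal sketch below shows that delegation pattern with a hypothetical in-memory stub. The stub records calls instead of contacting AWS, which makes it handy for unit-testing the wiring. Every name in it (`DeploymentServiceLike`, `RecordingDeploymentService`, `SketchStrategy`, `deploy_endpoint`) is an assumption for illustration and is not the repository's API.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


class DeploymentServiceLike(Protocol):
    """Hypothetical interface for whatever the strategy delegates real work to."""

    def deploy_endpoint(self, endpoint_name: str, **kwargs: Any) -> None: ...


@dataclass
class RecordingDeploymentService:
    """Hypothetical test double: records each call instead of calling SageMaker."""

    calls: list[dict[str, Any]] = field(default_factory=list)

    def deploy_endpoint(self, endpoint_name: str, **kwargs: Any) -> None:
        self.calls.append({"endpoint_name": endpoint_name, **kwargs})


class SketchStrategy:
    """Hypothetical strategy: holds an injected service and forwards deploy calls to it."""

    def __init__(self, deployment_service: DeploymentServiceLike) -> None:
        self.deployment_service = deployment_service

    def deploy(self, endpoint_name: str, **kwargs: Any) -> None:
        # A real strategy might add logging or pre-checks here before delegating.
        self.deployment_service.deploy_endpoint(endpoint_name, **kwargs)


service = RecordingDeploymentService()
SketchStrategy(service).deploy("my-endpoint", gpu_instance_type="ml.g5.2xlarge")
print(service.calls)
```

Because the strategy depends only on the interface, tests can swap in the recording stub and production code can inject the real service.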
Parameters
| Parameter | Type | Description |
|---|---|---|
| role_arn | str | AWS IAM role ARN with SageMaker permissions |
| llm_image | str | URI of the HuggingFace TGI Docker image |
| config | dict | HuggingFace deployment configuration (model ID, quantization, etc.) |
| endpoint_name | str | Name for the SageMaker endpoint |
| endpoint_config_name | str | Name for the SageMaker endpoint configuration |
| gpu_instance_type | str | SageMaker GPU instance type (e.g., ml.g5.2xlarge) |
| resources | dict | Resource configuration for the model (CPU, memory, GPU allocations) |
| endpoint_type | EndpointType | Deployment type (INFERENCE_COMPONENT_BASED or MODEL_BASED) |
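Several of these parameters have format rules that SageMaker enforces only at request time. A cheap pre-flight check can catch mistakes earlier. The sketch below bundles the eight parameters into a hypothetical `DeployArgs` dataclass and validates the following:

- the IAM role ARN shape;
- the SageMaker naming rule for endpoint and endpoint-config names (alphanumerics and hyphens, at most 63 characters);
- the `ml.` instance-type prefix;
- the rule that an inference-component-based endpoint needs explicit `resources`. This last rule is an assumption based on how the SageMaker SDK sizes inference components.

`EndpointKind` is a local stand-in for `sagemaker.enums.EndpointType`, so the sketch runs without the SDK installed.

```python
import re
from dataclasses import dataclass, replace
from enum import Enum, auto
from typing import Any, Optional


class EndpointKind(Enum):
    """Local stand-in for sagemaker.enums.EndpointType (assumption, not the SDK enum)."""

    INFERENCE_COMPONENT_BASED = auto()
    MODEL_BASED = auto()


# IAM role ARN; also allows non-default partitions such as aws-cn or aws-us-gov.
ROLE_ARN = re.compile(r"arn:aws[a-z-]*:iam::\d{12}:role/\S+")
# SageMaker endpoint / endpoint-config name pattern (length is checked separately).
SAGEMAKER_NAME = re.compile(r"[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}")


@dataclass(frozen=True)
class DeployArgs:
    """Hypothetical bundle of the deploy() parameters, validated before any AWS call."""

    role_arn: str
    llm_image: str
    config: dict[str, str]
    endpoint_name: str
    endpoint_config_name: str
    gpu_instance_type: str
    resources: Optional[dict[str, Any]]
    endpoint_type: EndpointKind

    def problems(self) -> list[str]:
        issues: list[str] = []
        if not ROLE_ARN.fullmatch(self.role_arn):
            issues.append("role_arn is not an IAM role ARN")
        for label, name in (
            ("endpoint_name", self.endpoint_name),
            ("endpoint_config_name", self.endpoint_config_name),
        ):
            if len(name) > 63 or not SAGEMAKER_NAME.fullmatch(name):
                issues.append(f"{label} violates SageMaker naming rules")
        if not self.gpu_instance_type.startswith("ml."):
            issues.append("gpu_instance_type should be a SageMaker 'ml.*' type")
        # Assumption: inference-component endpoints need explicit resource requests.
        if self.endpoint_type is EndpointKind.INFERENCE_COMPONENT_BASED and not self.resources:
            issues.append("resources are required for inference-component-based endpoints")
        return issues


args = DeployArgs(
    role_arn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    llm_image="<tgi-image-uri>",
    config={"HF_MODEL_ID": "my-org/my-model"},
    endpoint_name="twin-endpoint",
    endpoint_config_name="twin-endpoint-config",
    gpu_instance_type="ml.g5.2xlarge",
    resources={"num_accelerators": 1},
    endpoint_type=EndpointKind.INFERENCE_COMPONENT_BASED,
)
print(args.problems())  # []
bad = replace(args, role_arn="not-an-arn", gpu_instance_type="g5.2xlarge", resources=None)
print(len(bad.problems()))  # 3
```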
Inputs and Outputs
Inputs:
- role_arn - AWS IAM role ARN for SageMaker execution
- llm_image - HuggingFace TGI container image URI
- config - Model deployment configuration dictionary
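- endpoint_name - Target SageMaker endpoint name
- endpoint_config_name - Target SageMaker endpoint configuration name
- gpu_instance_type - GPU instance type for serving
- resources - Resource allocations for the model (CPU, memory, GPUs)
- endpoint_type - Model-based or inference-component-based deployment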
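The `config` dictionary is usually handed to the TGI container as environment variables, for example through the `env` argument of `sagemaker.huggingface.HuggingFaceModel`. SageMaker expects every value in that map to be a string. Assuming that convention, the sketch below builds such a dictionary with a hypothetical `build_tgi_env` helper. The key names (`HF_MODEL_ID`, `SM_NUM_GPUS`, `MAX_INPUT_LENGTH`, `MAX_TOTAL_TOKENS`, `HF_MODEL_QUANTIZE`) are ones commonly seen in Hugging Face's SageMaker examples. The repository's actual keys may differ, so verify them against the TGI launcher documentation for the image version you deploy. The values are placeholders.

```python
from typing import Optional


def build_tgi_env(
    model_id: str,
    num_gpus: int = 1,
    max_input_length: int = 2048,
    max_total_tokens: int = 4096,
    quantize: Optional[str] = None,
) -> dict[str, str]:
    """Hypothetical helper: build TGI container environment variables (all strings)."""
    if max_input_length >= max_total_tokens:
        # TGI needs room for generated tokens beyond the prompt budget.
        raise ValueError("max_input_length must be smaller than max_total_tokens")
    env = {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": str(num_gpus),
        "MAX_INPUT_LENGTH": str(max_input_length),
        "MAX_TOTAL_TOKENS": str(max_total_tokens),
    }
    if quantize is not None:
        # Only set quantization when requested; e.g. "bitsandbytes".
        env["HF_MODEL_QUANTIZE"] = quantize
    return env


env = build_tgi_env("my-org/my-model", quantize="bitsandbytes")
print(env["MAX_TOTAL_TOKENS"])  # 4096
```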
Outputs:
- SageMaker endpoint deployed and ready for real-time inference
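Once the endpoint is up, clients reach it through the `sagemaker-runtime` client in boto3. An inference-component-based endpoint typically also needs the component's name in each request. The sketch below builds the keyword arguments for `invoke_endpoint` as a pure function, so it can be tested offline. It assumes a TGI-style JSON payload (`inputs` plus `parameters`). The helper name, endpoint name, and component name are hypothetical.

```python
import json
from typing import Any, Optional


def build_invoke_request(
    endpoint_name: str,
    prompt: str,
    max_new_tokens: int = 256,
    temperature: float = 0.7,
    inference_component_name: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: kwargs for boto3 sagemaker-runtime invoke_endpoint()."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "temperature": temperature},
    }
    request: dict[str, Any] = {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }
    if inference_component_name is not None:
        # Inference-component-based endpoints route each request to a named component.
        request["InferenceComponentName"] = inference_component_name
    return request


# Against a live endpoint (needs AWS credentials, so not executed here):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   response = runtime.invoke_endpoint(**build_invoke_request("my-endpoint", "Hello"))
#   print(json.loads(response["Body"].read()))
```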
External Dependencies
- sagemaker - AWS SageMaker Python SDK for endpoint management
- boto3 - AWS SDK for low-level SageMaker API calls
- loguru - Structured logging
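These AWS-specific packages may not be installed in every environment. A deployment script can then fail partway through with an import error. A small pre-flight check needs only the standard library's `importlib.util.find_spec`. The helper name is hypothetical, and the sketch assumes each package's import name matches its distribution name, which holds for these three.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["sagemaker", "boto3", "loguru"])
if missing:
    print(f"Install before deploying: {', '.join(missing)}")
```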
Source File
- llm_engineering/infrastructure/aws/deploy/huggingface/sagemaker_huggingface.py (lines 27-71)
- llm_engineering/infrastructure/aws/deploy/huggingface/run.py (lines 16-35)