Implementation:PacktPublishing LLM Engineers Handbook SagemakerHuggingfaceStrategy Deploy

From Leeroopedia


Field       Value
Type        API Doc
Workflow    RAG_Inference
Repository  PacktPublishing/LLM-Engineers-Handbook
Source      sagemaker_huggingface.py:L27-71, run.py:L16-35
Implements  Principle:PacktPublishing_LLM_Engineers_Handbook_SageMaker_Model_Deployment

API Signature

SagemakerHuggingfaceStrategy(deployment_service).deploy(
    role_arn,
    llm_image,
    config,
    endpoint_name,
    endpoint_config_name,
    gpu_instance_type,
    resources,
    endpoint_type
)

Import

from llm_engineering.infrastructure.aws.deploy.huggingface.sagemaker_huggingface import (
    SagemakerHuggingfaceStrategy,
    DeploymentService,
)

Key Code

From run.py (the entry point for creating an endpoint):

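from sagemaker.enums import EndpointType
from sagemaker.huggingface import get_huggingface_llm_image_uri

# DeploymentService and SagemakerHuggingfaceStrategy come from the Import section above.
# ResourceManager, settings, hugging_face_deploy_config, and model_resource_config are
# project-local objects whose imports are not shown in this excerpt.

def create_endpoint(endpoint_type=EndpointType.INFERENCE_COMPONENT_BASED) -> None:
    # Resolve the URI of the HuggingFace LLM (TGI) container image for the pinned version.
    llm_image = get_huggingface_llm_image_uri("huggingface", version="2.2.0")
    # Build the service that the strategy delegates to.
    resource_manager = ResourceManager()
    deployment_service = DeploymentService(resource_manager=resource_manager)
    SagemakerHuggingfaceStrategy(deployment_service).deploy(
        role_arn=settings.AWS_ARN_ROLE,
        llm_image=llm_image,
        config=hugging_face_deploy_config,
        endpoint_name=settings.SAGEMAKER_ENDPOINT_INFERENCE,
        endpoint_config_name=settings.SAGEMAKER_ENDPOINT_CONFIG_INFERENCE,
        gpu_instance_type=settings.GPU_INSTANCE_TYPE,
        resources=model_resource_config,
        endpoint_type=endpoint_type,
    )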
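
The call shape above (a strategy constructed around an injected service, then a single keyword-only deploy call) can be exercised without AWS access. The following is a minimal sketch, not the repository's implementation: `RecordingDeploymentService` and `DryRunStrategy` are hypothetical stand-ins, and all argument values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class RecordingDeploymentService:
    """Hypothetical stand-in: records deploy requests instead of calling AWS."""

    calls: list = field(default_factory=list)

    def deploy(self, **kwargs: Any) -> None:
        self.calls.append(kwargs)


class DryRunStrategy:
    """Hypothetical stand-in that mirrors the documented deploy() parameters."""

    def __init__(self, deployment_service: Any) -> None:
        self.deployment_service = deployment_service

    def deploy(self, role_arn, llm_image, config, endpoint_name,
               endpoint_config_name, gpu_instance_type, resources,
               endpoint_type) -> None:
        # Forward every argument unchanged to the injected service.
        self.deployment_service.deploy(
            role_arn=role_arn,
            llm_image=llm_image,
            config=config,
            endpoint_name=endpoint_name,
            endpoint_config_name=endpoint_config_name,
            gpu_instance_type=gpu_instance_type,
            resources=resources,
            endpoint_type=endpoint_type,
        )


service = RecordingDeploymentService()
DryRunStrategy(service).deploy(
    role_arn="arn:aws:iam::123456789012:role/example-role",  # placeholder
    llm_image="example-image-uri",                           # placeholder
    config={"HF_MODEL_ID": "example-org/example-model"},     # placeholder
    endpoint_name="example-endpoint",
    endpoint_config_name="example-endpoint-config",
    gpu_instance_type="ml.g5.2xlarge",
    resources=None,
    endpoint_type="INFERENCE_COMPONENT_BASED",  # the real call passes an EndpointType member
)
```

Injecting the service through the constructor is what makes this kind of dry run possible: the strategy never creates its own AWS clients, so a test double can be swapped in.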

Parameters

Parameter             Type          Description
role_arn              str           AWS IAM role ARN with SageMaker permissions
llm_image             str           URI of the HuggingFace TGI Docker image
config                dict          HuggingFace deployment configuration (model ID, quantization, etc.)
endpoint_name         str           Name for the SageMaker endpoint
endpoint_config_name  str           Name for the SageMaker endpoint configuration
gpu_instance_type     str           SageMaker GPU instance type (e.g., ml.g5.2xlarge)
resources             dict          Resource configuration for the model (CPU, memory, GPU allocations)
endpoint_type         EndpointType  Deployment type (EndpointType.INFERENCE_COMPONENT_BASED or EndpointType.MODEL_BASED)
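
To make the `config` and `resources` parameters more concrete, here is a minimal sketch of how such values might be assembled. It is an illustration under assumptions, not the repository's configuration: `build_tgi_config` and `build_resource_requests` are hypothetical helpers, the environment variable names are ones commonly used with the HuggingFace TGI container (check them against your image version), and the resource keys mirror the `requests` mapping of the SageMaker SDK's `ResourceRequirements`.

```python
from typing import Dict, Optional


def build_tgi_config(
    model_id: str,
    num_gpus: int = 1,
    max_input_length: int = 3000,
    max_total_tokens: int = 4096,
    quantize: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: container environment for a TGI deployment."""
    if max_input_length >= max_total_tokens:
        raise ValueError("max_input_length must be smaller than max_total_tokens")
    # Container environment values must be strings.
    config = {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": str(num_gpus),
        "MAX_INPUT_LENGTH": str(max_input_length),
        "MAX_TOTAL_TOKENS": str(max_total_tokens),
    }
    if quantize is not None:
        config["HF_MODEL_QUANTIZE"] = quantize
    return config


def build_resource_requests(
    copies: int = 1, num_accelerators: int = 1, num_cpus: int = 2, memory_mb: int = 5 * 1024
) -> Dict[str, int]:
    """Hypothetical helper: per-model resource requests (memory in MB)."""
    return {
        "copies": copies,
        "num_accelerators": num_accelerators,
        "num_cpus": num_cpus,
        "memory": memory_mb,
    }


example_config = build_tgi_config("example-org/example-model", quantize="bitsandbytes")
example_resources = build_resource_requests()
```

The model ID and numeric limits are placeholders; the token limits in particular depend on the model and the instance's GPU memory.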

Inputs and Outputs

Inputs:

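  • role_arn - AWS IAM role ARN for SageMaker execution
  • llm_image - HuggingFace TGI container image URI
  • config - Model deployment configuration dictionary
  • endpoint_name - Target SageMaker endpoint name
  • endpoint_config_name - Name of the SageMaker endpoint configuration
  • gpu_instance_type - GPU instance type for serving
  • resources - Resource configuration for the model (CPU, memory, GPU allocations)
  • endpoint_type - Deployment type (INFERENCE_COMPONENT_BASED or MODEL_BASED)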

Outputs:

  • SageMaker endpoint deployed and ready for real-time inference
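
Once the endpoint is in service, it can be called through the SageMaker runtime API. The sketch below is an assumption-laden illustration rather than code from the repository: `build_payload` and `invoke` are hypothetical helpers, the request and response shapes follow the usual TGI convention (`inputs` plus `parameters` in, a list with `generated_text` out), and the client is passed in so the function can be tried with a test double.

```python
import json
from typing import Any, Optional


def build_payload(prompt: str, max_new_tokens: int = 128, temperature: float = 0.01) -> bytes:
    """Serialize a TGI-style text-generation request."""
    body = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "temperature": temperature},
    }
    return json.dumps(body).encode("utf-8")


def invoke(
    runtime_client: Any,
    endpoint_name: str,
    prompt: str,
    inference_component_name: Optional[str] = None,
) -> str:
    """Call the endpoint and return the generated text (assumes TGI response shape)."""
    kwargs = {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": build_payload(prompt),
    }
    # Inference-component-based endpoints need the component name as well.
    if inference_component_name is not None:
        kwargs["InferenceComponentName"] = inference_component_name
    response = runtime_client.invoke_endpoint(**kwargs)
    result = json.loads(response["Body"].read().decode("utf-8"))
    return result[0]["generated_text"]


# Usage against a real endpoint (requires AWS credentials; names are placeholders):
#   import boto3
#   client = boto3.client("sagemaker-runtime")
#   print(invoke(client, "example-endpoint", "Hello", inference_component_name="example-component"))
```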

External Dependencies

  • sagemaker - AWS SageMaker Python SDK for endpoint management
  • boto3 - AWS SDK for low-level SageMaker API calls
  • loguru - Structured logging

Source File

  • llm_engineering/infrastructure/aws/deploy/huggingface/sagemaker_huggingface.py (lines 27-71)
  • llm_engineering/infrastructure/aws/deploy/huggingface/run.py (lines 16-35)
