Implementation:PacktPublishing LLM Engineers Handbook HuggingFaceProcessor Run
Overview
HuggingFaceProcessor Run implements Principle:PacktPublishing_LLM_Engineers_Handbook_SageMaker_Evaluation_Orchestration by wrapping the SageMaker HuggingFaceProcessor API to launch a GPU-backed processing job that runs the evaluation script remotely.
| Aspect | Detail |
|---|---|
| Implementation Name | HuggingFaceProcessor Run |
| Workflow | Model_Evaluation |
| Type | Wrapper Doc (SageMaker) |
| Source File | llm_engineering/model/evaluation/sagemaker.py (Lines 17–57) |
| Implements | Principle:PacktPublishing_LLM_Engineers_Handbook_SageMaker_Evaluation_Orchestration |
API Signature
def run_evaluation_on_sagemaker(is_dummy: bool = True) -> None
Internally creates and invokes:
HuggingFaceProcessor(role, instance_count, instance_type, ...).run(code, source_dir)
Key Code
def run_evaluation_on_sagemaker(is_dummy: bool = True) -> None:
    hfp = HuggingFaceProcessor(
        role=settings.AWS_ARN_ROLE,
        instance_count=1,
        instance_type="ml.g5.2xlarge",
        transformers_version="4.36",
        pytorch_version="2.1",
        py_version="py310",
        base_job_name="llm-twin-evaluation",
        env={
            "HUGGING_FACE_HUB_TOKEN": settings.HUGGING_FACE_HUB_TOKEN,
            "OPENAI_API_KEY": settings.OPENAI_API_KEY,
            "MODEL_HUGGINGFACE_WORKSPACE": settings.MODEL_HUGGINGFACE_WORKSPACE,
            "IS_DUMMY": str(is_dummy),
        },
    )
    hfp.run(
        code="evaluate.py",
        source_dir=str(current_dir),
    )
Imports
from sagemaker.huggingface import HuggingFaceProcessor
Inputs
| Parameter | Type | Description |
|---|---|---|
| is_dummy | bool | When True, runs evaluation in dummy/lightweight mode for testing. Defaults to True. |
| AWS credentials | Environment | IAM role ARN from settings.AWS_ARN_ROLE |
| HuggingFace token | Environment | Hub access token from settings.HUGGING_FACE_HUB_TOKEN |
| OpenAI API key | Environment | API key for LLM-as-Judge scoring from settings.OPENAI_API_KEY |
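The is_dummy flag crosses the process boundary as text, because environment variables are always strings. The sketch below shows the round trip. build_env mirrors the str(is_dummy) call from the Key Code, while read_is_dummy is a hypothetical helper (not taken from the source) showing one way the remote script could recover the boolean.

```python
import os


def build_env(is_dummy: bool) -> dict[str, str]:
    """Mirror how the launcher serializes the flag: str(True) -> "True"."""
    return {"IS_DUMMY": str(is_dummy)}


def read_is_dummy(environ=os.environ) -> bool:
    """Hypothetical reader for the remote script (an assumption, not source code).

    bool("False") is True because any non-empty string is truthy,
    so the value is compared as text instead of being cast.
    """
    return environ.get("IS_DUMMY", "False").strip().lower() == "true"


env = build_env(is_dummy=False)
print(env["IS_DUMMY"])        # the container receives the text "False"
print(bool(env["IS_DUMMY"]))  # naive cast treats the non-empty string as truthy
print(read_is_dummy(env))     # text comparison recovers the intended flag
```

Comparing text rather than casting matters here because the default is True: a naive bool() cast on the remote side would never run the full evaluation.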
Outputs
A SageMaker processing job is launched that:
- Provisions an ml.g5.2xlarge GPU instance
- Uses the HuggingFace container image with Transformers 4.36 and PyTorch 2.1
- Uploads and executes evaluate.py from the local source directory
- Passes all environment variables into the container
- Runs the full evaluation pipeline (inference + scoring + aggregation) on the remote instance
The function returns None; results are persisted to HuggingFace Hub by the evaluation script itself.
Configuration Details
| Setting | Value | Purpose |
|---|---|---|
| instance_type | ml.g5.2xlarge | NVIDIA A10G GPU with 24 GB VRAM, sufficient for 7B parameter models |
| instance_count | 1 | Single instance; evaluation is not distributed |
| transformers_version | 4.36 | Matches the version used during training |
| pytorch_version | 2.1 | Matches the version used during training |
| py_version | py310 | Python 3.10 runtime |
| base_job_name | llm-twin-evaluation | Prefix for the SageMaker job name in the AWS console |
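The claim that 24 GB is sufficient for a 7B parameter model can be checked with rough arithmetic. This is a minimal sketch that counts weights only; it assumes nothing about the precision the evaluation script actually loads the model in, and it ignores the KV cache, activations, and framework overhead, which all add to the real footprint.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


A10G_VRAM_GB = 24  # figure quoted in the table above

for label, width in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    needed = weight_memory_gb(7e9, width)
    fits = "fits" if needed < A10G_VRAM_GB else "does not fit"
    print(f"{label}: {needed:.0f} GB of weights, {fits} in {A10G_VRAM_GB} GB")
```

Under these assumptions a 7B model needs about 14 GB in half precision, leaving roughly 10 GB of headroom for inference state, while full fp32 weights (about 28 GB) would not fit on a single A10G.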
External Dependencies
| Dependency | Purpose |
|---|---|
| sagemaker | AWS SageMaker Python SDK for creating and managing processing jobs |
| huggingface_hub | Accessed within the remote evaluation script for model/dataset operations |
| loguru | Structured logging within the orchestration function |
How It Works
- The function is called with an is_dummy flag indicating whether to run a lightweight evaluation
- A HuggingFaceProcessor instance is created with the specified IAM role, instance configuration, and framework versions
- All required environment variables (API keys, workspace identifier, dummy flag) are injected via the env parameter
- The .run() method uploads the entire source_dir (containing evaluate.py and supporting modules) to S3
- SageMaker provisions the specified GPU instance, pulls the HuggingFace DLC (Deep Learning Container), and executes evaluate.py
- The evaluation script runs the full pipeline: model validation, batch inference, LLM-as-Judge scoring, and results aggregation
- Upon completion, the instance is automatically terminated
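The first four steps can be exercised locally without AWS by injecting the processor class. The sketch below is illustrative only: launch_evaluation and RecordingProcessor are hypothetical names, not part of the source file, and the env dictionary is trimmed to the dummy flag for brevity. The stand-in simply records the keyword arguments that the real HuggingFaceProcessor would receive.

```python
from pathlib import Path
from typing import Any


class RecordingProcessor:
    """Hypothetical stand-in: same constructor/run call shape, but it only records calls."""

    def __init__(self, **kwargs: Any) -> None:
        self.init_kwargs = kwargs
        self.run_kwargs: dict[str, Any] = {}

    def run(self, **kwargs: Any) -> None:
        self.run_kwargs = kwargs


def launch_evaluation(processor_cls, role: str, source_dir: Path, is_dummy: bool = True):
    """Configure the processor, inject the env, then run the entry point."""
    processor = processor_cls(
        role=role,
        instance_count=1,
        instance_type="ml.g5.2xlarge",
        transformers_version="4.36",
        pytorch_version="2.1",
        py_version="py310",
        base_job_name="llm-twin-evaluation",
        env={"IS_DUMMY": str(is_dummy)},  # API keys omitted in this sketch
    )
    processor.run(code="evaluate.py", source_dir=str(source_dir))
    return processor


# Placeholder role ARN; no AWS call is made with the recording stand-in.
fake = launch_evaluation(
    RecordingProcessor,
    role="arn:aws:iam::000000000000:role/example",
    source_dir=Path("."),
)
print(fake.init_kwargs["instance_type"], fake.run_kwargs["code"])
```

Passing sagemaker.huggingface.HuggingFaceProcessor as processor_cls would, with valid AWS credentials, start a real and billable processing job, so the recording class is the safer choice for unit tests.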
See Also
- Principle:PacktPublishing_LLM_Engineers_Handbook_SageMaker_Evaluation_Orchestration — the principle this implements
- Implementation:PacktPublishing_LLM_Engineers_Handbook_VLLM_LLM_Generate — the batch inference step executed within the processing job
- Implementation:PacktPublishing_LLM_Engineers_Handbook_OpenAI_Chat_Completions — the LLM-as-Judge step executed within the processing job
- Environment:PacktPublishing_LLM_Engineers_Handbook_AWS_SageMaker_GPU_Environment
- Environment:PacktPublishing_LLM_Engineers_Handbook_API_Credentials