Principle:Pytorch Serve Model Artifact Configuration
Overview
Model Artifact Configuration is the principle of using declarative YAML configuration files to control model serving behavior in TorchServe. Rather than hardcoding serving parameters into handler code or passing them as command-line arguments, TorchServe allows each model archive to include a YAML configuration file that specifies worker counts, batching parameters, timeouts, device assignment, and torch.compile options. This configuration-as-code approach enables reproducible, version-controlled deployments.
| Field | Value |
|---|---|
| Principle Name | Model Artifact Configuration |
| Workflow | Model_Deployment |
| Domains | Configuration, Model_Serving |
| Knowledge Sources | TorchServe |
| Last Updated | 2026-02-13 00:00 GMT |
Description
TorchServe supports model-level configuration through a YAML file that is bundled inside the model archive (.mar). This file is read at model load time and its values are propagated through the Context object to the handler, making configuration available at every stage of the inference pipeline.
Configuration Scope
The model YAML configuration can control:
| Category | Configuration Keys | Description |
|---|---|---|
| Worker Management | minWorkers, maxWorkers | Number of worker processes allocated to this model |
| Batching | batchSize, maxBatchDelay | Batch aggregation size and maximum wait time (ms) |
| Timeouts | responseTimeout, startupTimeout | Worker response and startup timeouts in seconds |
| Device | deviceType, deviceIds | Target device type and specific GPU IDs |
| torch.compile | pt2.compile.enable, pt2.compile.backend | Enable and configure torch.compile() optimization |
| torch.export | pt2.export.aot_compile | Enable loading of an AOT-compiled model |
| Handler | handler section (custom keys) | Arbitrary handler-specific configuration |
Configuration Flow
The configuration flows through the system as follows:
- The YAML file is included in the model archive via the --config-file flag of torch-model-archiver.
- At model load time, Service.__init__ reads the manifest to locate the config file.
- get_yaml_config() parses the YAML file into a Python dictionary.
- The configuration dictionary is stored in Context.model_yaml_config.
- The handler accesses it via self.model_yaml_config during initialize().
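The flow above can be sketched in miniature. This is an illustrative stand-in, not TorchServe's actual code: FakeContext mimics the role of ts.context.Context, and get_model_yaml_config stands in for the real get_yaml_config(), which reads from a file path located via the manifest.

```python
# Minimal sketch of steps 3-5 of the flow above: parse the YAML into a
# dict, store it on a context object, and read it as a handler would.
import yaml  # PyYAML, which TorchServe itself depends on


def get_model_yaml_config(yaml_text):
    """Stand-in for get_yaml_config(): parse YAML into a plain dict."""
    return yaml.safe_load(yaml_text) or {}


class FakeContext:
    """Hypothetical stand-in for TorchServe's Context object."""

    def __init__(self, model_yaml_config):
        self.model_yaml_config = model_yaml_config


config_text = """\
minWorkers: 1
batchSize: 8
handler:
  max_length: 512
"""

# The framework parses the config and attaches it to the context...
ctx = FakeContext(get_model_yaml_config(config_text))

# ...and the handler reads it during initialize(), with a fallback default.
max_length = ctx.model_yaml_config.get("handler", {}).get("max_length", 128)
print(max_length)  # 512
```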
Declarative vs Imperative Configuration
The YAML-based approach offers several advantages over imperative configuration:
- Reproducibility: The configuration is bundled with the model artifact, ensuring the same serving behavior across environments.
- Version Control: YAML files can be tracked in source control alongside model code.
- Separation of Concerns: Serving configuration is separated from model logic and infrastructure configuration.
- Self-Documenting: The YAML file serves as documentation of the model's serving requirements.
Usage
Example YAML Configuration
```yaml
# model_config.yaml

# Worker configuration
minWorkers: 1
maxWorkers: 4

# Batching configuration
batchSize: 8
maxBatchDelay: 200

# Timeout configuration
responseTimeout: 300
startupTimeout: 600

# Device configuration
deviceType: "gpu"
deviceIds: [0, 1]

# torch.compile configuration
pt2:
  compile:
    enable: true
    backend: "inductor"

# Custom handler configuration
handler:
  model_name: "bert-base-uncased"
  max_length: 512
  do_lower_case: true
```
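Because malformed values only surface when the model is loaded, it can be worth sanity-checking the parsed config before archiving. The checker below is a sketch of one possible check, not part of TorchServe; it operates on the dictionary form of the worker and batching keys shown above.

```python
# Illustrative pre-archive sanity check for a parsed model config dict.
def check_config(cfg):
    """Return a list of problems found in a parsed model config dict."""
    problems = []
    # These keys are expected to be integers by the serving runtime.
    for key in ("minWorkers", "maxWorkers", "batchSize", "maxBatchDelay"):
        if not isinstance(cfg.get(key), int):
            problems.append(f"{key} should be an integer")
    # A lower bound above the upper bound can never be satisfied.
    if cfg.get("minWorkers", 0) > cfg.get("maxWorkers", 0):
        problems.append("minWorkers must not exceed maxWorkers")
    return problems


# Parsed form of the worker/batching portion of the example YAML:
parsed = {"minWorkers": 1, "maxWorkers": 4, "batchSize": 8, "maxBatchDelay": 200}
print(check_config(parsed))  # []

# A broken config yields a list of human-readable problems:
print(check_config({"minWorkers": 2, "maxWorkers": 1}))
```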
Bundling Configuration with Model Archive
```shell
torch-model-archiver \
  --model-name my_model \
  --version 1.0 \
  --handler handler.py \
  --serialized-file model.pt \
  --config-file model_config.yaml \
  --export-path model_store/
```
Accessing Configuration in Handler
```python
from ts.torch_handler.base_handler import BaseHandler


class MyHandler(BaseHandler):
    def initialize(self, context):
        super().initialize(context)
        # Access model-level YAML config
        config = context.model_yaml_config
        self.max_length = config.get("handler", {}).get("max_length", 128)
        self.do_lower_case = config.get("handler", {}).get("do_lower_case", True)
```
Theoretical Basis
Configuration as Code
The model YAML configuration embodies the Configuration as Code principle from DevOps practices. By treating configuration as a versioned artifact bundled with the deployment unit (the .mar file), TorchServe ensures that:
- Configuration drift between environments is eliminated.
- Deployments are atomic: the model, handler, and configuration travel together.
- Rollbacks restore not just the model weights but also the serving configuration.
Declarative Configuration
The YAML-based approach follows the Declarative Configuration paradigm, where the desired state of the system is declared rather than the steps to achieve it. The serving infrastructure reads the declared state (worker count, batch size, compile options) and converges the runtime to match. This is conceptually similar to Kubernetes manifests or Terraform configurations.
Inversion of Control
Configuration is not fetched by the handler; it is injected by the framework through the Context object. This Inversion of Control pattern decouples the handler from the configuration source, allowing the same handler to behave differently based on different YAML configurations without code changes.
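A minimal sketch of this injection pattern, using a hypothetical FakeContext in place of TorchServe's Context: the same handler class is configured two different ways purely by the dictionary the framework hands it.

```python
# Sketch of inversion of control: the handler never fetches configuration;
# the framework injects it through the context object it passes in.
class FakeContext:
    """Hypothetical stand-in for TorchServe's Context object."""

    def __init__(self, model_yaml_config):
        self.model_yaml_config = model_yaml_config


class TruncatingHandler:
    """Toy handler whose behavior is driven entirely by injected config."""

    def initialize(self, context):
        handler_cfg = context.model_yaml_config.get("handler", {})
        self.max_length = handler_cfg.get("max_length", 128)

    def preprocess(self, text):
        return text[: self.max_length]


# Same handler class, two different YAML configs, two different behaviors:
short = TruncatingHandler()
short.initialize(FakeContext({"handler": {"max_length": 5}}))

default = TruncatingHandler()
default.initialize(FakeContext({}))  # no handler section: falls back to 128

print(short.preprocess("hello world"))  # hello
print(default.max_length)  # 128
```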
Related Pages
- Implementation:Pytorch_Serve_Get_Yaml_Config - The get_yaml_config() function that reads YAML configuration files
- Principle:Pytorch_Serve_Inference_Handler_Development - Handlers consume YAML configuration during initialization
- Principle:Pytorch_Serve_Model_Archiving - Configuration files are bundled into model archives
- Principle:Pytorch_Serve_Server_Lifecycle - Server-level configuration complements model-level YAML config