

Principle:Pytorch Serve Model Artifact Configuration

From Leeroopedia

Overview

Model Artifact Configuration is the principle of using declarative YAML configuration files to control model serving behavior in TorchServe. Rather than hardcoding serving parameters into handler code or passing them as command-line arguments, TorchServe allows each model archive to include a YAML configuration file that specifies worker counts, batching parameters, timeouts, device assignment, and torch.compile options. This configuration-as-code approach enables reproducible, version-controlled deployments.

Field | Value
Principle Name | Model Artifact Configuration
Workflow | Model_Deployment
Domains | Configuration, Model_Serving
Knowledge Sources | TorchServe
Last Updated | 2026-02-13 00:00 GMT

Description

TorchServe supports model-level configuration through a YAML file that is bundled inside the model archive (.mar). This file is read at model load time and its values are propagated through the Context object to the handler, making configuration available at every stage of the inference pipeline.

Configuration Scope

The model YAML configuration can control:

Category | Configuration Keys | Description
Worker Management | minWorkers, maxWorkers | Number of worker processes allocated to this model
Batching | batchSize, maxBatchDelay | Batch aggregation size and maximum wait time (ms)
Timeouts | responseTimeout, startupTimeout | Worker response and startup timeouts in seconds
Device | deviceType, deviceIds | Target device type and specific GPU IDs
torch.compile | pt2.compile.enable, pt2.compile.backend | Enable and configure torch.compile() optimization
torch.export | pt2.export.aot_compile | Enable AOT-compiled model loading
Handler | handler section (custom keys) | Arbitrary handler-specific configuration

Configuration Flow

The configuration flows through the system as follows:

  1. The YAML file is included in the model archive via the --config-file flag of torch-model-archiver.
  2. At model load time, Service.__init__ reads the manifest to locate the config file.
  3. get_yaml_config() parses the YAML file into a Python dictionary.
  4. The configuration dictionary is stored in Context.model_yaml_config.
  5. The handler accesses it via self.model_yaml_config during initialize().
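Steps 3 and 4 can be approximated in a few lines. This is a rough stand-in for TorchServe's internal get_yaml_config(), not its actual implementation; it only assumes the PyYAML package, which is what TorchServe itself uses to parse the file.

```python
import yaml  # PyYAML; TorchServe parses the model YAML with a safe loader


def get_yaml_config_sketch(path):
    """Rough stand-in for step 3: parse the model YAML file into a dict.

    In TorchServe, the resulting dict is then stored on the Context
    object as model_yaml_config (step 4) before initialize() runs.
    """
    with open(path) as f:
        return yaml.safe_load(f)
```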

Declarative vs Imperative Configuration

The YAML-based approach offers several advantages over imperative configuration:

  • Reproducibility: The configuration is bundled with the model artifact, ensuring the same serving behavior across environments.
  • Version Control: YAML files can be tracked in source control alongside model code.
  • Separation of Concerns: Serving configuration is separated from model logic and infrastructure configuration.
  • Self-Documenting: The YAML file serves as documentation of the model's serving requirements.

Usage

Example YAML Configuration

# model_config.yaml

# Worker configuration
minWorkers: 1
maxWorkers: 4

# Batching configuration
batchSize: 8
maxBatchDelay: 200

# Timeout configuration
responseTimeout: 300
startupTimeout: 600

# Device configuration
deviceType: "gpu"
deviceIds: [0, 1]

# torch.compile configuration
pt2:
  compile:
    enable: true
    backend: "inductor"

# Custom handler configuration
handler:
  model_name: "bert-base-uncased"
  max_length: 512
  do_lower_case: true

Bundling Configuration with Model Archive

torch-model-archiver \
  --model-name my_model \
  --version 1.0 \
  --handler handler.py \
  --serialized-file model.pt \
  --config-file model_config.yaml \
  --export-path model_store/
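Since a default .mar file is a zip archive, it is straightforward to verify after archiving that the configuration actually travelled with the model. The helper below is an illustrative sketch, not a TorchServe utility, and assumes the default zip archive format (not --archive-format no-archive).

```python
import zipfile


def mar_contains_config(mar_path, config_name="model_config.yaml"):
    """Check that the YAML config was bundled into the .mar archive."""
    with zipfile.ZipFile(mar_path) as mar:
        return any(name.endswith(config_name) for name in mar.namelist())
```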

Accessing Configuration in Handler

from ts.torch_handler.base_handler import BaseHandler


class MyHandler(BaseHandler):
    def initialize(self, context):
        super().initialize(context)
        # Access model-level YAML config
        config = context.model_yaml_config
        self.max_length = config.get("handler", {}).get("max_length", 128)
        self.do_lower_case = config.get("handler", {}).get("do_lower_case", True)
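The chained .get calls above make the handler robust to a missing handler section. The behavior can be seen with plain dicts standing in for context.model_yaml_config:

```python
# Plain dicts standing in for context.model_yaml_config; values are illustrative.
full_config = {"handler": {"max_length": 512, "do_lower_case": True}}
empty_config = {}  # e.g. a model archived without a config file

# With the handler section present, the configured value wins.
max_len = full_config.get("handler", {}).get("max_length", 128)      # 512

# With no handler section, the inner .get falls back to the default.
fallback = empty_config.get("handler", {}).get("max_length", 128)    # 128
```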

Theoretical Basis

Configuration as Code

The model YAML configuration embodies the Configuration as Code principle from DevOps practices. By treating configuration as a versioned artifact bundled with the deployment unit (the .mar file), TorchServe ensures that:

  • Configuration drift between environments is eliminated.
  • Deployments are atomic: the model, handler, and configuration travel together.
  • Rollbacks restore not just the model weights but also the serving configuration.

Declarative Configuration

The YAML-based approach follows the Declarative Configuration paradigm, where the desired state of the system is declared rather than the steps to achieve it. The serving infrastructure reads the declared state (worker count, batch size, compile options) and converges the runtime to match. This is conceptually similar to Kubernetes manifests or Terraform configurations.

Inversion of Control

Configuration is not fetched by the handler; it is injected by the framework through the Context object. This Inversion of Control pattern decouples the handler from the configuration source, allowing the same handler to behave differently based on different YAML configurations without code changes.
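The injection pattern can be sketched in a few lines. SimpleContext and EchoHandler below are minimal stand-ins, not TorchServe's actual Context or handler classes; the point is that the framework constructs the context and hands it to initialize(), so the handler never reaches out to a configuration source itself.

```python
class SimpleContext:
    """Stand-in for TorchServe's Context: carries the parsed YAML config."""

    def __init__(self, model_yaml_config):
        self.model_yaml_config = model_yaml_config


class EchoHandler:
    def initialize(self, context):
        # The handler never fetches configuration; it receives it.
        cfg = context.model_yaml_config.get("handler", {})
        self.max_length = cfg.get("max_length", 128)


# The "framework" side: same handler code, different injected configs.
handler = EchoHandler()
handler.initialize(SimpleContext({"handler": {"max_length": 256}}))
```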
