

Principle:Pytorch Serve Model Artifact Configuration

From Leeroopedia

Overview

Model Artifact Configuration is the principle of using declarative YAML configuration files to control model serving behavior in TorchServe. Rather than hardcoding serving parameters into handler code or passing them as command-line arguments, TorchServe allows each model archive to include a YAML configuration file that specifies worker counts, batching parameters, timeouts, device assignment, and torch.compile options. This configuration-as-code approach enables reproducible, version-controlled deployments.

Field | Value
Principle Name | Model Artifact Configuration
Workflow | Model_Deployment
Domains | Configuration, Model_Serving
Knowledge Sources | TorchServe
Last Updated | 2026-02-13 00:00 GMT

Description

TorchServe supports model-level configuration through a YAML file that is bundled inside the model archive (.mar). This file is read at model load time and its values are propagated through the Context object to the handler, making configuration available at every stage of the inference pipeline.

Configuration Scope

The model YAML configuration can control:

Category | Configuration Keys | Description
Worker Management | minWorkers, maxWorkers | Number of worker processes allocated to this model
Batching | batchSize, maxBatchDelay | Batch aggregation size and maximum wait time (ms)
Timeouts | responseTimeout, startupTimeout | Worker response and startup timeouts in seconds
Device | deviceType, deviceIds | Target device type and specific GPU IDs
torch.compile | pt2.compile.enable, pt2.compile.backend | Enable and configure torch.compile() optimization
torch.export | pt2.export.aot_compile | Enable AOT-compiled model loading
Handler | handler section (custom keys) | Arbitrary handler-specific configuration

Configuration Flow

The configuration flows through the system as follows:

  1. The YAML file is included in the model archive via the --config-file flag of torch-model-archiver.
  2. At model load time, Service.__init__ reads the manifest to locate the config file.
  3. get_yaml_config() parses the YAML file into a Python dictionary.
  4. The configuration dictionary is stored in Context.model_yaml_config.
  5. The handler accesses it via self.model_yaml_config during initialize().
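Steps 3 and 4 can be approximated in a few lines. This is a rough stand-in for TorchServe's internal get_yaml_config(), not its actual implementation; it only assumes the PyYAML package, which is what TorchServe itself uses to parse the file.

```python
import yaml  # PyYAML; TorchServe parses the model YAML with a safe loader


def get_yaml_config_sketch(path):
    """Rough stand-in for step 3: parse the model YAML file into a dict.

    In TorchServe, the resulting dict is then stored on the Context
    object as model_yaml_config (step 4) before initialize() runs.
    """
    with open(path) as f:
        return yaml.safe_load(f)
```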

Declarative vs Imperative Configuration

The YAML-based approach offers several advantages over imperative configuration:

  • Reproducibility: The configuration is bundled with the model artifact, ensuring the same serving behavior across environments.
  • Version Control: YAML files can be tracked in source control alongside model code.
  • Separation of Concerns: Serving configuration is separated from model logic and infrastructure configuration.
  • Self-Documenting: The YAML file serves as documentation of the model's serving requirements.

Usage

Example YAML Configuration

# model_config.yaml

# Worker configuration
minWorkers: 1
maxWorkers: 4

# Batching configuration
batchSize: 8
maxBatchDelay: 200

# Timeout configuration
responseTimeout: 300
startupTimeout: 600

# Device configuration
deviceType: "gpu"
deviceIds: [0, 1]

# torch.compile configuration
pt2:
  compile:
    enable: true
    backend: "inductor"

# Custom handler configuration
handler:
  model_name: "bert-base-uncased"
  max_length: 512
  do_lower_case: true

Bundling Configuration with Model Archive

torch-model-archiver \
  --model-name my_model \
  --version 1.0 \
  --handler handler.py \
  --serialized-file model.pt \
  --config-file model_config.yaml \
  --export-path model_store/
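Since a default .mar file is a zip archive, it is straightforward to verify after archiving that the configuration actually travelled with the model. The helper below is an illustrative sketch, not a TorchServe utility, and assumes the default zip archive format (not --archive-format no-archive).

```python
import zipfile


def mar_contains_config(mar_path, config_name="model_config.yaml"):
    """Check that the YAML config was bundled into the .mar archive."""
    with zipfile.ZipFile(mar_path) as mar:
        return any(name.endswith(config_name) for name in mar.namelist())
```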

Accessing Configuration in Handler

from ts.torch_handler.base_handler import BaseHandler


class MyHandler(BaseHandler):
    def initialize(self, context):
        super().initialize(context)
        # Access model-level YAML config
        config = context.model_yaml_config
        self.max_length = config.get("handler", {}).get("max_length", 128)
        self.do_lower_case = config.get("handler", {}).get("do_lower_case", True)
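The chained .get calls above make the handler robust to a missing handler section. The behavior can be seen with plain dicts standing in for context.model_yaml_config:

```python
# Plain dicts standing in for context.model_yaml_config; values are illustrative.
full_config = {"handler": {"max_length": 512, "do_lower_case": True}}
empty_config = {}  # e.g. a model archived without a config file

# With the handler section present, the configured value wins.
max_len = full_config.get("handler", {}).get("max_length", 128)      # 512

# With no handler section, the inner .get falls back to the default.
fallback = empty_config.get("handler", {}).get("max_length", 128)    # 128
```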

Theoretical Basis

Configuration as Code

The model YAML configuration embodies the Configuration as Code principle from DevOps practices. By treating configuration as a versioned artifact bundled with the deployment unit (the .mar file), TorchServe ensures that:

  • Configuration drift between environments is eliminated.
  • Deployments are atomic: the model, handler, and configuration travel together.
  • Rollbacks restore not just the model weights but also the serving configuration.

Declarative Configuration

The YAML-based approach follows the Declarative Configuration paradigm, where the desired state of the system is declared rather than the steps to achieve it. The serving infrastructure reads the declared state (worker count, batch size, compile options) and converges the runtime to match. This is conceptually similar to Kubernetes manifests or Terraform configurations.

Inversion of Control

Configuration is not fetched by the handler; it is injected by the framework through the Context object. This Inversion of Control pattern decouples the handler from the configuration source, allowing the same handler to behave differently based on different YAML configurations without code changes.
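The injection pattern can be sketched in a few lines. SimpleContext and EchoHandler below are minimal stand-ins, not TorchServe's actual Context or handler classes; the point is that the framework constructs the context and hands it to initialize(), so the handler never reaches out to a configuration source itself.

```python
class SimpleContext:
    """Stand-in for TorchServe's Context: carries the parsed YAML config."""

    def __init__(self, model_yaml_config):
        self.model_yaml_config = model_yaml_config


class EchoHandler:
    def initialize(self, context):
        # The handler never fetches configuration; it receives it.
        cfg = context.model_yaml_config.get("handler", {})
        self.max_length = cfg.get("max_length", 128)


# The "framework" side: same handler code, different injected configs.
handler = EchoHandler()
handler.initialize(SimpleContext({"handler": {"max_length": 256}}))
```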
