Principle:Triton inference server Server Model Lifecycle Testing
Overview
Model Lifecycle Testing verifies the correctness of every stage in a model's lifecycle within Triton Inference Server: discovery in the model repository, configuration parsing, loading into memory, version management, readiness signaling, live reloading, and graceful unloading. This principle spans the broadest surface area of any QA grouping because the model lifecycle touches nearly every subsystem -- from filesystem monitoring and protobuf parsing through memory allocation and backend initialization to health endpoint reporting. A defect at any lifecycle stage can render a model unavailable, cause stale versions to serve traffic, or leak GPU memory on unload.
Theoretical Basis
The Model Repository Contract
Triton's model repository is a filesystem-based convention where each subdirectory represents a model, version subdirectories contain model artifacts, and a config.pbtxt file declares the model's interface and execution parameters. The lifecycle manager must correctly handle:
- Repository polling: When --model-control-mode=poll is configured, the server periodically scans the repository for new, modified, or removed models. Testing must verify that polling detects changes within the configured interval, correctly distinguishes new models from modified versions of existing models, and does not spuriously reload unchanged models.
- Explicit load/unload: When --model-control-mode=explicit is configured, models are loaded and unloaded via the model control API. Testing must verify that the API correctly validates model names, returns appropriate errors for nonexistent models, and serializes concurrent load/unload requests for the same model.
- Startup loading: At server startup, all requested models must be loaded before the server reports itself as ready. Testing must verify that, with --exit-on-error=true, a model that fails to load prevents the server from becoming ready, and that with --exit-on-error=false load failures are logged but do not block startup.
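The explicit load/unload checks above can be sketched as a small polling helper. This is a minimal sketch, not the project's test code: the helper names, timeouts, and polling interval are illustrative assumptions. The `client` argument is any object exposing `load_model`, `unload_model`, and `is_model_ready`; `tritonclient.http.InferenceServerClient` provides methods with these names. Polling is used because readiness may lag behind the control call.

```python
import time


def wait_for_model_state(client, model_name, want_ready,
                         timeout_s=30.0, poll_interval_s=0.5):
    """Poll is_model_ready() until it matches want_ready or the timeout expires.

    Returns True if the desired state was observed, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if bool(client.is_model_ready(model_name)) == want_ready:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


def explicit_load_unload_roundtrip(client, model_name,
                                   timeout_s=30.0, poll_interval_s=0.5):
    """Load a model, confirm it becomes ready, unload it, confirm it is gone.

    Returns a (became_ready, became_unready) tuple of booleans.
    """
    client.load_model(model_name)
    became_ready = wait_for_model_state(
        client, model_name, True, timeout_s, poll_interval_s)
    client.unload_model(model_name)
    became_unready = wait_for_model_state(
        client, model_name, False, timeout_s, poll_interval_s)
    return became_ready, became_unready
```

A test would typically assert that both elements of the returned tuple are true, and separately assert that loading a nonexistent model name raises an error.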
Version Management
Model versioning is a first-class concept in Triton. The version policy declared in config.pbtxt determines which versions are loaded:
- All versions: Every version subdirectory is loaded. Useful for A/B testing.
- Latest N: Only the N numerically highest versions are loaded. Directory names are interpreted as integers, so version 10 ranks above version 9.
- Specific versions: An explicit list of version numbers to load.
Testing must verify that version policies are correctly enforced, that adding a new version directory triggers loading of that version (and potentially unloading of an old version under "latest N"), and that requests specifying a version number are routed to the correct version.
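The three policies can be modeled with a short reference function that a test might use to compute the expected set of loaded versions. This is a sketch under stated assumptions: the function name, the policy strings, and the choice to ignore non-numeric directory names are illustrative, not Triton's implementation.

```python
def select_versions(version_dirs, policy, *, num_versions=1, versions=None):
    """Return the sorted version numbers that a policy is expected to load.

    version_dirs: iterable of version directory names, e.g. ["1", "2", "10"].
    policy: "all", "latest", or "specific" (hypothetical labels).
    num_versions: N for the "latest" policy.
    versions: explicit version numbers for the "specific" policy.
    """
    # Assumption: directory names that are not plain ASCII integers are skipped.
    available = sorted(
        int(d) for d in version_dirs if d.isascii() and d.isdigit())
    if policy == "all":
        return available
    if policy == "latest":
        # Guard against N == 0, since available[-0:] would return everything.
        return available[-num_versions:] if num_versions > 0 else []
    if policy == "specific":
        wanted = set(versions or [])
        return [v for v in available if v in wanted]
    raise ValueError(f"unknown version policy: {policy!r}")
```

For example, with directories 1, 2, 3, and 10 under "latest" with N = 2, the expected versions are 3 and 10. Adding a directory 11 changes the expectation to 10 and 11, so version 3 should be unloaded.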
Configuration Parsing Depth
The config.pbtxt file is a protobuf text-format representation of ModelConfig. Parsing must handle:
- Required vs. optional fields: Missing required fields (e.g., platform or backend) must produce clear errors. Optional fields must use correct defaults.
- Type coercion: Enum fields (e.g., data_type), integer fields, and boolean fields must be parsed with strict type checking.
- Nested structures: Complex nested configurations like dynamic_batching, sequence_batching, ensemble_scheduling, and model_warmup must be fully validated.
- JSON configuration: Triton also supports JSON-format model configuration as an alternative to protobuf text format. For equivalent inputs, the JSON parser must produce ModelConfig objects identical to those produced by the pbtxt parser.
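The kinds of checks listed above can be illustrated with a small validator over a JSON-style configuration dictionary. This is a hypothetical sketch for building test expectations, not Triton's parser: the function name, the error messages, and the subset of data type names are assumptions. Only the field names (platform, backend, max_batch_size, input, output, data_type) come from the model configuration schema.

```python
import json

# Assumption: a subset of the data_type enum names, enough for illustration.
KNOWN_DATA_TYPES = {
    "TYPE_BOOL", "TYPE_UINT8", "TYPE_INT8", "TYPE_INT32", "TYPE_INT64",
    "TYPE_FP16", "TYPE_FP32", "TYPE_FP64", "TYPE_STRING",
}


def validate_config(config):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []

    # Required fields: at least one of platform/backend must be present.
    if not config.get("platform") and not config.get("backend"):
        errors.append("one of 'platform' or 'backend' must be set")

    # Strict integer check: bool is a subclass of int in Python, so reject it.
    mbs = config.get("max_batch_size", 0)
    if isinstance(mbs, bool) or not isinstance(mbs, int) or mbs < 0:
        errors.append("max_batch_size must be a non-negative integer")

    # Enum check on every declared input and output tensor.
    for section in ("input", "output"):
        for i, tensor in enumerate(config.get(section, [])):
            data_type = tensor.get("data_type")
            if data_type not in KNOWN_DATA_TYPES:
                errors.append(
                    f"{section}[{i}]: unknown data_type {data_type!r}")
    return errors


good = json.loads("""
{
  "name": "example_model",
  "backend": "onnxruntime",
  "max_batch_size": 8,
  "input": [{"name": "IN0", "data_type": "TYPE_FP32", "dims": [16]}],
  "output": [{"name": "OUT0", "data_type": "TYPE_FP32", "dims": [16]}]
}
""")
```

Negative tests would feed deliberately malformed configurations (no backend, a string where an integer is expected, a misspelled enum) and assert that each produces a distinct, readable error.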
Load/Unload Atomicity and Safety
Model loading and unloading must be atomic from the perspective of inference requests:
- Loading: A model must not be marked as ready until all requested instances are fully initialized and capable of accepting inference requests. Premature readiness causes client errors.
- Unloading: In-flight requests to a model being unloaded must complete before the backend instance is destroyed. New requests must receive a "model not found" error immediately, not be queued behind the unload.
- Reload (update): When a model is updated, the old version must continue serving until the new version is fully loaded, then traffic must switch atomically. This zero-downtime reload is critical for production deployments.
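The unloading rule above (reject new requests immediately, let in-flight requests drain, then destroy) can be sketched with a counter guarded by a condition variable. This is an illustrative model of the expected behavior, not Triton's internals; the class name, method names, and exception type are assumptions.

```python
import threading


class ModelHandle:
    """Toy model handle that tracks in-flight requests during unload."""

    def __init__(self, name):
        self.name = name
        self._cond = threading.Condition()
        self._accepting = True
        self._in_flight = 0
        self.destroyed = False

    def begin_request(self):
        """Admit a request, or fail immediately if the model is unloading."""
        with self._cond:
            if not self._accepting:
                raise LookupError(f"model '{self.name}' is not available")
            self._in_flight += 1

    def end_request(self):
        """Mark a request as finished and wake any waiting unload."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def unload(self, timeout_s=None):
        """Stop admitting requests, wait for in-flight ones, then destroy.

        Returns True if the model drained and was destroyed, False on timeout.
        """
        with self._cond:
            self._accepting = False
            drained = self._cond.wait_for(
                lambda: self._in_flight == 0, timeout=timeout_s)
            if drained:
                self.destroyed = True
            return drained
```

A test built on this model starts a request, calls `unload` from another thread, and asserts that a second request fails at once while the first still completes before `destroyed` becomes true.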
Ensemble and BLS Dependency Graphs
Ensemble models create dependency graphs where one model's lifecycle depends on the availability of its component models. Testing must verify that ensemble loading fails gracefully when a component model is unavailable, that loading order does not matter (the lifecycle manager resolves dependencies), and that unloading a component model correctly cascades to dependent ensembles.
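The dependency expectations above can be computed with two fixed-point passes over the ensemble graph. This is a sketch for deriving expected test outcomes, assuming ensembles are described as a mapping from ensemble name to component names; the function names and the model names in the usage note are hypothetical.

```python
def resolve_loadable(ensembles, base_models):
    """Return the ensembles whose components are all (transitively) available.

    ensembles: dict mapping ensemble name -> iterable of component names.
    base_models: names of non-ensemble models that loaded successfully.
    The result does not depend on the iteration order of `ensembles`.
    """
    ready = set(base_models)
    changed = True
    while changed:
        changed = False
        for name, components in ensembles.items():
            if name not in ready and set(components) <= ready:
                ready.add(name)
                changed = True
    return {name for name in ensembles if name in ready}


def cascade_unload(ensembles, removed_model):
    """Return the ensembles that depend, directly or not, on removed_model."""
    affected = {removed_model}
    changed = True
    while changed:
        changed = False
        for name, components in ensembles.items():
            if name not in affected and affected & set(components):
                affected.add(name)
                changed = True
    affected.discard(removed_model)
    return affected
```

With an ensemble `pipeline` built from `preprocess` and `classifier`, and an ensemble `outer` built from `pipeline` and `postprocess`, removing `classifier` is expected to affect both ensembles, while removing `postprocess` affects only `outer`.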
| Lifecycle Stage | Key Verification | Failure Impact |
|---|---|---|
| Discovery | Repository scan correctness | Models not found or stale |
| Config parsing | Protobuf/JSON fidelity | Wrong execution parameters |
| Loading | Backend initialization, memory allocation | Crash or hang at startup |
| Version management | Correct version policy enforcement | Wrong model version serving |
| Readiness | Health endpoint accuracy | Premature traffic, client errors |
| Unloading | In-flight request completion, memory release | GPU memory leaks, request drops |
Related Pages
- Implementation:Triton_inference_server_Server_L0_Lifecycle_Test
- Implementation:Triton_inference_server_Server_L0_Model_Config_Test
- Implementation:Triton_inference_server_Server_L0_Config_Json_Test
- Triton_inference_server_Server