Workflow:Tensorflow Serving Model Version Management
| Knowledge Sources | |
|---|---|
| Domains | ML_Ops, Model_Serving, Version_Control |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
End-to-end process for managing multiple model versions with TensorFlow Serving, including version policies, canary deployments, label-based routing, and dynamic configuration updates.
Description
This workflow covers the model version lifecycle in TensorFlow Serving. The server discovers new model versions on disk via FileSystemStoragePathSource, loads them through the Source-Adapter-Manager pipeline, and applies a version policy (Availability Preserving or Resource Preserving) to control how it transitions between versions. It can serve multiple versions simultaneously for A/B testing and canary deployments, assign string labels (e.g., stable, canary) to versions, and reload its configuration dynamically at runtime.
Usage
Execute this workflow when you need to manage model lifecycle transitions in a production environment: deploying updated models without downtime, performing canary releases, rolling back to known-good versions, or serving multiple model versions concurrently for experimentation.
Execution Steps
Step 1: Export Multiple Model Versions
Train and export successive versions of the model to the same base directory, each in a version-numbered subdirectory. TensorFlow Serving uses these numeric directory names to identify and order versions. Newer versions should have larger version numbers.
Key considerations:
- Each version lives in a separate subdirectory (e.g., /models/mnist/1, /models/mnist/2)
- Version numbers are positive integers parsed from directory names
- The FileSystemStoragePathSource polls the filesystem at a configurable interval to detect new versions
- Multiple models can each have independent version streams
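The directory convention above can be checked before an export with a short script. The following is a minimal sketch, not part of TensorFlow Serving: the helper names discover_versions and next_version are hypothetical, and the sketch assumes that any subdirectory whose name consists only of ASCII digits is a version.

```python
import tempfile
from pathlib import Path


def discover_versions(base_path):
    """Return the numeric version subdirectories under base_path, ascending."""
    versions = []
    for child in Path(base_path).iterdir():
        name = child.name
        # Assumption: only all-digit directory names are treated as versions.
        if child.is_dir() and name.isascii() and name.isdigit():
            versions.append(int(name))
    return sorted(versions)


def next_version(base_path):
    """Pick the next export version: one larger than the current maximum."""
    versions = discover_versions(base_path)
    return versions[-1] + 1 if versions else 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp) / "mnist"
        for name in ("1", "2", "10", "notes"):
            (base / name).mkdir(parents=True)
        print(discover_versions(base))  # [1, 2, 10]
        print(next_version(base))       # 11
```

Sorting numerically rather than lexically matters here: as strings, "10" would sort before "2".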
Step 2: Configure Version Policy
Select and configure a version policy that controls how the server transitions between model versions. The two built-in policies are Availability Preserving (loads new version before unloading old, ensuring zero downtime) and Resource Preserving (unloads old version before loading new, minimizing peak resource usage).
Key considerations:
- Availability Preserving is the default and is recommended for most production deployments
- Resource Preserving is useful when memory constraints prevent loading two versions simultaneously
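- The model_version_policy field of each model's entry in ModelServerConfig selects which versions to serve (latest, all, or specific); it is separate from the choice between the Availability Preserving and Resource Preserving transition policies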
- To serve specific versions, use the specific policy with explicit version numbers
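The trade-off between the two transition policies can be illustrated with a toy simulation. This is a sketch of the ordering described above, not TensorFlow Serving's implementation; the function names and the policy strings are hypothetical.

```python
def transition_actions(old_version, new_version, policy):
    """Return the ordered (action, version) steps for replacing old with new."""
    if policy == "availability_preserving":
        # Load the new version first, so something is always being served.
        return [("load", new_version), ("unload", old_version)]
    if policy == "resource_preserving":
        # Unload the old version first, so only one version is in memory.
        return [("unload", old_version), ("load", new_version)]
    raise ValueError(f"unknown policy: {policy}")


def simulate(old_version, new_version, policy):
    """Replay the steps; return (peak, minimum) number of loaded versions."""
    loaded = {old_version}
    peak = low = len(loaded)
    for action, version in transition_actions(old_version, new_version, policy):
        if action == "load":
            loaded.add(version)
        else:
            loaded.discard(version)
        peak = max(peak, len(loaded))
        low = min(low, len(loaded))
    return peak, low


if __name__ == "__main__":
    print(simulate(1, 2, "availability_preserving"))  # (2, 1)
    print(simulate(1, 2, "resource_preserving"))      # (1, 0)
```

A minimum of 0 under the resource-preserving ordering corresponds to a window in which no version is available; a peak of 2 under the availability-preserving ordering corresponds to the extra memory needed during the swap.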
Step 3: Assign Version Labels
Map human-readable labels (such as stable and canary) to specific version numbers. This creates an indirection layer that allows clients to request models by label rather than version number, enabling transparent version swaps without client changes.
Key considerations:
- Labels can only be assigned to versions that are already loaded and available
- Use --allow_version_labels_for_unavailable_models at startup if labels must be pre-assigned
- Reassigning a label to a new version requires the new version to be loaded first
- Clients reference labels via the version_label field in ModelSpec or /labels/ in REST URLs
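As an illustration of both sides of label assignment, the sketch below renders a text-format model config with version_labels entries and builds the REST URL a client would call for a given label. The helper names, the model name mnist, and the host localhost:8501 are assumptions for the example; the config is produced as a plain string rather than through the protobuf library.

```python
def render_model_config(name, base_path, labels):
    """Render a text-format config serving exactly the labelled versions.

    labels maps a label string (e.g. "stable") to a version number.
    """
    versions = sorted(set(labels.values()))
    lines = [
        "model_config_list {",
        "  config {",
        f"    name: '{name}'",
        f"    base_path: '{base_path}'",
        "    model_platform: 'tensorflow'",
        "    model_version_policy {",
        "      specific {",
    ]
    lines += [f"        versions: {v}" for v in versions]
    lines += ["      }", "    }"]
    for label, version in sorted(labels.items()):
        lines += [
            "    version_labels {",
            f"      key: '{label}'",
            f"      value: {version}",
            "    }",
        ]
    lines += ["  }", "}"]
    return "\n".join(lines) + "\n"


def label_predict_url(host, model, label):
    """Build the REST predict URL that addresses a model version by label."""
    return f"http://{host}/v1/models/{model}/labels/{label}:predict"


if __name__ == "__main__":
    print(render_model_config("mnist", "/models/mnist", {"stable": 1, "canary": 2}))
    print(label_predict_url("localhost:8501", "mnist", "canary"))
```

Because clients only ever see the label in the URL, moving stable from one version to another requires no client change.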
Step 4: Deploy Canary Configuration
Update the server configuration to serve both the current stable version and the new canary version simultaneously. Route a portion of traffic to the canary version via label-based routing. Monitor canary performance metrics before promoting.
Key considerations:
- Both versions must be listed in the model_version_policy specific versions list
- Assign the stable label to the current production version and the canary label to the new version
- Client-side logic (e.g., user ID hashing) determines which label to request
- Monitor error rates, latency, and prediction quality before promoting the canary
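The client-side routing mentioned above could look like the following sketch, which hashes a user ID into one of 100 buckets and sends the lowest buckets to the canary. The function name choose_label, the SHA-256 hash, and the percentage parameter are illustrative choices, not something TensorFlow Serving prescribes.

```python
import hashlib


def choose_label(user_id, canary_percent):
    """Map a user to "canary" or "stable" deterministically.

    canary_percent is the share of users (0 to 100) routed to the canary.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as a stable bucket in [0, 100).
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "canary" if bucket < canary_percent else "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10000)]
    canary_users = sum(choose_label(u, 10) == "canary" for u in users)
    print(f"{canary_users / len(users):.1%} of users routed to canary")
```

Hashing the user ID, rather than picking a label at random per request, keeps each user on the same version for the whole canary period, which makes per-user metrics easier to compare.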
Step 5: Reload Configuration Dynamically
Update the running server's configuration without a restart, either by modifying the config file on disk (with periodic polling enabled) or by issuing a HandleReloadConfigRequest RPC. Either path causes the server to load newly listed models and versions and to unload those that were removed.
Key considerations:
- Enable --model_config_file_poll_wait_seconds for automatic config file monitoring
- HandleReloadConfigRequest RPC allows programmatic config updates
- The new config fully replaces the old one: the server serves only what the new config lists, and models absent from it are unloaded
- Label reassignment and version promotion can be done via config reload
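When the config file is polled, one practical precaution (an assumption of this sketch, not something stated above) is to avoid leaving a half-written file where the poller might read it. A common approach is to write to a temporary file in the same directory and then rename it over the target. The helper name write_config_atomically is hypothetical.

```python
import os
import tempfile


def write_config_atomically(config_text, config_path):
    """Replace config_path with config_text via a same-directory temp file."""
    directory = os.path.dirname(os.path.abspath(config_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".config-tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(config_text)
            handle.flush()
            os.fsync(handle.fileno())
        # mkstemp creates an owner-only file; widen it so a server running
        # as a different user can still read the config.
        os.chmod(tmp_path, 0o644)
        # os.replace swaps the file in one step on the same filesystem.
        os.replace(tmp_path, config_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "models.config")
        write_config_atomically("model_config_list {}\n", path)
        with open(path, encoding="utf-8") as handle:
            print(handle.read(), end="")
```

The server picks up the change on its next poll, so the delay before the new config takes effect is bounded by the value passed to --model_config_file_poll_wait_seconds plus load time.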
Step 6: Promote or Rollback
Based on canary results, either promote the canary to stable by updating labels, or roll back by reverting the configuration. After promotion, optionally unload the old version to free resources.
Key considerations:
- To promote: update the stable label to point to the canary version number
- To roll back: revert to the previous configuration file
- Unload old versions by removing them from the model_version_policy
- The AspiredVersionsManager handles the actual load/unload state transitions
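The promote and rollback decisions can be modelled as small transformations on the label map and the list of served versions, which are then written back into the config. The sketch below uses hypothetical helper names and treats rollback as "keep only the stable version", a simplification of reverting to the previous configuration file.

```python
def promote(labels, versions):
    """Point stable at the canary version; keep every version loaded for now."""
    if "canary" not in labels:
        raise ValueError("no canary version to promote")
    new_labels = dict(labels)
    new_labels["stable"] = labels["canary"]
    return new_labels, list(versions)


def rollback(labels, versions):
    """Drop the canary: keep only the stable label and its version."""
    stable = labels["stable"]
    return {"stable": stable}, [stable]


def drop_unlabelled(labels, versions):
    """Remove versions that no label points to, so they can be unloaded."""
    keep = set(labels.values())
    return dict(labels), [v for v in versions if v in keep]


if __name__ == "__main__":
    labels, versions = {"stable": 1, "canary": 2}, [1, 2]
    promoted = promote(labels, versions)
    print(promoted)                    # ({'stable': 2, 'canary': 2}, [1, 2])
    print(drop_unlabelled(*promoted))  # ({'stable': 2, 'canary': 2}, [2])
    print(rollback(labels, versions))  # ({'stable': 1}, [1])
```

Splitting promotion into two config updates, first moving the label and only later dropping the old version, keeps the previous version loaded while the promotion is being verified.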