Workflow:TensorFlow Serving Model Version Management

Knowledge Sources	TensorFlow Serving, Serving Configuration, Architecture Overview
Domains	ML_Ops, Model_Serving, Version_Control
Last Updated	2026-02-13 17:00 GMT

Overview

End-to-end process for managing multiple model versions with TensorFlow Serving, including version policies, canary deployments, label-based routing, and dynamic configuration updates.

Description

This workflow covers the full lifecycle of model versions in TensorFlow Serving. The server discovers new model versions on disk via FileSystemStoragePathSource, loads them through the Source-Adapter-Manager pipeline, and applies a version transition policy (Availability Preserving or Resource Preserving) to control how one version replaces another. It can serve multiple versions at once for A/B testing and canary deployments, attach string labels (e.g., stable, canary) to versions, and reload its configuration at runtime.

Usage

Execute this workflow when you need to manage model lifecycle transitions in a production environment: deploying updated models without downtime, performing canary releases, rolling back to known-good versions, or serving multiple model versions concurrently for experimentation.

Execution Steps

Step 1: Export Multiple Model Versions

Train and export successive versions of the model to the same base directory, each in a version-numbered subdirectory. TensorFlow Serving uses these numeric directory names to identify and order versions. Newer versions should have larger version numbers.

Key considerations:

Each version lives in a separate subdirectory (e.g., /models/mnist/1, /models/mnist/2)
Version numbers are positive integers parsed from directory names
The FileSystemStoragePathSource polls the filesystem at a configurable interval (--file_system_poll_wait_seconds on tensorflow_model_server) to detect new versions
Multiple models can each have independent version streams
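
The directory convention above can be illustrated with a short sketch. It is not TensorFlow Serving's own discovery code; it only mimics the rule that numeric subdirectory names are treated as versions. The helper names list_versions and next_version are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical helpers: they mirror the directory convention described above,
# not the actual FileSystemStoragePathSource implementation.
_VERSION_DIR = re.compile(r"[0-9]+")


def list_versions(base_path):
    """Return the numeric version subdirectories under base_path, ascending."""
    versions = []
    for child in Path(base_path).iterdir():
        # Only directories whose names are plain decimal integers count;
        # files and names such as "tmp" or "2.bak" are ignored.
        if child.is_dir() and _VERSION_DIR.fullmatch(child.name):
            versions.append(int(child.name))
    return sorted(versions)


def next_version(base_path):
    """Pick a number for the next export: one past the largest existing one."""
    versions = list_versions(base_path)
    return versions[-1] + 1 if versions else 1
```

Sorting numerically rather than lexically matters here: as strings, "10" would sort before "2".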

Step 2: Configure Version Policy

Select and configure a version policy that controls how the server transitions between model versions. The two built-in policies are Availability Preserving (loads new version before unloading old, ensuring zero downtime) and Resource Preserving (unloads old version before loading new, minimizing peak resource usage).

Key considerations:

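Availability Preserving is the default and is recommended for most production deployments
Resource Preserving is useful when memory constraints prevent loading two versions simultaneously
Which versions are served is configured per model via the model_version_policy field of each ModelConfig entry in ModelServerConfig (latest, all, or specific); this setting is separate from the Availability Preserving / Resource Preserving transition policy
To serve specific versions, use the specific policy with explicit version numbers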
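
As an illustration, the sketch below renders a one-model config in protobuf text format with a chosen model_version_policy. The function render_model_config is hypothetical, and the field names (model_config_list, config, model_platform, latest, all, specific) reflect the TensorFlow Serving config format as commonly documented; check them against the release in use.

```python
def render_model_config(name, base_path, policy="latest", versions=(), num_versions=1):
    """Render a one-model ModelServerConfig as protobuf text (hypothetical helper)."""
    if policy == "specific":
        if not versions:
            raise ValueError("the specific policy needs at least one version")
        body = " ".join(f"versions: {int(v)}" for v in versions)
        policy_block = f"specific {{ {body} }}"
    elif policy == "latest":
        # Serve the N highest-numbered versions found on disk.
        policy_block = f"latest {{ num_versions: {int(num_versions)} }}"
    elif policy == "all":
        policy_block = "all {}"
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    return (
        "model_config_list {\n"
        "  config {\n"
        f'    name: "{name}"\n'
        f'    base_path: "{base_path}"\n'
        '    model_platform: "tensorflow"\n'
        f"    model_version_policy {{ {policy_block} }}\n"
        "  }\n"
        "}\n"
    )


if __name__ == "__main__":
    print(render_model_config("mnist", "/models/mnist", "specific", versions=[1, 2]))
```

The rendered text would be saved to the file passed to the server with --model_config_file.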

Step 3: Assign Version Labels

Map human-readable labels (such as stable and canary) to specific version numbers. This creates an indirection layer that allows clients to request models by label rather than version number, enabling transparent version swaps without client changes.

Key considerations:

Labels can only be assigned to versions that are already loaded and available
Use --allow_version_labels_for_unavailable_models at startup if labels must be assigned to versions that are not yet loaded
Reassigning a label to a new version requires the new version to be loaded first
Clients reference labels via the version_label field in ModelSpec or /labels/ in REST URLs
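
A minimal sketch of both sides of label use follows: the version_labels entries that would go inside a model's config block, and the REST URL a client would call. The helper names and the host in the usage note are assumptions; the /v1/models/.../labels/...:predict path shape follows the REST convention referenced above.

```python
from urllib.parse import quote


def render_version_labels(labels):
    """Render label-to-version entries for a model's config block (text format)."""
    return "".join(
        f'version_labels {{ key: "{label}" value: {int(version)} }}\n'
        for label, version in sorted(labels.items())
    )


def predict_url(base_url, model, label=None, version=None):
    """Build a REST predict URL addressed by label, by version, or by neither."""
    if label is not None and version is not None:
        raise ValueError("request either a label or a version, not both")
    path = f"/v1/models/{quote(model, safe='')}"
    if label is not None:
        path += f"/labels/{quote(label, safe='')}"
    elif version is not None:
        path += f"/versions/{int(version)}"
    # With neither label nor version, the server picks the version itself.
    return f"{base_url.rstrip('/')}{path}:predict"
```

For example, predict_url("http://localhost:8501", "mnist", label="canary") addresses whichever version currently carries the canary label, so promoting a version does not change the client's URL. The host and port here are placeholders.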

Step 4: Deploy Canary Configuration

Update the server configuration to serve both the current stable version and the new canary version simultaneously. The server does not split traffic itself; clients send a portion of their requests to the canary by requesting the canary label. Monitor canary performance metrics before promoting.

Key considerations:

Both versions must be listed in the model_version_policy specific versions list
Assign the stable label to the current production version and the canary label to the new version
Client-side logic (e.g., user ID hashing) determines which label to request
Monitor error rates, latency, and prediction quality before promoting the canary
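
The client-side routing mentioned above could look like the following sketch, which hashes a user ID into one of 100 buckets and sends the lowest buckets to the canary. The function choose_label and the choice of SHA-256 are illustrative assumptions, not part of TensorFlow Serving.

```python
import hashlib


def choose_label(user_id, canary_percent):
    """Deterministically map a user to the 'canary' or 'stable' label."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # A cryptographic hash is used only for its even spread; Python's built-in
    # hash() is salted per process, so it would not be stable across clients.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "canary" if bucket < canary_percent else "stable"
```

Because the bucket depends only on the user ID, a given user keeps seeing the same version across requests, and raising canary_percent only ever moves users from stable to canary, never back.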

Step 5: Reload Configuration Dynamically

Update the running server's configuration without restart, either by modifying the config file on disk (with periodic polling enabled) or by issuing a HandleReloadConfigRequest RPC. This triggers the server to load new models/versions and unload removed ones.

Key considerations:

Set --model_config_file_poll_wait_seconds to have the server re-read the file given by --model_config_file at that interval
HandleReloadConfigRequest RPC allows programmatic config updates
The new config fully replaces the old one: the server loads what the new config lists and unloads any model that is missing from it
Label reassignment and version promotion can be done via config reload
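
When the config file is polled, one precaution is to replace it atomically so the server never reads a half-written file. This is a general file-handling practice rather than something the source prescribes; the function name and default permissions below are assumptions.

```python
import os
import tempfile


def write_config_atomically(path, text, mode=0o644):
    """Write text to path via a temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".config-tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        # mkstemp creates owner-only files; widen permissions so a server
        # running as another user can still read the config (assumption).
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The server would pick up the new content on its next poll. A programmatic reload through the HandleReloadConfigRequest RPC avoids the file entirely but needs the TensorFlow Serving gRPC client stubs, which this sketch leaves out.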

Step 6: Promote or Rollback

Based on canary results, either promote the canary to stable by updating labels, or roll back by reverting the configuration. After promotion, optionally unload the old version to free resources.

Key considerations:

To promote: update the stable label to point to the canary version number
To roll back: revert to the previous configuration file
Unload old versions by removing them from the specific versions list in model_version_policy
The AspiredVersionsManager handles the actual load/unload state transitions
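
The promote and rollback decisions can be sketched as pure functions over a small description of the desired config. The state layout (a versions list plus a labels mapping) and the function names are hypothetical; the result would still need to be rendered into a config and reloaded.

```python
import copy


def promote(state):
    """Move the stable label onto the canary version; keep both versions loaded."""
    new = copy.deepcopy(state)
    new["labels"]["stable"] = new["labels"].pop("canary")
    return new


def retire_unlabeled(state):
    """Drop versions that no label points to, e.g. the old stable after promotion."""
    keep = set(state["labels"].values())
    new = copy.deepcopy(state)
    new["versions"] = [v for v in state["versions"] if v in keep]
    return new


def rollback(state):
    """Return to the pre-canary config: remove the canary label and its version."""
    new = copy.deepcopy(state)
    canary = new["labels"].pop("canary")
    new["versions"] = [v for v in new["versions"] if v != canary]
    return new
```

Promotion and retirement are kept as separate steps in this sketch so the old version stays loaded until the label swap has taken effect, which leaves a quick path back if the promoted version misbehaves.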
