Implementation:Tensorflow Serving Tfrt Saved Model Warmup
| Knowledge Sources | |
|---|---|
| Domains | Model Serving, Model Warmup |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Provides model warmup functionality for TFRT SavedModels by replaying recorded prediction logs at load time to trigger lazy initializations and improve first-request latency.
Description
The TFRT SavedModel Warmup module implements the RunSavedModelWarmup function that reads warmup data (recorded PredictionLog entries) from the model's export directory and replays them against the TFRT SavedModel. This triggers lazy initializations such as TensorFlow optimizations and XLA compilations at load time rather than at first-request time.
The internal RunWarmupRequest function dispatches each PredictionLog entry to the appropriate TFRT inference function based on its log type:
- kPredictLog: Invokes RunPredict
- kPredictStreamedLog: Invokes RunPredict with a streamed_output_callback, after validating that the log holds a single request
- kClassifyLog: Invokes RunClassify
- kRegressLog: Invokes RunRegress
- kMultiInferenceLog: Invokes RunMultiInference
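The dispatch listed above can be sketched as a switch over the log type. This is a minimal illustration, not the module's code: `LogType` and `HandlerFor` are hypothetical stand-ins for the generated PredictionLog case enum and the internal RunWarmupRequest function, and the empty-string fallback is an assumption about how an unsupported type might be signalled.

```cpp
#include <cassert>
#include <string>

// Stand-in for the PredictionLog log-type cases listed above. These are NOT
// the real generated protobuf symbols; they only mirror the names.
enum class LogType {
  kPredictLog,
  kPredictStreamedLog,
  kClassifyLog,
  kRegressLog,
  kMultiInferenceLog,
  kUnset,
};

// Hypothetical helper: returns the name of the inference function that a
// warmup record of the given type is routed to, or an empty string when the
// type is not one of the supported cases.
std::string HandlerFor(LogType type) {
  switch (type) {
    case LogType::kPredictLog:
      return "RunPredict";
    case LogType::kPredictStreamedLog:
      return "RunPredict";  // invoked with a streamed_output_callback
    case LogType::kClassifyLog:
      return "RunClassify";
    case LogType::kRegressLog:
      return "RunRegress";
    case LogType::kMultiInferenceLog:
      return "RunMultiInference";
    case LogType::kUnset:
      break;
  }
  return "";  // assumption: unsupported types are reported by the caller
}
```

Both predict cases map to the same function here; in the module the streamed case differs only in the extra callback and the single-request check.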
An optimization is available via skip_warmup_requests_if_initialized. When it is set to true and all signature defs are already initialized (that is, the number of signature defs is at or below lazy_init_threshold), non-MultiInference warmup requests are skipped. MultiInference requests are always executed because they trigger compilation for combinations of signature defs that model loading does not initialize.
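The skip rule can be written as a single predicate. The sketch below assumes the rule exactly as described in the paragraph above; `LogType` and `ShouldSkipWarmupRequest` are hypothetical names introduced for illustration, and the signature count is passed in as a plain integer rather than read from the model.

```cpp
#include <cassert>

// Stand-in for the PredictionLog log-type cases (not the real protobuf enum).
enum class LogType {
  kPredictLog,
  kPredictStreamedLog,
  kClassifyLog,
  kRegressLog,
  kMultiInferenceLog,
};

// Hypothetical predicate: true when a warmup request of this type can be
// skipped under the rule described above.
bool ShouldSkipWarmupRequest(LogType type,
                             bool skip_warmup_requests_if_initialized,
                             int signature_def_count,
                             int lazy_init_threshold) {
  assert(signature_def_count >= 0);
  // Signatures count as already initialized when their number is at or
  // below the lazy-initialization threshold.
  const bool all_initialized = signature_def_count <= lazy_init_threshold;
  // MultiInference requests are never skipped.
  const bool is_multi_inference = type == LogType::kMultiInferenceLog;
  return skip_warmup_requests_if_initialized && all_initialized &&
         !is_multi_inference;
}
```

With a threshold of 32 and 3 signature defs, a predict request would be skipped when the flag is on, while a MultiInference request would still run.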
The function delegates to internal::RunSavedModelWarmup from the shared warmup utility for file reading and iteration.
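The delegation pattern can be illustrated with a callback: a shared routine owns reading and iteration, and the caller supplies the per-record runner. Everything in this sketch is a stand-in (`Status`, `Record`, `ReplayRecords`), and stopping at the first failing record is an assumption, not a statement about the shared utility's actual behaviour.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Minimal stand-ins for the real Status and PredictionLog types.
struct Status {
  bool ok = true;
  std::string message;
};
struct Record {
  std::string payload;
};

// Stand-in for the shared warmup utility: walks the records in order and
// hands each one to the supplied runner. Assumption: the first failure
// aborts the replay and is returned to the caller.
Status ReplayRecords(const std::vector<Record>& records,
                     const std::function<Status(const Record&)>& runner) {
  for (const Record& record : records) {
    Status status = runner(record);
    if (!status.ok) return status;
  }
  return Status{};
}
```

A TFRT-specific caller would pass a lambda that forwards each record to its own request runner, keeping file handling out of the TFRT code path.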
Usage
Use this module during model loading to pre-warm TFRT models. TfrtSavedModelFactory calls it when the enable_model_warmup configuration flag is set. Place warmup data in the model's assets.extra directory as TFRecord files containing serialized PredictionLog entries.
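As a small aid for locating the warmup data, the sketch below builds the expected file path from an export directory. The file name `tf_serving_warmup_requests` comes from the TensorFlow Serving SavedModel warmup guide rather than from this page, and `WarmupFilePath` and the two constants are hypothetical names for illustration.

```cpp
#include <cassert>
#include <string>

// Names taken from the TensorFlow Serving warmup guide; the constants
// themselves are illustrative and not the library's own symbols.
const char kAssetsExtraDir[] = "assets.extra";
const char kWarmupFileName[] = "tf_serving_warmup_requests";

// Hypothetical helper: joins the export directory with the warmup file
// location, tolerating an optional trailing slash on the directory.
std::string WarmupFilePath(std::string export_dir) {
  if (!export_dir.empty() && export_dir.back() != '/') export_dir += '/';
  return export_dir + kAssetsExtraDir + "/" + kWarmupFileName;
}
```

For an export directory of `/models/my_model/1`, this yields `/models/my_model/1/assets.extra/tf_serving_warmup_requests`.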
Code Reference
Source Location
- Repository: Tensorflow_Serving
- Files:
  - tensorflow_serving/servables/tensorflow/tfrt_saved_model_warmup.h (lines 1-42)
  - tensorflow_serving/servables/tensorflow/tfrt_saved_model_warmup.cc (lines 1-135)
Signature
Status RunSavedModelWarmup(const ModelWarmupOptions& model_warmup_options,
                           const string& export_dir, int lazy_init_threshold,
                           bool skip_warmup_requests_if_initialized,
                           tfrt::SavedModel* saved_model);
Import
#include "tensorflow_serving/servables/tensorflow/tfrt_saved_model_warmup.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_warmup_options | ModelWarmupOptions | Yes | Options including model name, version, and warmup configuration |
| export_dir | string | Yes | Path to the SavedModel export directory containing warmup data |
| lazy_init_threshold | int | Yes | Signature count threshold for lazy initialization; used with skip_warmup optimization |
| skip_warmup_requests_if_initialized | bool | Yes | When true, skip non-MultiInference warmup requests if signatures are already initialized |
| saved_model | tfrt::SavedModel* | Yes | The TFRT SavedModel to warm up |
Outputs
| Name | Type | Description |
|---|---|---|
| return | Status | OK if all warmup requests succeeded; error status with details on failure |
Usage Examples
Warming Up a TFRT Model
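// Assumes `saved_model` is a valid tfrt::SavedModel* for the loaded model.
ModelWarmupOptions warmup_options;
warmup_options.set_model_name("my_model");
warmup_options.set_model_version(1);
Status status = RunSavedModelWarmup(
    warmup_options, "/path/to/export",
    /*lazy_init_threshold=*/32,
    /*skip_warmup_requests_if_initialized=*/true,
    saved_model);
// Surface warmup failures instead of silently ignoring the returned status.
if (!status.ok()) {
  LOG(ERROR) << "Warmup failed: " << status;
}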