Implementation:Tensorflow Serving Tfrt Saved Model Warmup

Knowledge Sources	Tensorflow_Serving
Domains	Model Serving, Model Warmup
Last Updated	2026-02-13 00:00 GMT

Overview

Provides model warmup functionality for TFRT SavedModels by replaying recorded prediction logs at load time to trigger lazy initializations and improve first-request latency.

Description

The TFRT SavedModel Warmup module implements the RunSavedModelWarmup function that reads warmup data (recorded PredictionLog entries) from the model's export directory and replays them against the TFRT SavedModel. This triggers lazy initializations such as TensorFlow optimizations and XLA compilations at load time rather than at first-request time.
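Replaying requests moves one-time initialization work off live traffic. The small C++ sketch below illustrates this. LazyModel and Warmup are hypothetical names, not TensorFlow Serving APIs. The set of initialized signatures stands in for the one-time work (graph optimization, XLA compilation) that the real runtime performs on first use.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a model whose first call per signature pays a
// one-time cost (in a real runtime: graph optimization, XLA compilation).
class LazyModel {
 public:
  // Returns true if this call performed the one-time initialization.
  bool Run(const std::string& signature) {
    // insert().second is true only the first time a signature is seen.
    return initialized_.insert(signature).second;
  }

 private:
  std::set<std::string> initialized_;
};

// Replays recorded requests (reduced here to signature names) so the
// one-time cost is paid at load time rather than by the first live request.
void Warmup(LazyModel& model, const std::vector<std::string>& recorded) {
  for (const std::string& signature : recorded) model.Run(signature);
}
```

After Warmup, a live request for a recorded signature takes the fast path. A signature absent from the warmup data still pays the cost on first use.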

The internal RunWarmupRequest function dispatches each PredictionLog entry to the appropriate TFRT inference function based on its log type:

- kPredictLog: invokes RunPredict
- kPredictStreamedLog: invokes RunPredict with a streamed_output_callback; the log entry must contain exactly one request
- kClassifyLog: invokes RunClassify
- kRegressLog: invokes RunRegress
- kMultiInferenceLog: invokes RunMultiInference
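The dispatch by log type can be sketched as a switch. This is a simplified illustration, not the TensorFlow Serving source. LogType, WarmupLog, DispatchResult and DispatchWarmupRequest are hypothetical. The returned string only records which entry point would be chosen, and request_count models the single-request rule for streamed logs.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified mirror of the PredictionLog log-type cases.
enum class LogType { kPredict, kPredictStreamed, kClassify, kRegress, kMultiInference };

// Hypothetical warmup record: a log type plus, for streamed predict logs,
// the number of requests recorded in the entry.
struct WarmupLog {
  LogType type;
  int request_count = 1;
};

struct DispatchResult {
  bool ok;
  std::string called;  // name of the inference entry point that was chosen
};

DispatchResult DispatchWarmupRequest(const WarmupLog& log) {
  switch (log.type) {
    case LogType::kPredict:
      return {true, "RunPredict"};
    case LogType::kPredictStreamed:
      // Streamed logs are only replayed when they hold exactly one request.
      if (log.request_count != 1) return {false, ""};
      return {true, "RunPredict(streamed)"};
    case LogType::kClassify:
      return {true, "RunClassify"};
    case LogType::kRegress:
      return {true, "RunRegress"};
    case LogType::kMultiInference:
      return {true, "RunMultiInference"};
  }
  return {false, ""};  // unreachable for valid enum values
}
```

Rejecting a streamed log with more than one request mirrors the single-request constraint listed for kPredictStreamedLog.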

An optimization is available via skip_warmup_requests_if_initialized. When it is true and all signature defs were already initialized during loading (that is, the number of signature defs is at or below lazy_init_threshold), non-MultiInference warmup requests are skipped. MultiInference requests always run, because they trigger compilation for combinations of signature defs that model loading does not initialize.
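Below is a minimal sketch of the skip rule. ShouldSkipWarmupRequest is a hypothetical helper name. Its signature_def_count parameter is assumed to be the number of signature defs in the loaded model.

```cpp
#include <cassert>

// Hypothetical helper: decides whether one warmup request can be skipped.
bool ShouldSkipWarmupRequest(bool skip_warmup_requests_if_initialized,
                             int signature_def_count, int lazy_init_threshold,
                             bool is_multi_inference) {
  // MultiInference requests always run: they exercise combinations of
  // signature defs that loading alone does not initialize.
  if (is_multi_inference) return false;
  // Every signature def counts as initialized at load time when the count
  // is at or below the threshold.
  const bool all_initialized = signature_def_count <= lazy_init_threshold;
  return skip_warmup_requests_if_initialized && all_initialized;
}
```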

The function delegates to internal::RunSavedModelWarmup from the shared warmup utility for file reading and iteration.
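The division of labor resembles a generic driver that owns the iteration, plus a model-specific callback that handles one record. RunWarmupDriver and WarmupStatus are hypothetical. An empty string stands in for an OK Status. The sketch also assumes that iteration stops at the first failing request.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Empty string stands in for an OK status; anything else is an error message.
using WarmupStatus = std::string;

// Hypothetical shared driver: walks every recorded request and hands each one
// to a caller-supplied callback, returning the first error it sees.
template <typename Record, typename Fn>
WarmupStatus RunWarmupDriver(const std::vector<Record>& records, Fn run_one) {
  for (const Record& record : records) {
    WarmupStatus status = run_one(record);
    if (!status.empty()) return status;  // stop at the first failure
  }
  return WarmupStatus();  // all records replayed successfully
}
```

With this split, the TFRT-specific code only implements the per-record callback. The shared driver can serve other model formats unchanged.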

Usage

Use this module during model loading to pre-warm TFRT models. TfrtSavedModelFactory calls RunSavedModelWarmup when the enable_model_warmup configuration flag is set. Warmup data should be placed in the model's assets.extra directory as TFRecord files containing serialized PredictionLog entries.
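A flag-gated warmup call at load time can be sketched as follows. HypotheticalFactoryConfig and LoadWithOptionalWarmup are illustrative names only. The run_warmup callback stands in for a call to RunSavedModelWarmup. Propagating a warmup error as a load error is an assumption of this sketch.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical slice of the factory configuration; only the flag named in
// the text is modeled.
struct HypotheticalFactoryConfig {
  bool enable_model_warmup = false;
};

// Hypothetical load-time hook: warmup runs only when the flag is set.
// run_warmup stands in for RunSavedModelWarmup; an empty string means OK,
// and any warmup error is returned to the caller (an assumption here).
std::string LoadWithOptionalWarmup(const HypotheticalFactoryConfig& config,
                                   const std::function<std::string()>& run_warmup) {
  if (!config.enable_model_warmup) return "";  // warmup disabled: skip it
  return run_warmup();
}
```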

Code Reference

Source Location

Repository: Tensorflow_Serving
Files:
- tensorflow_serving/servables/tensorflow/tfrt_saved_model_warmup.h (lines 1-42)
- tensorflow_serving/servables/tensorflow/tfrt_saved_model_warmup.cc (lines 1-135)

Signature

Status RunSavedModelWarmup(const ModelWarmupOptions& model_warmup_options,
                           const string& export_dir, int lazy_init_threshold,
                           bool skip_warmup_requests_if_initialized,
                           tfrt::SavedModel* saved_model);

Import

#include "tensorflow_serving/servables/tensorflow/tfrt_saved_model_warmup.h"

I/O Contract

Inputs

Name	Type	Required	Description
model_warmup_options	`ModelWarmupOptions`	Yes	Options including model name, version, and warmup configuration
export_dir	`string`	Yes	Path to the SavedModel export directory containing warmup data
lazy_init_threshold	`int`	Yes	Signature def count at or below which all signatures are treated as initialized at load time; used together with skip_warmup_requests_if_initialized
skip_warmup_requests_if_initialized	`bool`	Yes	When true, skip non-MultiInference warmup requests if signatures are already initialized
saved_model	`tfrt::SavedModel*`	Yes	The TFRT SavedModel to warm up

Outputs

Name	Type	Description
return	`Status`	OK if all warmup requests succeeded; error status with details on failure

Usage Examples

Warming Up a TFRT Model

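// Assumes `saved_model` is a tfrt::SavedModel* already loaded from
// "/path/to/export", and that this code runs in namespace tensorflow::serving.
ModelWarmupOptions warmup_options;
warmup_options.set_model_name("my_model");
warmup_options.set_model_version(1);

// Replay the warmup records; skip single-signature requests when every
// signature def was already initialized during loading.
Status status = RunSavedModelWarmup(
    warmup_options, "/path/to/export",
    /*lazy_init_threshold=*/32,
    /*skip_warmup_requests_if_initialized=*/true,
    saved_model);
if (!status.ok()) {
  LOG(ERROR) << "Model warmup failed: " << status;
}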

Related Pages

Principle:Tensorflow_Serving_TFRT_Model_Management
