Implementation:Tensorflow Serving Tfrt Saved Model Warmup

From Leeroopedia
Knowledge Sources
Domains: Model Serving, Model Warmup
Last Updated: 2026-02-13 00:00 GMT

Overview

Provides model warmup functionality for TFRT SavedModels by replaying recorded prediction logs at load time to trigger lazy initializations and improve first-request latency.

Description

The TFRT SavedModel Warmup module implements the RunSavedModelWarmup function that reads warmup data (recorded PredictionLog entries) from the model's export directory and replays them against the TFRT SavedModel. This triggers lazy initializations such as TensorFlow optimizations and XLA compilations at load time rather than at first-request time.

The internal RunWarmupRequest function dispatches each PredictionLog entry to the appropriate TFRT inference function based on its log type:

  • kPredictLog: Invokes RunPredict
  • kPredictStreamedLog: Invokes RunPredict with a streamed_output_callback, after validating that the log satisfies the single-request constraint
  • kClassifyLog: Invokes RunClassify
  • kRegressLog: Invokes RunRegress
  • kMultiInferenceLog: Invokes RunMultiInference
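
The dispatch described above can be sketched as a switch over the log type. This is a minimal, self-contained illustration, not the module's actual code: the LogType enum and the HandlerFor function are hypothetical stand-ins, and the sketch only returns the name of the target function instead of calling into TFRT.

```cpp
#include <string>

// Hypothetical stand-in for the log_type cases of a PredictionLog record.
enum class LogType {
  kPredictLog,
  kPredictStreamedLog,
  kClassifyLog,
  kRegressLog,
  kMultiInferenceLog,
  kNotSet,
};

// Illustrative only: returns the name of the inference function that a record
// of the given type is routed to, or "" when the type is not handled.
std::string HandlerFor(LogType type) {
  switch (type) {
    case LogType::kPredictLog:
      return "RunPredict";
    case LogType::kPredictStreamedLog:
      // Per the list above, this path also passes a streamed_output_callback.
      return "RunPredict";
    case LogType::kClassifyLog:
      return "RunClassify";
    case LogType::kRegressLog:
      return "RunRegress";
    case LogType::kMultiInferenceLog:
      return "RunMultiInference";
    case LogType::kNotSet:
      break;
  }
  return "";
}
```

For example, HandlerFor(LogType::kRegressLog) yields "RunRegress". How the real module reports an unhandled log type is not covered by this sketch; the empty string is only a placeholder.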

An optimization is available via skip_warmup_requests_if_initialized. When it is set to true and all signature defs are already initialized (i.e., the number of signature defs is at or below lazy_init_threshold), every warmup request except MultiInference requests is skipped. MultiInference requests are always executed because they trigger compilation for combinations of signature defs that would not be initialized during model loading.
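
The skip rule can be written out as two small predicates. This is a sketch under the assumptions stated in the paragraph above; the function names WarmupRequestsSkippable and ShouldReplay are hypothetical and do not appear in the module.

```cpp
// Hypothetical helper: true when the optimization is enabled and the model has
// few enough signature defs that all of them count as already initialized.
bool WarmupRequestsSkippable(bool skip_warmup_requests_if_initialized,
                             int signature_def_count,
                             int lazy_init_threshold) {
  return skip_warmup_requests_if_initialized &&
         signature_def_count <= lazy_init_threshold;
}

// Hypothetical helper: MultiInference records are always replayed; every other
// record type is replayed only when requests are not skippable.
bool ShouldReplay(bool is_multi_inference, bool requests_skippable) {
  return is_multi_inference || !requests_skippable;
}
```

With a threshold of 32, a model with 3 signature defs and the flag enabled would replay only its MultiInference records, while a model with 40 signature defs would replay every record.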

RunSavedModelWarmup delegates file reading and iteration over the warmup records to internal::RunSavedModelWarmup from the shared warmup utility.

Usage

Use this module during model loading to pre-warm TFRT models. It is called by TfrtSavedModelFactory when the enable_model_warmup configuration flag is set. Warmup data should be placed in the model's assets.extra directory as a TFRecord file named tf_serving_warmup_requests that contains serialized PredictionLog entries.
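
As a quick way to check where the warmup data is expected, the path can be assembled from the export directory. The sketch below assumes the standard TensorFlow Serving file name tf_serving_warmup_requests inside assets.extra; the WarmupFilePath helper is hypothetical and requires C++17 for std::filesystem.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: builds the expected location of the warmup file for a
// SavedModel export directory. It does not check that the file exists.
std::filesystem::path WarmupFilePath(const std::string& export_dir) {
  return std::filesystem::path(export_dir) / "assets.extra" /
         "tf_serving_warmup_requests";
}
```

Calling std::filesystem::exists on the returned path before loading is one way to confirm that warmup data was exported alongside the model.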

Code Reference

Source Location

  • Repository: Tensorflow_Serving
  • Files:
    • tensorflow_serving/servables/tensorflow/tfrt_saved_model_warmup.h (lines 1-42)
    • tensorflow_serving/servables/tensorflow/tfrt_saved_model_warmup.cc (lines 1-135)

Signature

Status RunSavedModelWarmup(const ModelWarmupOptions& model_warmup_options,
                           const string& export_dir, int lazy_init_threshold,
                           bool skip_warmup_requests_if_initialized,
                           tfrt::SavedModel* saved_model);

Import

#include "tensorflow_serving/servables/tensorflow/tfrt_saved_model_warmup.h"

I/O Contract

Inputs

Name | Type | Required | Description
model_warmup_options | ModelWarmupOptions | Yes | Options including model name, version, and warmup configuration
export_dir | string | Yes | Path to the SavedModel export directory containing warmup data
lazy_init_threshold | int | Yes | Signature count threshold for lazy initialization; used together with skip_warmup_requests_if_initialized
skip_warmup_requests_if_initialized | bool | Yes | When true, skip non-MultiInference warmup requests if signatures are already initialized
saved_model | tfrt::SavedModel* | Yes | The TFRT SavedModel to warm up

Outputs

Name | Type | Description
return | Status | OK if all warmup requests succeeded; error status with details on failure

Usage Examples

Warming Up a TFRT Model

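// Assumes `saved_model` is a tfrt::SavedModel* that has already been loaded.
ModelWarmupOptions warmup_options;
warmup_options.set_model_name("my_model");
warmup_options.set_model_version(1);

// Replay the recorded warmup requests found under the export directory.
Status status = RunSavedModelWarmup(
    warmup_options, "/path/to/export",
    /*lazy_init_threshold=*/32,
    /*skip_warmup_requests_if_initialized=*/true,
    saved_model);
if (!status.ok()) {
  LOG(ERROR) << "Model warmup failed: " << status;
}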
