Implementation:TensorFlow Serving RunSavedModelWarmup
| Knowledge Sources | |
|---|---|
| Domains | Performance, Deployment |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Concrete tool, provided by the saved_model_warmup module, for executing warmup inference requests during model loading to reduce first-request latency.
Description
RunSavedModelWarmup() reads the warmup TFRecord file from the SavedModel's assets.extra/ directory and executes each request against the loaded session:
- Reads up to WarmupConsts::kMaxNumRecords (1000) records from the TFRecord file
- Parses each record as a PredictionLog proto
- Dispatches to RunWarmupRequest(), which handles Predict, Classify, Regress, and MultiInference log types
- Optionally runs with multiple threads (num_model_warmup_threads)
- Optionally warms up at all allowed batch sizes (enable_all_batch_sizes_warmup)
Returns an OK status even if no warmup file is found (warmup is optional).
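The following is an illustrative Python sketch of that control flow; the real loader is C++ (see Code Reference below), and run_warmup_request here is a hypothetical stand-in for the internal dispatcher.
import os
import tensorflow as tf
from tensorflow_serving.apis import prediction_log_pb2

MAX_NUM_RECORDS = 1000  # mirrors WarmupConsts::kMaxNumRecords

def run_saved_model_warmup(export_dir, run_warmup_request):
    # The warmup file lives under the SavedModel's assets.extra/ directory.
    warmup_path = os.path.join(export_dir, "assets.extra",
                               "tf_serving_warmup_requests")
    if not os.path.exists(warmup_path):
        return  # warmup is optional; a missing file is not an error
    for num_records, raw in enumerate(tf.data.TFRecordDataset([warmup_path]), 1):
        log = prediction_log_pb2.PredictionLog()
        log.ParseFromString(raw.numpy())
        run_warmup_request(log)  # dispatch on predict/classify/regress/multi-inference
        if num_records >= MAX_NUM_RECORDS:
            break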
Usage
Called automatically during model loading when --enable_model_warmup=true (the default). Users only need to provide the warmup file (assets.extra/tf_serving_warmup_requests) in their SavedModel export, for example as shown below.
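For example, placing a previously generated warmup file into an existing export might look like the following sketch; both paths are placeholders.
import os
import shutil

export_dir = "/models/my_model/1"               # placeholder export path
warmup_src = "/tmp/tf_serving_warmup_requests"  # file generated as in Usage Examples below
assets_extra = os.path.join(export_dir, "assets.extra")
os.makedirs(assets_extra, exist_ok=True)
shutil.copy(warmup_src, os.path.join(assets_extra, "tf_serving_warmup_requests"))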
Code Reference
Source Location
- Repository: tensorflow/serving
- File: tensorflow_serving/servables/tensorflow/saved_model_warmup.cc L82-91
- Internal: tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc L71-255
- Dispatcher: tensorflow_serving/servables/tensorflow/saved_model_warmup.cc L39-78
- Header: tensorflow_serving/servables/tensorflow/saved_model_warmup.h L34-36
Signature
Status RunSavedModelWarmup(
const ModelWarmupOptions& model_warmup_options,
const RunOptions& run_options,
const string& export_dir,
SavedModelBundle* bundle
);
Import
#include "tensorflow_serving/servables/tensorflow/saved_model_warmup.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_warmup_options | ModelWarmupOptions | Yes | Warmup configuration (iterations, threads, batch sizes) |
| run_options | RunOptions | Yes | TensorFlow session run options |
| export_dir | string | Yes | Path to SavedModel directory |
| bundle | SavedModelBundle* | Yes | Loaded model with session and metagraph |
Outputs
| Name | Type | Description |
|---|---|---|
| Status | Status | OK on success (also OK if no warmup file found) |
| Warmed-up session | side effect | Session with pre-allocated resources and compiled kernels |
| /tensorflow/serving/model_warmup_latency | metric | Warmup latency recorded in monitoring |
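As a rough check that warmup ran, the recorded latency can be looked up in the server's monitoring output. The sketch below assumes the Prometheus endpoint has been enabled via --monitoring_config_file and that the REST API listens on port 8501; the URL path and port are assumptions.
import urllib.request

metrics_url = "http://localhost:8501/monitoring/prometheus/metrics"  # assumed endpoint
with urllib.request.urlopen(metrics_url) as resp:
    body = resp.read().decode("utf-8")
# Print any exported series mentioning the warmup latency metric.
for line in body.splitlines():
    if "model_warmup_latency" in line:
        print(line)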
Usage Examples
Creating Warmup File (Python)
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_log_pb2

# Create warmup file
with tf.io.TFRecordWriter("assets.extra/tf_serving_warmup_requests") as writer:
    # Create sample predict request
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'my_model'
    request.model_spec.signature_name = 'serving_default'
    request.inputs['input'].CopyFrom(
        tf.make_tensor_proto([1.0, 2.0, 3.0], shape=[1, 3])
    )
    # Wrap in PredictionLog
    log = prediction_log_pb2.PredictionLog(
        predict_log=prediction_log_pb2.PredictLog(request=request)
    )
    writer.write(log.SerializeToString())
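A quick way to validate the file before shipping it with the export is to read the records back and check which log type each one carries; this is a small verification sketch using the same protos.
import tensorflow as tf
from tensorflow_serving.apis import prediction_log_pb2

# Read the warmup file back and report each record's log type.
dataset = tf.data.TFRecordDataset(["assets.extra/tf_serving_warmup_requests"])
for i, raw in enumerate(dataset):
    log = prediction_log_pb2.PredictionLog()
    log.ParseFromString(raw.numpy())
    print(f"record {i}: {log.WhichOneof('log_type')}")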
Using ResNet Warmup Generator
# Generate warmup file for ResNet model
python tensorflow_serving/example/resnet_warmup.py \
--output_dir=/tmp/resnet/1/assets.extra