Implementation: TensorFlow Serving TFRT Predict Util
| Knowledge Sources | |
|---|---|
| Domains | Model Serving, Prediction |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Implements TFRT-based prediction with support for output filtering, tensor serialization options, and custom thread pool configuration.
Description
The TFRT Predict Util module provides the primary predict execution path for TFRT SavedModels. It exposes two versions of RunPredict: an internal version that accepts a PredictResponseTensorSerializationOption (kAsProtoField or kAsProtoContent) and a public version that defaults to kAsProtoField for backward compatibility.
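The two-variant layering can be sketched with simplified stand-in types (a toy illustration of the pattern, not the real TF Serving signatures):

```cpp
#include <cassert>

// Toy stand-ins for the real TF Serving types, used only to show the
// public/internal layering described above.
enum class SerializationOption { kAsProtoField, kAsProtoContent };

struct Response {
  SerializationOption used = SerializationOption::kAsProtoField;
};

namespace internal {
// Internal variant: the caller chooses the tensor serialization format.
int RunPredict(SerializationOption option, Response* response) {
  response->used = option;
  return 0;  // 0 stands in for Status::OK
}
}  // namespace internal

// Public variant: defaults to kAsProtoField for backward compatibility
// and forwards to the internal variant.
int RunPredict(Response* response) {
  return internal::RunPredict(SerializationOption::kAsProtoField, response);
}
```

The design keeps the default behavior stable for existing callers while letting callers that care about response size opt into kAsProtoContent through the internal namespace.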
The implementation takes two distinct code paths based on whether output filtering is needed. When no output filter is specified (or the filter matches all outputs), it uses the optimized PreProcessPredictionWithoutOutputFilter path that validates inputs against function metadata, handles default input values, checks data types, and invokes the model via SavedModel::Run. When an output filter is specified that selects a subset of outputs, it falls back to RunByTensorNames which uses the MetaGraphDef signature definitions for tensor name resolution, enabling lazy initialization of optimized subgraphs.
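The path-selection rule can be expressed as a small predicate (a hypothetical helper for illustration, not the actual TF Serving code):

```cpp
#include <set>
#include <string>
#include <vector>

// Illustrative decision rule for the two code paths described above:
// the optimized no-filter path is used when the request carries no
// output_filter, or when the filter names every output in the signature;
// otherwise the RunByTensorNames fallback handles the subset.
bool UseOptimizedPath(const std::vector<std::string>& output_filter,
                      const std::set<std::string>& signature_outputs) {
  if (output_filter.empty()) return true;  // no filter: fast path
  // A filter selecting every output is equivalent to no filter.
  std::set<std::string> filtered(output_filter.begin(), output_filter.end());
  return filtered == signature_outputs;
}
```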
The module also supports custom thread pool options via TfThreadPoolWorkQueue when an inter-op thread pool is provided, and records runtime latency metrics for monitoring. Post-processing serializes output tensors into the PredictResponse using the specified serialization option, applying the output filter when present.
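The post-processing filter step amounts to copying only the selected outputs into the response. A minimal sketch, with tensors stubbed as strings (the real code serializes TensorProtos using the chosen serialization option):

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical sketch of post-processing: copy each output into the
// response map, skipping names not selected by the filter. An empty
// filter means "keep everything".
std::map<std::string, std::string> FilterOutputs(
    const std::map<std::string, std::string>& outputs,
    const std::set<std::string>& output_filter) {
  std::map<std::string, std::string> result;
  for (const auto& [name, tensor] : outputs) {
    if (output_filter.empty() || output_filter.count(name) > 0) {
      result[name] = tensor;
    }
  }
  return result;
}
```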
Usage
Use this module for predict requests through the TFRT runtime. It is the core predict function called by TfrtSavedModelServable::Predict and during model warmup. The internal variant is used when tensor serialization format control is needed (e.g., kAsProtoContent for bandwidth optimization).
Code Reference
Source Location
- Repository: Tensorflow_Serving
- Files:
  - tensorflow_serving/servables/tensorflow/tfrt_predict_util.h (lines 1-57)
  - tensorflow_serving/servables/tensorflow/tfrt_predict_util.cc (lines 1-283)
Signature
namespace internal {
Status RunPredict(
    const tfrt::SavedModel::RunOptions& run_options,
    const absl::optional<int64_t>& servable_version,
    const PredictResponseTensorSerializationOption tensor_serialization_option,
    tfrt::SavedModel* saved_model, const PredictRequest& request,
    PredictResponse* response,
    const thread::ThreadPoolOptions& thread_pool_options =
        thread::ThreadPoolOptions());
}  // namespace internal

Status RunPredict(const tfrt::SavedModel::RunOptions& run_options,
                  const absl::optional<int64_t>& servable_version,
                  tfrt::SavedModel* saved_model, const PredictRequest& request,
                  PredictResponse* response,
                  const thread::ThreadPoolOptions& thread_pool_options =
                      thread::ThreadPoolOptions());
Import
#include "tensorflow_serving/servables/tensorflow/tfrt_predict_util.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| run_options | tfrt::SavedModel::RunOptions | Yes | Runtime options including deadline and validation settings |
| servable_version | absl::optional<int64_t> | No | Version to set in the response ModelSpec |
| saved_model | tfrt::SavedModel* | Yes | Loaded TFRT SavedModel |
| request | PredictRequest | Yes | Predict request with model_spec, input tensors map, and optional output_filter |
| thread_pool_options | thread::ThreadPoolOptions | No | Optional custom thread pools for inter- and intra-op parallelism |
Outputs
| Name | Type | Description |
|---|---|---|
| response | PredictResponse* | Populated response with model_spec and output tensors map |
| return | Status | OK on success; FailedPrecondition if the requested function is not found; InvalidArgument for missing or mistyped inputs or an invalid output_filter |
Usage Examples
Basic Predict Call
tfrt::SavedModel::RunOptions run_options;

PredictRequest request;
request.mutable_model_spec()->set_name("my_model");
(*request.mutable_inputs())["input"].CopyFrom(input_tensor_proto);

PredictResponse response;
Status status = RunPredict(run_options, /*servable_version=*/1,
                           saved_model, request, &response);
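Internal Variant with Custom Serialization and Thread Pools
A hedged sketch assembled from the internal signature above, for callers that need kAsProtoContent or custom thread pools. Here my_inter_op_pool is a hypothetical thread-pool pointer, and the exact qualification of PredictResponseTensorSerializationOption should be checked against the header.

```cpp
tfrt::SavedModel::RunOptions run_options;

// Hypothetical custom inter-op pool; per the description above, providing
// one routes execution through TfThreadPoolWorkQueue.
thread::ThreadPoolOptions thread_pool_options;
thread_pool_options.inter_op_threadpool = my_inter_op_pool;

PredictRequest request;
request.mutable_model_spec()->set_name("my_model");
(*request.mutable_inputs())["input"].CopyFrom(input_tensor_proto);

PredictResponse response;
Status status = internal::RunPredict(
    run_options, /*servable_version=*/1,
    PredictResponseTensorSerializationOption::kAsProtoContent,
    saved_model, request, &response, thread_pool_options);
```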