Implementation: TensorFlow Serving TFRT Predict Util
| Knowledge Sources | |
|---|---|
| Domains | Model Serving, Prediction |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Implements TFRT-based prediction with support for output filtering, tensor serialization options, and custom thread pool configuration.
Description
The TFRT Predict Util module provides the primary predict execution path for TFRT SavedModels. It exposes two versions of RunPredict: an internal version that accepts a PredictResponseTensorSerializationOption (kAsProtoField or kAsProtoContent) and a public version that defaults to kAsProtoField for backward compatibility.
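The two-variant layering can be sketched with simplified stand-in types (a toy illustration of the pattern, not the real TF Serving signatures):

```cpp
#include <cassert>

// Toy stand-ins for the real TF Serving types, used only to show the
// public/internal layering described above.
enum class SerializationOption { kAsProtoField, kAsProtoContent };

struct Response {
  SerializationOption used = SerializationOption::kAsProtoField;
};

namespace internal {
// Internal variant: the caller chooses the tensor serialization format.
int RunPredict(SerializationOption option, Response* response) {
  response->used = option;
  return 0;  // 0 stands in for Status::OK
}
}  // namespace internal

// Public variant: defaults to kAsProtoField for backward compatibility
// and forwards to the internal variant.
int RunPredict(Response* response) {
  return internal::RunPredict(SerializationOption::kAsProtoField, response);
}
```

The design keeps the default behavior stable for existing callers while letting callers that care about response size opt into kAsProtoContent through the internal namespace.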
The implementation takes two distinct code paths based on whether output filtering is needed. When no output filter is specified (or the filter matches all outputs), it uses the optimized PreProcessPredictionWithoutOutputFilter path that validates inputs against function metadata, handles default input values, checks data types, and invokes the model via SavedModel::Run. When an output filter is specified that selects a subset of outputs, it falls back to RunByTensorNames which uses the MetaGraphDef signature definitions for tensor name resolution, enabling lazy initialization of optimized subgraphs.
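The path-selection rule can be expressed as a small predicate (a hypothetical helper for illustration, not the actual TF Serving code):

```cpp
#include <set>
#include <string>
#include <vector>

// Illustrative decision rule for the two code paths described above:
// the optimized no-filter path is used when the request carries no
// output_filter, or when the filter names every output in the signature;
// otherwise the RunByTensorNames fallback handles the subset.
bool UseOptimizedPath(const std::vector<std::string>& output_filter,
                      const std::set<std::string>& signature_outputs) {
  if (output_filter.empty()) return true;  // no filter: fast path
  // A filter selecting every output is equivalent to no filter.
  std::set<std::string> filtered(output_filter.begin(), output_filter.end());
  return filtered == signature_outputs;
}
```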
The module also supports custom thread pool options via TfThreadPoolWorkQueue when an inter-op thread pool is provided, and records runtime latency metrics for monitoring. Post-processing serializes output tensors into the PredictResponse using the specified serialization option, applying the output filter when present.
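The post-processing filter step amounts to copying only the selected outputs into the response. A minimal sketch, with tensors stubbed as strings (the real code serializes TensorProtos using the chosen serialization option):

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical sketch of post-processing: copy each output into the
// response map, skipping names not selected by the filter. An empty
// filter means "keep everything".
std::map<std::string, std::string> FilterOutputs(
    const std::map<std::string, std::string>& outputs,
    const std::set<std::string>& output_filter) {
  std::map<std::string, std::string> result;
  for (const auto& [name, tensor] : outputs) {
    if (output_filter.empty() || output_filter.count(name) > 0) {
      result[name] = tensor;
    }
  }
  return result;
}
```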
Usage
Use this module for predict requests through the TFRT runtime. It is the core predict function called by TfrtSavedModelServable::Predict and during model warmup. The internal variant is used when tensor serialization format control is needed (e.g., kAsProtoContent for bandwidth optimization).
Code Reference
Source Location
- Repository: Tensorflow_Serving
- Files:
  - tensorflow_serving/servables/tensorflow/tfrt_predict_util.h (lines 1-57)
  - tensorflow_serving/servables/tensorflow/tfrt_predict_util.cc (lines 1-283)
Signature
namespace internal {
Status RunPredict(
    const tfrt::SavedModel::RunOptions& run_options,
    const absl::optional<int64_t>& servable_version,
    const PredictResponseTensorSerializationOption tensor_serialization_option,
    tfrt::SavedModel* saved_model, const PredictRequest& request,
    PredictResponse* response,
    const thread::ThreadPoolOptions& thread_pool_options =
        thread::ThreadPoolOptions());
}  // namespace internal

Status RunPredict(const tfrt::SavedModel::RunOptions& run_options,
                  const absl::optional<int64_t>& servable_version,
                  tfrt::SavedModel* saved_model, const PredictRequest& request,
                  PredictResponse* response,
                  const thread::ThreadPoolOptions& thread_pool_options =
                      thread::ThreadPoolOptions());
Import
#include "tensorflow_serving/servables/tensorflow/tfrt_predict_util.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| run_options | tfrt::SavedModel::RunOptions | Yes | Runtime options including deadline and validation settings |
| servable_version | absl::optional<int64_t> | No | Version to set in the response ModelSpec |
| saved_model | tfrt::SavedModel* | Yes | Loaded TFRT SavedModel |
| request | PredictRequest | Yes | Predict request with model_spec, input tensors map, and optional output_filter |
| thread_pool_options | thread::ThreadPoolOptions | No | Optional custom thread pools for inter- and intra-op parallelism |
Outputs
| Name | Type | Description |
|---|---|---|
| response | PredictResponse* | Populated response with model_spec and output tensors map |
| return | Status | OK on success; FailedPrecondition if the requested function is not found; InvalidArgument for missing or mistyped inputs or an invalid output_filter |
Usage Examples
Basic Predict Call
tfrt::SavedModel::RunOptions run_options;

PredictRequest request;
request.mutable_model_spec()->set_name("my_model");
(*request.mutable_inputs())["input"].CopyFrom(input_tensor_proto);

PredictResponse response;
Status status = RunPredict(run_options, /*servable_version=*/1,
                           saved_model, request, &response);
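Internal Variant with Custom Serialization and Thread Pools
A hedged sketch assembled from the internal signature above, for callers that need kAsProtoContent or custom thread pools. Here my_inter_op_pool is a hypothetical thread-pool pointer, and the exact qualification of PredictResponseTensorSerializationOption should be checked against the header.

```cpp
tfrt::SavedModel::RunOptions run_options;

// Hypothetical custom inter-op pool; per the description above, providing
// one routes execution through TfThreadPoolWorkQueue.
thread::ThreadPoolOptions thread_pool_options;
thread_pool_options.inter_op_threadpool = my_inter_op_pool;

PredictRequest request;
request.mutable_model_spec()->set_name("my_model");
(*request.mutable_inputs())["input"].CopyFrom(input_tensor_proto);

PredictResponse response;
Status status = internal::RunPredict(
    run_options, /*servable_version=*/1,
    PredictResponseTensorSerializationOption::kAsProtoContent,
    saved_model, request, &response, thread_pool_options);
```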