Implementation:Tensorflow Serving Tfrt Servable

Knowledge Sources	Tensorflow_Serving
Domains	Model Serving, Servable Lifecycle
Last Updated	2026-02-13 00:00 GMT

Overview

Core TFRT servable implementation that provides a PredictionService-like interface (Classify, Regress, Predict, MultiInference, PredictStreamed, GetModelMetadata) on top of a TFRT SavedModel.

Description

TfrtSavedModelServable is a thread-safe servable class that extends the abstract Servable base class and implements all major inference APIs using a TFRT SavedModel backend. It owns a tfrt_stub::SavedModel instance and delegates inference operations to the TFRT-specific classifier, regressor, predict, and multi-inference modules.

Key features include:

- Classify/Regress/Predict/MultiInference: Each method creates a request recorder, converts Servable RunOptions to TFRT RunOptions (handling deadlines, validation flags, priority, and compilation disabling), then delegates to the corresponding TFRT inference function.
- PredictStreamed: Returns a SingleRequestPredictStreamedContext that supports streaming prediction responses through a callback. Output tensors are sent incrementally via a streamed_output_callback on the TFRT run options.
- GetModelMetadata: Validates the request and returns signature definitions from the MetaGraphDef, explicitly avoiding copying of SignatureDef defaults to reduce memory overhead.
- Suspend/Resume: Supports model paging through configurable suspend and resume functions that can be set externally, protected by a mutex for thread safety.
- RequestRecorder: An interface for custom metric and cost reporting that can be injected via a recorder_creator function.
- CreateTfrtSavedModelServable: A convenience factory function that loads a TFRT SavedModel from a directory and wraps it in a TfrtSavedModelServable.
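
The per-request flow described for Classify, Regress, Predict, and MultiInference (create a recorder, convert the run options, delegate to the backend) can be sketched as below. This is a standard-library-only illustration, not the TensorFlow Serving code: ServableRunOptions, BackendRunOptions, Recorder, CountingRecorder, ConvertRunOptions, and HandleRequest are hypothetical names, and the field set is an assumption based on the options listed above.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <memory>

using TimePoint = std::chrono::system_clock::time_point;

// Hypothetical stand-in for the servable-level run options.
// TimePoint::max() is used here to mean "no deadline" (an assumption).
struct ServableRunOptions {
  TimePoint deadline = TimePoint::max();
  bool validate_input_specs = false;
  int priority = 1;
  bool disable_compilation = false;
};

// Hypothetical stand-in for the backend (TFRT-side) run options.
struct BackendRunOptions {
  TimePoint deadline = TimePoint::max();
  bool validate_input_specs = false;
  int priority = 1;
  bool disable_compilation = false;
};

// Hypothetical recorder interface for per-request metric reporting.
class Recorder {
 public:
  virtual ~Recorder() = default;
  virtual void Finish(bool ok) = 0;
};

// Example recorder that counts successful requests in a caller-owned counter.
class CountingRecorder : public Recorder {
 public:
  explicit CountingRecorder(int* counter) : counter_(counter) {}
  void Finish(bool ok) override {
    if (ok) ++*counter_;
  }

 private:
  int* counter_;
};

using RecorderCreator = std::function<std::unique_ptr<Recorder>()>;
using Backend = std::function<bool(const BackendRunOptions&)>;

// Copies each servable-level field into its backend counterpart.
BackendRunOptions ConvertRunOptions(const ServableRunOptions& in) {
  BackendRunOptions out;
  out.deadline = in.deadline;
  out.validate_input_specs = in.validate_input_specs;
  out.priority = in.priority;
  out.disable_compilation = in.disable_compilation;
  return out;
}

// Creates a recorder if a creator was injected, converts the options,
// delegates to `backend`, and reports the outcome to the recorder.
bool HandleRequest(const ServableRunOptions& options,
                   const RecorderCreator& creator, const Backend& backend) {
  std::unique_ptr<Recorder> recorder = creator ? creator() : nullptr;
  const bool ok = backend(ConvertRunOptions(options));
  if (recorder) recorder->Finish(ok);
  return ok;
}
```

Keeping the conversion in one function means every inference method applies deadlines, validation flags, priority, and compilation settings the same way; the recorder creator may be empty, in which case no recording happens.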

The predict response tensor serialization option (kAsProtoField or kAsProtoContent) is configured from the TfrtSavedModelConfig at construction time.
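
As a rough illustration of what the two serialization options select between, the sketch below assumes they follow the usual TensorProto distinction: typed repeated fields (such as float_val) versus a raw byte buffer (tensor_content). MockTensorProto, SerializationOption, and Serialize are hypothetical stand-ins written only for this example.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical mirror of the two options named in the text.
enum class SerializationOption { kAsProtoField, kAsProtoContent };

// Hypothetical, float-only stand-in for a tensor proto.
struct MockTensorProto {
  std::vector<float> float_val;  // typed repeated field
  std::string tensor_content;    // raw bytes
};

// Serializes `values` into exactly one of the two representations.
MockTensorProto Serialize(const std::vector<float>& values,
                          SerializationOption option) {
  MockTensorProto proto;
  if (option == SerializationOption::kAsProtoField) {
    proto.float_val = values;
  } else {
    proto.tensor_content.resize(values.size() * sizeof(float));
    if (!values.empty()) {
      std::memcpy(&proto.tensor_content[0], values.data(),
                  proto.tensor_content.size());
    }
  }
  return proto;
}
```

Because the option is fixed at construction time, every Predict response from a given servable uses the same representation, so clients need to handle only one layout per deployment.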

Usage

Use this class as the primary servable for models loaded through the TFRT pipeline. It is created by TfrtSavedModelFactory and managed by the serving infrastructure. All inference requests to TFRT models are routed through this servable's methods.

Code Reference

Source Location

Repository: Tensorflow_Serving
Files:
- tensorflow_serving/servables/tensorflow/tfrt_servable.h (lines 1-167)
- tensorflow_serving/servables/tensorflow/tfrt_servable.cc (lines 1-311)

Signature

class TfrtSavedModelServable : public Servable {
 public:
  TfrtSavedModelServable(absl::string_view name, int64_t version,
                         const TfrtSavedModelConfig& config,
                         const SavedModelConfig& model_config,
                         std::unique_ptr<tfrt_stub::SavedModel> saved_model,
                         ThreadPoolFactory* thread_pool_factory);

  absl::Status Classify(const RunOptions& run_options,
                        const ClassificationRequest& request,
                        ClassificationResponse* response) override;
  absl::Status Regress(const RunOptions& run_options,
                       const RegressionRequest& request,
                       RegressionResponse* response) override;
  absl::Status Predict(const RunOptions& run_options,
                       const PredictRequest& request,
                       PredictResponse* response) override;
  absl::StatusOr<std::unique_ptr<PredictStreamedContext>> PredictStreamed(
      const RunOptions& run_options,
      absl::AnyInvocable<void(absl::StatusOr<PredictResponse>)>
          response_callback) override;
  absl::Status MultiInference(const RunOptions& run_options,
                              const MultiInferenceRequest& request,
                              MultiInferenceResponse* response) override;
  absl::Status GetModelMetadata(const GetModelMetadataRequest& request,
                                GetModelMetadataResponse* response) override;
  absl::Status Suspend() override;
  absl::Status Resume() override;
};
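
The Suspend and Resume overrides rely on externally supplied functions guarded by a mutex. A minimal sketch of that pattern, using only the standard library, might look like the following. PageableModel, Hook, SetSuspendFn, and SetResumeFn are hypothetical names, and returning false when no hook is set is an assumption of this sketch, not documented behavior.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical class showing mutex-protected suspend/resume hooks.
class PageableModel {
 public:
  using Hook = std::function<bool()>;  // returns true on success

  void SetSuspendFn(Hook fn) {
    std::lock_guard<std::mutex> lock(mu_);
    suspend_fn_ = std::move(fn);
  }

  void SetResumeFn(Hook fn) {
    std::lock_guard<std::mutex> lock(mu_);
    resume_fn_ = std::move(fn);
  }

  // Runs the suspend hook under the lock; fails if none was configured.
  bool Suspend() {
    std::lock_guard<std::mutex> lock(mu_);
    return suspend_fn_ ? suspend_fn_() : false;
  }

  // Runs the resume hook under the lock; fails if none was configured.
  bool Resume() {
    std::lock_guard<std::mutex> lock(mu_);
    return resume_fn_ ? resume_fn_() : false;
  }

 private:
  std::mutex mu_;
  Hook suspend_fn_;
  Hook resume_fn_;
};
```

Holding the same mutex while setting and invoking the hooks prevents a hook from being replaced while a suspend or resume call is in progress.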

Import

#include "tensorflow_serving/servables/tensorflow/tfrt_servable.h"

I/O Contract

Inputs

Name	Type	Required	Description
name	`absl::string_view`	Yes	Model name
version	`int64_t`	Yes	Model version number
config	`TfrtSavedModelConfig`	Yes	Adapter-level configuration shared across all servables
model_config	`SavedModelConfig`	Yes	Model-specific SavedModel configuration
saved_model	`std::unique_ptr<tfrt_stub::SavedModel>`	Yes	The loaded TFRT SavedModel instance
thread_pool_factory	`ThreadPoolFactory*`	No	Optional factory for custom inter/intra-op thread pools

Outputs

Name	Type	Description
(inference responses)	Various protobuf types	Classification, regression, prediction, multi-inference, or metadata responses
return	`absl::Status`	OK on success; an error status on failure. PredictStreamed instead returns `absl::StatusOr<std::unique_ptr<PredictStreamedContext>>`

Usage Examples

Creating a TfrtSavedModelServable

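// Assumes `runtime` is an initialized tfrt_stub::Runtime* and that this code
// runs inside a function returning absl::Status.
tfrt_stub::SavedModel::Options options(runtime);
// The factory is assumed to return a StatusOr wrapping the servable,
// so unwrap it before use.
TF_ASSIGN_OR_RETURN(
    auto servable,
    CreateTfrtSavedModelServable(options, "my_model", /*version=*/1,
                                 "/path/to/saved_model", {"serve"}));

// Populate `request` with a model spec and inputs before a real call.
ClassificationRequest request;
ClassificationResponse response;
Servable::RunOptions run_options;
TF_RETURN_IF_ERROR(servable->Classify(run_options, request, &response));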
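
Streaming responses through a callback

The callback-driven delivery used by PredictStreamed can be illustrated with a self-contained sketch. Chunk, StreamedContext, and ProcessRequest are hypothetical stand-ins, not the TensorFlow Serving types; rejecting a second request is an assumption suggested by the name SingleRequestPredictStreamedContext.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for one incremental prediction response.
struct Chunk {
  std::vector<float> values;
};

// Hypothetical single-request context that emits outputs via a callback.
class StreamedContext {
 public:
  explicit StreamedContext(std::function<void(const Chunk&)> callback)
      : callback_(std::move(callback)) {}

  // Splits `outputs` into chunks and invokes the callback once per chunk.
  // Returns false for a repeated request or a zero chunk size.
  bool ProcessRequest(const std::vector<float>& outputs,
                      std::size_t chunk_size) {
    if (used_ || chunk_size == 0) return false;
    used_ = true;
    for (std::size_t i = 0; i < outputs.size(); i += chunk_size) {
      const std::size_t end = std::min(i + chunk_size, outputs.size());
      Chunk chunk;
      chunk.values.assign(
          outputs.begin() + static_cast<std::ptrdiff_t>(i),
          outputs.begin() + static_cast<std::ptrdiff_t>(end));
      callback_(chunk);
    }
    return true;
  }

 private:
  std::function<void(const Chunk&)> callback_;
  bool used_ = false;
};
```

The caller supplies the callback when the context is created, so partial results reach the client as soon as each chunk is ready rather than after the whole request completes.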

Related Pages

Principle:Tensorflow_Serving_TFRT_Model_Management
