

Implementation:Tensorflow Serving Tfrt Servable

From Leeroopedia
Knowledge Sources
Domains: Model Serving, Servable Lifecycle
Last Updated: 2026-02-13 00:00 GMT

Overview

Core TFRT servable implementation that provides a PredictionService-like interface (Classify, Regress, Predict, MultiInference, PredictStreamed, GetModelMetadata) on top of a TFRT SavedModel.

Description

TfrtSavedModelServable is a thread-safe servable class that extends the abstract Servable base class and implements all major inference APIs using a TFRT SavedModel backend. It owns a tfrt_stub::SavedModel instance and delegates inference operations to the TFRT-specific classifier, regressor, predict, and multi-inference modules.

Key features include:

  • Classify/Regress/Predict/MultiInference: Each method creates a request recorder, converts Servable RunOptions to TFRT RunOptions (handling deadlines, validation flags, priority, and compilation disabling), then delegates to the corresponding TFRT inference function.
  • PredictStreamed: Returns a SingleRequestPredictStreamedContext that supports streaming prediction responses through a callback. Output tensors are sent incrementally via a streamed_output_callback on the TFRT run options.
  • GetModelMetadata: Validates the request and returns signature definitions from the MetaGraphDef, explicitly avoiding copying of SignatureDef defaults to reduce memory overhead.
  • Suspend/Resume: Supports model paging through configurable suspend and resume functions that can be set externally, protected by a mutex for thread safety.
  • RequestRecorder: An interface for custom metric and cost reporting that can be injected via a recorder_creator function.
  • CreateTfrtSavedModelServable: A convenience factory function that loads a TFRT SavedModel from a directory and wraps it in a TfrtSavedModelServable.
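
The per-request flow described in the first bullet (create a recorder, convert the run options, delegate) can be sketched with stand-in types. This is a minimal illustration, not the actual implementation: `ServableRunOptions`, `RuntimeRunOptions`, `ConvertRunOptions`, `RunInference`, and every field name are hypothetical and do not mirror the real TensorFlow Serving or TFRT definitions.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <memory>
#include <optional>

using TimePoint = std::chrono::system_clock::time_point;

// Hypothetical stand-in for the servable-level run options.
struct ServableRunOptions {
  std::optional<TimePoint> deadline;  // unset means "no deadline"
  int priority = 1;
  bool validate_input_specs = false;
  bool disable_compilation = false;
};

// Hypothetical stand-in for the runtime-level run options.
struct RuntimeRunOptions {
  std::optional<TimePoint> deadline;
  int priority = 1;
  bool validate_input_specs = false;
  bool disable_compilation = false;
};

// Hypothetical stand-in for the metric/cost recorder interface.
struct RequestRecorder {
  virtual ~RequestRecorder() = default;
};
using RecorderCreator = std::function<std::unique_ptr<RequestRecorder>()>;

// Copies each per-request setting across; an unset deadline stays unset.
inline RuntimeRunOptions ConvertRunOptions(const ServableRunOptions& in) {
  RuntimeRunOptions out;
  out.deadline = in.deadline;
  out.priority = in.priority;
  out.validate_input_specs = in.validate_input_specs;
  out.disable_compilation = in.disable_compilation;
  return out;
}

// One inference call: build a recorder (if a creator was injected),
// translate the options, then hand off to `infer`, which stands in for
// the backend classifier/regressor/predict function.
inline bool RunInference(
    const ServableRunOptions& run_options,
    const RecorderCreator& recorder_creator,
    const std::function<bool(const RuntimeRunOptions&)>& infer) {
  std::unique_ptr<RequestRecorder> recorder;
  if (recorder_creator) recorder = recorder_creator();
  const RuntimeRunOptions converted = ConvertRunOptions(run_options);
  return infer(converted);  // the recorder lives until the call returns
}
```

Keeping the conversion in one helper means every inference method applies the same deadline, priority, and flag handling before delegating.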

The predict response tensor serialization option (kAsProtoField or kAsProtoContent) is configured from the TfrtSavedModelConfig at construction time.
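
The two serialization options differ in where tensor values land in the response: in a typed repeated field or in a raw byte buffer. The sketch below illustrates that distinction for a float tensor. `FakeTensorProto`, `SerializationOption`, and `Serialize` are hypothetical stand-ins that only loosely mirror a tensor proto; they are not the library's types.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical mirror of the two options named above.
enum class SerializationOption { kAsProtoField, kAsProtoContent };

// Hypothetical, simplified stand-in for a tensor proto holding floats.
struct FakeTensorProto {
  std::vector<float> float_val;  // typed repeated field
  std::string tensor_content;    // raw bytes in host byte order
};

// Writes `values` into exactly one of the two representations.
inline FakeTensorProto Serialize(const std::vector<float>& values,
                                 SerializationOption option) {
  FakeTensorProto proto;
  if (option == SerializationOption::kAsProtoField) {
    proto.float_val = values;
  } else {
    proto.tensor_content.resize(values.size() * sizeof(float));
    if (!values.empty()) {
      std::memcpy(&proto.tensor_content[0], values.data(),
                  proto.tensor_content.size());
    }
  }
  return proto;
}
```

A client has to read the matching representation, so the option is effectively part of the response contract for a given deployment.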

Usage

Use this class as the primary servable for models loaded through the TFRT pipeline. It is created by TfrtSavedModelFactory and managed by the serving infrastructure. All inference requests to TFRT models are routed through this servable's methods.

Code Reference

Source Location

  • Repository: Tensorflow_Serving
  • Files:
    • tensorflow_serving/servables/tensorflow/tfrt_servable.h (lines 1-167)
    • tensorflow_serving/servables/tensorflow/tfrt_servable.cc (lines 1-311)

Signature

class TfrtSavedModelServable : public Servable {
 public:
  TfrtSavedModelServable(absl::string_view name, int64_t version,
                         const TfrtSavedModelConfig& config,
                         const SavedModelConfig& model_config,
                         std::unique_ptr<tfrt_stub::SavedModel> saved_model,
                         ThreadPoolFactory* thread_pool_factory);

  absl::Status Classify(const RunOptions& run_options,
                        const ClassificationRequest& request,
                        ClassificationResponse* response) override;
  absl::Status Regress(const RunOptions& run_options,
                       const RegressionRequest& request,
                       RegressionResponse* response) override;
  absl::Status Predict(const RunOptions& run_options,
                       const PredictRequest& request,
                       PredictResponse* response) override;
  absl::StatusOr<std::unique_ptr<PredictStreamedContext>> PredictStreamed(
      const RunOptions& run_options,
      absl::AnyInvocable<void(absl::StatusOr<PredictResponse>)>
          response_callback) override;
  absl::Status MultiInference(const RunOptions& run_options,
                              const MultiInferenceRequest& request,
                              MultiInferenceResponse* response) override;
  absl::Status GetModelMetadata(const GetModelMetadataRequest& request,
                                GetModelMetadataResponse* response) override;
  absl::Status Suspend() override;
  absl::Status Resume() override;
};
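
The Suspend() and Resume() overrides in the signature rely on externally installed functions guarded by a mutex. A minimal sketch of that pattern follows, assuming a boolean result in place of absl::Status. `PageableModel`, `SetSuspendFn`, and `SetResumeFn` are hypothetical names. Failing when no hook is installed and treating a repeated call as a no-op are choices made for this sketch and may not match the real class.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a servable that supports model paging.
class PageableModel {
 public:
  using Hook = std::function<bool()>;  // returns true on success

  // Hooks are installed from outside, e.g. by a component managing memory.
  void SetSuspendFn(Hook fn) {
    std::lock_guard<std::mutex> lock(mu_);
    suspend_fn_ = std::move(fn);
  }
  void SetResumeFn(Hook fn) {
    std::lock_guard<std::mutex> lock(mu_);
    resume_fn_ = std::move(fn);
  }

  // Sketch choice: fail without a hook; a second Suspend() does nothing.
  bool Suspend() {
    std::lock_guard<std::mutex> lock(mu_);
    if (!suspend_fn_) return false;
    if (suspended_) return true;
    if (!suspend_fn_()) return false;
    suspended_ = true;
    return true;
  }

  // Sketch choice: fail without a hook; Resume() on a live model does nothing.
  bool Resume() {
    std::lock_guard<std::mutex> lock(mu_);
    if (!resume_fn_) return false;
    if (!suspended_) return true;
    if (!resume_fn_()) return false;
    suspended_ = false;
    return true;
  }

 private:
  std::mutex mu_;  // guards the hooks and the suspended flag
  Hook suspend_fn_;
  Hook resume_fn_;
  bool suspended_ = false;
};
```

Holding the lock while the hook runs keeps concurrent Suspend/Resume calls from interleaving, at the cost of blocking other callers for the duration of the hook.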

Import

#include "tensorflow_serving/servables/tensorflow/tfrt_servable.h"

I/O Contract

Inputs

Name Type Required Description
name absl::string_view Yes Model name
version int64_t Yes Model version number
config TfrtSavedModelConfig Yes Adapter-level configuration shared across all servables
model_config SavedModelConfig Yes Per-model configuration passed alongside the adapter-level config
saved_model std::unique_ptr<tfrt_stub::SavedModel> Yes The loaded TFRT SavedModel instance
thread_pool_factory ThreadPoolFactory* No Optional factory for custom inter/intra-op thread pools

Outputs

Name Type Description
(inference responses) Various protobuf types Classification, regression, prediction, multi-inference, or metadata responses
return absl::Status OK on success; an error status on failure. PredictStreamed instead returns absl::StatusOr<std::unique_ptr<PredictStreamedContext>>

Usage Examples

Creating a TfrtSavedModelServable

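// `runtime` is assumed to be an existing tfrt_stub::Runtime* owned elsewhere.
auto options = tfrt_stub::SavedModel::Options(runtime);
// Load the SavedModel from disk and wrap it in a servable. This assumes the
// factory returns absl::StatusOr<std::unique_ptr<TfrtSavedModelServable>>.
auto servable = CreateTfrtSavedModelServable(
    options, "my_model", /*version=*/1,
    "/path/to/saved_model", {"serve"});
if (!servable.ok()) return servable.status();

// Run a classification request; populate `request` before calling.
ClassificationRequest request;
ClassificationResponse response;
Servable::RunOptions run_options;
TF_RETURN_IF_ERROR((*servable)->Classify(run_options, request, &response));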
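
PredictStreamed differs from the unary call above in that responses arrive through a callback rather than an output parameter. The callback pattern can be sketched with stand-ins. `FakeResponse`, `FakeStreamedContext`, `StartStream`, and the `ProcessRequest` method here are hypothetical and simplified, and std::function takes the place of absl::AnyInvocable.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one streamed prediction response.
struct FakeResponse {
  std::string chunk;
};

// Hypothetical stand-in for a streamed-prediction context. It owns the
// callback and invokes it once per output produced for a request.
class FakeStreamedContext {
 public:
  explicit FakeStreamedContext(std::function<void(FakeResponse)> callback)
      : callback_(std::move(callback)) {}

  // Emits each output as soon as it is available, in order.
  void ProcessRequest(const std::vector<std::string>& outputs) {
    for (const std::string& output : outputs) {
      callback_(FakeResponse{output});
    }
  }

 private:
  std::function<void(FakeResponse)> callback_;
};

// Mirrors the shape of the streamed entry point: the caller supplies the
// callback up front and gets back a context to feed requests into.
inline std::unique_ptr<FakeStreamedContext> StartStream(
    std::function<void(FakeResponse)> callback) {
  return std::make_unique<FakeStreamedContext>(std::move(callback));
}
```

The caller keeps the returned context alive for the duration of the stream, and anything the callback captures must outlive it.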
