Implementation:Tensorflow Serving Tfrt Servable
| Knowledge Sources | |
|---|---|
| Domains | Model Serving, Servable Lifecycle |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Core TFRT servable implementation that provides a PredictionService-like interface (Classify, Regress, Predict, MultiInference, PredictStreamed, GetModelMetadata) on top of a TFRT SavedModel.
Description
TfrtSavedModelServable is a thread-safe servable class that extends the abstract Servable base class and implements all major inference APIs using a TFRT SavedModel backend. It owns a tfrt_stub::SavedModel instance and delegates inference operations to the TFRT-specific classifier, regressor, predict, and multi-inference modules.
Key features include:
- Classify/Regress/Predict/MultiInference: Each method creates a request recorder, converts Servable RunOptions to TFRT RunOptions (handling deadlines, validation flags, priority, and compilation disabling), then delegates to the corresponding TFRT inference function.
- PredictStreamed: Returns a SingleRequestPredictStreamedContext that supports streaming prediction responses through a callback. Output tensors are sent incrementally via a streamed_output_callback on the TFRT run options.
- GetModelMetadata: Validates the request and returns signature definitions from the MetaGraphDef, explicitly avoiding copying of SignatureDef defaults to reduce memory overhead.
- Suspend/Resume: Supports model paging through configurable suspend and resume functions that can be set externally, protected by a mutex for thread safety.
- RequestRecorder: An interface for custom metric and cost reporting that can be injected via a recorder_creator function.
- CreateTfrtSavedModelServable: A convenience factory function that loads a TFRT SavedModel from a directory and wraps it in a TfrtSavedModelServable.
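The RunOptions conversion mentioned in the first feature can be pictured with a small standalone sketch. The struct names, field names, and the use of `time_point::max()` as an "infinite deadline" sentinel are all assumptions made for illustration; they are not the actual types from tfrt_servable.h or the TFRT runtime.

```cpp
#include <chrono>
#include <cstdint>
#include <optional>

using SketchClock = std::chrono::system_clock;

// Hypothetical stand-in for servable-level run options (not the real type).
struct ServableRunOptionsSketch {
  // time_point::max() is this sketch's sentinel for "no deadline".
  SketchClock::time_point deadline = SketchClock::time_point::max();
  bool validate_input_specs = false;
  int64_t priority = 1;
  bool disable_compilation = false;
};

// Hypothetical stand-in for runtime-level run options (not the real type).
struct RuntimeRunOptionsSketch {
  std::optional<SketchClock::time_point> deadline;  // empty means unbounded
  bool validate_input_specs = false;
  int64_t priority = 1;
  bool disable_compilation = false;
};

// Copies each per-request setting across. A sentinel deadline is translated
// into an empty optional so the runtime side never sees a fake timestamp.
inline RuntimeRunOptionsSketch ToRuntimeOptions(
    const ServableRunOptionsSketch& in) {
  RuntimeRunOptionsSketch out;
  if (in.deadline != SketchClock::time_point::max()) {
    out.deadline = in.deadline;
  }
  out.validate_input_specs = in.validate_input_specs;
  out.priority = in.priority;
  out.disable_compilation = in.disable_compilation;
  return out;
}
```

Keeping the translation in one function means every inference method applies the same deadline and flag handling before delegating.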
The predict response tensor serialization option (kAsProtoField or kAsProtoContent) is configured from the TfrtSavedModelConfig at construction time.
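The Suspend/Resume feature listed above, with externally installed functions guarded by a mutex, follows a common pattern that the sketch below models using only the standard library. The class name `PagingHooksSketch`, the `bool` return convention, and the choice to report a missing hook as failure are assumptions of this sketch; the real class returns `absl::Status` and may treat an unset hook differently.

```cpp
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical helper illustrating mutex-guarded suspend/resume hooks.
class PagingHooksSketch {
 public:
  using Hook = std::function<bool()>;  // returns true on success

  void SetSuspendFn(Hook fn) {
    std::lock_guard<std::mutex> lock(mu_);
    suspend_fn_ = std::move(fn);
  }

  void SetResumeFn(Hook fn) {
    std::lock_guard<std::mutex> lock(mu_);
    resume_fn_ = std::move(fn);
  }

  // Runs the installed hook while holding the lock, so a concurrent setter
  // cannot swap the hook mid-call. Hooks must not call back into this
  // object, or they would deadlock on the non-recursive mutex.
  bool Suspend() {
    std::lock_guard<std::mutex> lock(mu_);
    return suspend_fn_ ? suspend_fn_() : false;  // no hook: sketch reports failure
  }

  bool Resume() {
    std::lock_guard<std::mutex> lock(mu_);
    return resume_fn_ ? resume_fn_() : false;
  }

 private:
  std::mutex mu_;
  Hook suspend_fn_;
  Hook resume_fn_;
};
```

A caller would install both hooks once after loading the model and then invoke `Suspend()` and `Resume()` from whichever thread drives paging.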
Usage
Use this class as the primary servable for models loaded through the TFRT pipeline. It is created by TfrtSavedModelFactory and managed by the serving infrastructure. All inference requests to TFRT models are routed through this servable's methods.
Code Reference
Source Location
- Repository: Tensorflow_Serving
- Files:
  - tensorflow_serving/servables/tensorflow/tfrt_servable.h (lines 1-167)
  - tensorflow_serving/servables/tensorflow/tfrt_servable.cc (lines 1-311)
Signature
class TfrtSavedModelServable : public Servable {
 public:
  TfrtSavedModelServable(absl::string_view name, int64_t version,
                         const TfrtSavedModelConfig& config,
                         const SavedModelConfig& model_config,
                         std::unique_ptr<tfrt_stub::SavedModel> saved_model,
                         ThreadPoolFactory* thread_pool_factory);

  absl::Status Classify(const RunOptions& run_options,
                        const ClassificationRequest& request,
                        ClassificationResponse* response) override;

  absl::Status Regress(const RunOptions& run_options,
                       const RegressionRequest& request,
                       RegressionResponse* response) override;

  absl::Status Predict(const RunOptions& run_options,
                       const PredictRequest& request,
                       PredictResponse* response) override;

  absl::StatusOr<std::unique_ptr<PredictStreamedContext>> PredictStreamed(
      const RunOptions& run_options,
      absl::AnyInvocable<void(absl::StatusOr<PredictResponse>)>
          response_callback) override;

  absl::Status MultiInference(const RunOptions& run_options,
                              const MultiInferenceRequest& request,
                              MultiInferenceResponse* response) override;

  absl::Status GetModelMetadata(const GetModelMetadataRequest& request,
                                GetModelMetadataResponse* response) override;

  absl::Status Suspend() override;
  absl::Status Resume() override;
};
Import
#include "tensorflow_serving/servables/tensorflow/tfrt_servable.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| name | absl::string_view | Yes | Model name |
| version | int64_t | Yes | Model version number |
| config | TfrtSavedModelConfig | Yes | Adapter-level configuration shared across all servables |
| model_config | SavedModelConfig | Yes | Model-specific configuration (listed in the constructor signature) |
| saved_model | std::unique_ptr<tfrt_stub::SavedModel> | Yes | The loaded TFRT SavedModel instance |
| thread_pool_factory | ThreadPoolFactory* | No | Optional factory for custom inter/intra-op thread pools |
Outputs
| Name | Type | Description |
|---|---|---|
| (inference responses) | Various protobuf types | Classification, regression, prediction, multi-inference, or metadata responses |
| return | Status | OK on success; appropriate error on failure |
Usage Examples
Creating a TfrtSavedModelServable
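// `runtime` is assumed to be an already-initialized TFRT runtime.
auto options = tfrt_stub::SavedModel::Options(runtime);

// Load the SavedModel from disk and wrap it in a servable.
// If the factory returns an absl::StatusOr (an assumption; check
// tfrt_servable.h), unwrap it before use, e.g. with TF_ASSIGN_OR_RETURN.
auto servable = CreateTfrtSavedModelServable(
    options, "my_model", /*version=*/1,
    "/path/to/saved_model", {"serve"});

// Run a classification request with default run options.
ClassificationRequest request;
ClassificationResponse response;
Servable::RunOptions run_options;
TF_RETURN_IF_ERROR(servable->Classify(run_options, request, &response));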
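Unlike Classify, PredictStreamed hands results to a caller-supplied callback instead of filling a response pointer. The standalone sketch below models that flow with standard-library types only. `StreamedChunkSketch`, `StreamedContextSketch`, and its `ProcessRequest` method are hypothetical stand-ins for the response type and context object, not the actual TensorFlow Serving classes.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one streamed response (or an error).
struct StreamedChunkSketch {
  bool ok;
  std::string payload;
};

// Hypothetical context object: it stores the caller's callback at creation
// time and forwards each output as soon as it is produced.
class StreamedContextSketch {
 public:
  explicit StreamedContextSketch(
      std::function<void(StreamedChunkSketch)> response_callback)
      : response_callback_(std::move(response_callback)) {}

  // Simulates a model run that emits its outputs one at a time. In this
  // sketch the "model outputs" are simply passed in by the caller.
  void ProcessRequest(const std::vector<std::string>& outputs) {
    for (const std::string& output : outputs) {
      response_callback_(StreamedChunkSketch{/*ok=*/true, output});
    }
  }

 private:
  std::function<void(StreamedChunkSketch)> response_callback_;
};
```

A caller would construct the context with a lambda that appends each successful payload to a buffer or writes it to a stream, then call `ProcessRequest` once for the single request the context serves.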