Implementation: TensorFlow Serving TFRT Multi Inference
| Knowledge Sources | |
|---|---|
| Domains | Model Serving, Multi Inference |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Implements multi-inference execution for TFRT SavedModels, allowing multiple classification and regression tasks to be evaluated in a single request against shared input data.
Description
The TFRT Multi Inference module provides the RunMultiInference function that processes a MultiInferenceRequest containing multiple inference tasks against a single TFRT SavedModel. The function first serializes the shared input into tensors replicated per task, then validates that all tasks reference the same model name with unique signature names. Each task is pre-processed according to its method type (classification via PreProcessClassification or regression via PreProcessRegression). The actual inference is performed using RunMultipleSignatures on the TFRT SavedModel, which evaluates all function signatures in a single call for efficiency. Results are post-processed per task using the appropriate post-processor (PostProcessClassificationResult or PostProcessRegressionResult) and collected into the MultiInferenceResponse. Error logging to TFRT's error logging service is performed when enabled.
Usage
Use this module when a client needs to perform multiple classification and/or regression operations against the same model and input data in a single request. This reduces overhead compared to making separate requests. It is called by TfrtSavedModelServable's MultiInference method and during TFRT model warmup.
Code Reference
Source Location
- Repository: tensorflow/serving
- Files:
  - tensorflow_serving/servables/tensorflow/tfrt_multi_inference.h (lines 1-38)
  - tensorflow_serving/servables/tensorflow/tfrt_multi_inference.cc (lines 1-136)
Signature
// Implementation of MultiInference using the tfrt::SavedModel.
Status RunMultiInference(const tfrt::SavedModel::RunOptions& run_options,
                         const absl::optional<int64_t>& servable_version,
                         tfrt::SavedModel* saved_model,
                         const MultiInferenceRequest& request,
                         MultiInferenceResponse* response);
Import
#include "tensorflow_serving/servables/tensorflow/tfrt_multi_inference.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| run_options | tfrt::SavedModel::RunOptions | Yes | Runtime options for TFRT execution |
| servable_version | absl::optional<int64_t> | No | Version to set on response ModelSpecs |
| saved_model | tfrt::SavedModel* | Yes | Loaded TFRT SavedModel |
| request | MultiInferenceRequest | Yes | Request containing shared input and multiple inference tasks (each with model_spec, method_name) |
Outputs
| Name | Type | Description |
|---|---|---|
| response | MultiInferenceResponse* | Contains per-task InferenceResult with classification or regression results and model specs |
| return | Status | OK on success; InvalidArgument for duplicate signatures or mismatched model names; Unimplemented for unsupported method names |
Usage Examples
Multi-Inference Request
tfrt::SavedModel::RunOptions run_options;
MultiInferenceRequest request;

// Add a classification task (uses the default signature).
auto* task1 = request.add_tasks();
task1->mutable_model_spec()->set_name("my_model");
task1->set_method_name("tensorflow/serving/classify");

// Add a regression task against a named signature.
auto* task2 = request.add_tasks();
task2->mutable_model_spec()->set_name("my_model");
task2->mutable_model_spec()->set_signature_name("regress_x_to_y");
task2->set_method_name("tensorflow/serving/regress");

MultiInferenceResponse response;
absl::optional<int64_t> version = 1;  // Stamped onto each response ModelSpec.
Status status = RunMultiInference(run_options, version, saved_model,
                                  request, &response);
if (!status.ok()) {
  LOG(ERROR) << "Multi-inference failed: " << status;
}