Implementation: TensorFlow Serving TFRT Multi Inference
| Knowledge Sources | |
|---|---|
| Domains | Model Serving, Multi Inference |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Implements multi-inference execution for TFRT SavedModels, allowing multiple classification and regression tasks to be evaluated in a single request against shared input data.
Description
The TFRT Multi Inference module provides the RunMultiInference function that processes a MultiInferenceRequest containing multiple inference tasks against a single TFRT SavedModel. The function first serializes the shared input into tensors replicated per task, then validates that all tasks reference the same model name with unique signature names. Each task is pre-processed according to its method type (classification via PreProcessClassification or regression via PreProcessRegression). The actual inference is performed using RunMultipleSignatures on the TFRT SavedModel, which evaluates all function signatures in a single call for efficiency. Results are post-processed per task using the appropriate post-processor (PostProcessClassificationResult or PostProcessRegressionResult) and collected into the MultiInferenceResponse. Error logging to TFRT's error logging service is performed when enabled.
Usage
Use this module when a client needs to perform multiple classification and/or regression operations against the same model and input data in a single request. This reduces overhead compared to making separate requests. It is called by TfrtSavedModelServable's MultiInference method and during TFRT model warmup.
Code Reference
Source Location
- Repository: tensorflow/serving
- Files:
  - tensorflow_serving/servables/tensorflow/tfrt_multi_inference.h (lines 1-38)
  - tensorflow_serving/servables/tensorflow/tfrt_multi_inference.cc (lines 1-136)
Signature
// Implementation of MultiInference using the tfrt::SavedModel.
Status RunMultiInference(const tfrt::SavedModel::RunOptions& run_options,
                         const absl::optional<int64_t>& servable_version,
                         tfrt::SavedModel* saved_model,
                         const MultiInferenceRequest& request,
                         MultiInferenceResponse* response);
Import
#include "tensorflow_serving/servables/tensorflow/tfrt_multi_inference.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| run_options | tfrt::SavedModel::RunOptions | Yes | Runtime options for TFRT execution |
| servable_version | absl::optional<int64_t> | No | Version to set on response ModelSpecs |
| saved_model | tfrt::SavedModel* | Yes | Loaded TFRT SavedModel |
| request | MultiInferenceRequest | Yes | Request containing shared input and multiple inference tasks (each with model_spec, method_name) |
Outputs
| Name | Type | Description |
|---|---|---|
| response | MultiInferenceResponse* | Contains per-task InferenceResult with classification or regression results and model specs |
| return | Status | OK on success; InvalidArgument for duplicate signatures or mismatched model names; Unimplemented for unsupported method names |
Usage Examples
Multi-Inference Request
tfrt::SavedModel::RunOptions run_options;
MultiInferenceRequest request;

// Add a classification task (uses the default signature).
auto* task1 = request.add_tasks();
task1->mutable_model_spec()->set_name("my_model");
task1->set_method_name("tensorflow/serving/classify");

// Add a regression task against a named signature.
auto* task2 = request.add_tasks();
task2->mutable_model_spec()->set_name("my_model");
task2->mutable_model_spec()->set_signature_name("regress_x_to_y");
task2->set_method_name("tensorflow/serving/regress");

MultiInferenceResponse response;
absl::optional<int64_t> version = 1;  // Stamped onto each response ModelSpec.
Status status = RunMultiInference(run_options, version, saved_model,
                                  request, &response);
if (!status.ok()) {
  LOG(ERROR) << "Multi-inference failed: " << status;
}