Implementation:Tensorflow Serving Tfrt Multi Inference

From Leeroopedia
Domains: Model Serving, Multi Inference
Last Updated: 2026-02-13 00:00 GMT

Overview

Implements multi-inference execution for TFRT SavedModels, allowing multiple classification and regression tasks to be evaluated in a single request against shared input data.

Description

The TFRT Multi Inference module provides the RunMultiInference function, which processes a MultiInferenceRequest containing multiple inference tasks against a single TFRT SavedModel. The function first serializes the shared input into a tensor that is replicated once per task, then validates that every task references the same model name and that each task uses a unique signature name. Each task is pre-processed according to its method type (classification via PreProcessClassification, regression via PreProcessRegression). The actual inference is performed with RunMultipleSignatures on the TFRT SavedModel, which evaluates all function signatures in a single call for efficiency. Results are post-processed per task with the matching post-processor (PostProcessClassificationResult or PostProcessRegressionResult) and collected into the MultiInferenceResponse. When enabled, execution errors are also reported to TFRT's error logging service.
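
This flow can be summarized in a short sketch. It is a paraphrase of the steps described above, not the file's contents: the InputToSerializedExampleTensor helper, the hard-coded method-name strings, and the placement of the pre-/post-processing calls are assumptions, and details such as function-metadata lookup and error logging are elided.

#include <set>
#include <string>
#include <vector>

#include "tensorflow_serving/servables/tensorflow/tfrt_multi_inference.h"
// Further TensorFlow / TFRT headers omitted; exact paths vary by version.

namespace tensorflow {
namespace serving {

// Simplified paraphrase of the flow described above; not the actual source.
Status RunMultiInferenceSketch(const tfrt::SavedModel::RunOptions& run_options,
                               const absl::optional<int64_t>& servable_version,
                               tfrt::SavedModel* saved_model,
                               const MultiInferenceRequest& request,
                               MultiInferenceResponse* response) {
  // 1. Serialize the shared input once; the same tensor feeds every task.
  Tensor input_tensor;
  TF_RETURN_IF_ERROR(
      InputToSerializedExampleTensor(request.input(), &input_tensor));
  std::vector<std::vector<Tensor>> inputs(request.tasks_size(), {input_tensor});

  // 2. Validate tasks: one model name, unique signature names, supported
  //    method types; per-task pre-processing also happens in this loop.
  std::string model_name;
  std::set<std::string> seen;
  std::vector<std::string> signatures;
  for (const auto& task : request.tasks()) {
    if (model_name.empty()) {
      model_name = task.model_spec().name();
    } else if (task.model_spec().name() != model_name) {
      return errors::InvalidArgument("All tasks must name the same model.");
    }
    const std::string sig = task.model_spec().signature_name().empty()
                                ? "serving_default"
                                : task.model_spec().signature_name();
    if (!seen.insert(sig).second) {
      return errors::InvalidArgument("Duplicate signature: ", sig);
    }
    signatures.push_back(sig);
    if (task.method_name() != "tensorflow/serving/classify" &&
        task.method_name() != "tensorflow/serving/regress") {
      return errors::Unimplemented("Unsupported method: ", task.method_name());
    }
    // PreProcessClassification / PreProcessRegression run here against the
    // signature's function metadata (omitted in this sketch).
  }

  // 3. A single call evaluates every signature against its replicated input.
  std::vector<std::vector<Tensor>> outputs;
  TF_RETURN_IF_ERROR(saved_model->RunMultipleSignatures(
      run_options, signatures, inputs, &outputs));

  // 4. PostProcessClassificationResult / PostProcessRegressionResult convert
  //    each task's output tensors into response entries, stamped with
  //    servable_version (omitted in this sketch).
  return OkStatus();
}

}  // namespace serving
}  // namespace tensorflow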

Usage

Use this module when a client needs to perform multiple classification and/or regression operations against the same model and input data in a single request. This reduces overhead compared to making separate requests. It is called by TfrtSavedModelServable's MultiInference method and during TFRT model warmup.
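
For orientation, a caller can wrap the function in a servable-style class. The sketch below is hypothetical: apart from RunMultiInference and the request/response protos, every name is illustrative and does not reflect TfrtSavedModelServable's real interface.

#include "tensorflow_serving/servables/tensorflow/tfrt_multi_inference.h"

// Hypothetical wrapper; aside from RunMultiInference and the request/response
// protos, all names here are illustrative.
class MyTfrtServable {
 public:
  MyTfrtServable(tfrt::SavedModel* saved_model, int64_t version)
      : saved_model_(saved_model), version_(version) {}

  tensorflow::Status MultiInference(
      const tfrt::SavedModel::RunOptions& run_options,
      const tensorflow::serving::MultiInferenceRequest& request,
      tensorflow::serving::MultiInferenceResponse* response) {
    // Delegate the whole multi-task request to RunMultiInference.
    return tensorflow::serving::RunMultiInference(
        run_options, version_, saved_model_, request, response);
  }

 private:
  tfrt::SavedModel* saved_model_;    // not owned
  absl::optional<int64_t> version_;  // stamped on response ModelSpecs
};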

Code Reference

Source Location

  • Repository: Tensorflow_Serving
  • Files:
    • tensorflow_serving/servables/tensorflow/tfrt_multi_inference.h (lines 1-38)
    • tensorflow_serving/servables/tensorflow/tfrt_multi_inference.cc (lines 1-136)

Signature

// Implementation of MultiInference using the tfrt::SavedModel.
Status RunMultiInference(const tfrt::SavedModel::RunOptions& run_options,
                         const absl::optional<int64_t>& servable_version,
                         tfrt::SavedModel* saved_model,
                         const MultiInferenceRequest& request,
                         MultiInferenceResponse* response);

Import

#include "tensorflow_serving/servables/tensorflow/tfrt_multi_inference.h"

I/O Contract

Inputs

Name Type Required Description
run_options tfrt::SavedModel::RunOptions Yes Runtime options for TFRT execution
servable_version absl::optional<int64_t> No Version to set on response ModelSpecs
saved_model tfrt::SavedModel* Yes Loaded TFRT SavedModel
request MultiInferenceRequest Yes Request containing shared input and multiple inference tasks (each with model_spec, method_name)

Outputs

Name Type Description
response MultiInferenceResponse* Contains per-task InferenceResult with classification or regression results and model specs
return Status OK on success; InvalidArgument for duplicate signatures or mismatched model names; Unimplemented for unsupported method names
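
Callers typically branch on the returned Status. A minimal sketch, assuming Status aliases absl::Status (true in recent TensorFlow releases) and that status came from RunMultiInference:

#include <iostream>

#include "absl/status/status.h"

// Minimal sketch of acting on the Status codes listed above.
void ReportMultiInferenceStatus(const absl::Status& status) {
  if (status.ok()) return;
  if (absl::IsInvalidArgument(status)) {
    // Duplicate signature names, or tasks naming different models.
    std::cerr << "Malformed MultiInferenceRequest: " << status << "\n";
  } else if (absl::IsUnimplemented(status)) {
    // A task used a method_name other than classify or regress.
    std::cerr << "Unsupported method_name: " << status << "\n";
  } else {
    std::cerr << "Multi-inference failed: " << status << "\n";
  }
}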

Usage Examples

Multi-Inference Request

tfrt::SavedModel::RunOptions run_options;

// Assumes saved_model points to a loaded tfrt::SavedModel (see Signature above).
MultiInferenceRequest request;
// Shared input: every task is evaluated against the same serialized Examples.
// Populate request.mutable_input()->mutable_example_list() with the data.

// Add classification task (uses the default serving signature).
auto* task1 = request.add_tasks();
task1->mutable_model_spec()->set_name("my_model");
task1->set_method_name("tensorflow/serving/classify");

// Add regression task with an explicit signature name; signature names must
// be unique across tasks.
auto* task2 = request.add_tasks();
task2->mutable_model_spec()->set_name("my_model");
task2->mutable_model_spec()->set_signature_name("regress_x_to_y");
task2->set_method_name("tensorflow/serving/regress");

// Servable version to stamp on the response ModelSpecs (optional).
absl::optional<int64_t> version = 1;

MultiInferenceResponse response;
Status status = RunMultiInference(run_options, version,
                                  saved_model, request, &response);
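
After a successful call, response.results() holds one InferenceResult per task, in task order. A sketch of reading them follows; field names are taken from TensorFlow Serving's inference, classification, and regression protos and should be verified against the version in use.

// Read per-task results; each InferenceResult holds either a classification
// or a regression result plus the resolved ModelSpec.
for (const auto& result : response.results()) {
  const std::string& sig = result.model_spec().signature_name();
  if (result.has_classification_result()) {
    for (const auto& per_example :
         result.classification_result().classifications()) {
      for (const auto& cls : per_example.classes()) {
        std::cout << sig << ": " << cls.label() << " = " << cls.score() << "\n";
      }
    }
  } else if (result.has_regression_result()) {
    for (const auto& regression : result.regression_result().regressions()) {
      std::cout << sig << ": " << regression.value() << "\n";
    }
  }
}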
