
Implementation:TensorFlow Serving Multi Inference

From Leeroopedia
Knowledge Sources
Domains Model Serving, Multi Inference
Last Updated 2026-02-13 00:00 GMT

Overview

Implements multi-inference execution for standard TensorFlow SavedModel sessions, allowing multiple classification and regression tasks to share a single Session::Run call.

Description

The Multi Inference module provides the TensorFlowMultiInferenceRunner class and the convenience RunMultiInference function for executing multiple inference tasks against a standard TensorFlow session (not TFRT). This is the classic TensorFlow implementation of the MultiInference API.

TensorFlowMultiInferenceRunner takes a Session pointer, a MetaGraphDef, optional servable version, and thread pool options at construction. Its Infer method processes a MultiInferenceRequest by:

1. Pre-processing: Validates that all tasks reference the same model name with unique signature names. For each task, looks up the SignatureDef from the MetaGraphDef and delegates to PreProcessClassification or PreProcessRegression to extract input and output tensor names.

2. Execution: Collects all unique input and output tensor names across tasks, then performs a single PerformOneShotTensorComputation call that serializes the shared input, runs the session once with all needed tensors, and records example counts.

3. Post-processing: Routes each task's outputs through PostProcessClassificationResult or PostProcessRegressionResult and populates the response with model specs.

The key optimization is that all tasks share a single Session::Run call, avoiding redundant computation for shared subgraphs.
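As an illustrative sketch (not taken from the source), a MultiInferenceRequest that pairs a classification task with a regression task over one shared input could look like this in protobuf text format. The model name, signature names, and feature key are placeholders:

tasks {
  model_spec {
    name: "my_model"
    signature_name: "classify_examples"   # placeholder signature
  }
  method_name: "tensorflow/serving/classify"
}
tasks {
  model_spec {
    name: "my_model"                      # must match the first task's model name
    signature_name: "regress_examples"    # must be unique across tasks
  }
  method_name: "tensorflow/serving/regress"
}
# Shared input: serialized once and fed to a single Session::Run.
input {
  example_list {
    examples {
      features {
        feature {
          key: "x"
          value { float_list { value: 1.0 } }
        }
      }
    }
  }
}

Because both tasks name the same model and the input is shared, pre-processing accepts the request and execution collapses into one session run covering both signatures.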

Usage

Use this module for multi-inference requests against standard TensorFlow sessions (SavedModelBundle). For TFRT-based models, use the tfrt_multi_inference module instead. This is used by the standard serving pipeline when multiple classification/regression operations are needed on the same input.

Code Reference

Source Location

  • Repository: Tensorflow_Serving
  • Files:
    • tensorflow_serving/servables/tensorflow/multi_inference.h (lines 1-76)
    • tensorflow_serving/servables/tensorflow/multi_inference.cc (lines 1-142)

Signature

class TensorFlowMultiInferenceRunner {
 public:
  TensorFlowMultiInferenceRunner(
      Session* session, const MetaGraphDef* meta_graph_def,
      absl::optional<int64_t> servable_version,
      const thread::ThreadPoolOptions& thread_pool_options =
          thread::ThreadPoolOptions());

  Status Infer(const RunOptions& run_options,
               const MultiInferenceRequest& request,
               MultiInferenceResponse* response);
};

Status RunMultiInference(
    const RunOptions& run_options, const MetaGraphDef& meta_graph_def,
    const absl::optional<int64_t>& servable_version, Session* session,
    const MultiInferenceRequest& request, MultiInferenceResponse* response,
    const tensorflow::thread::ThreadPoolOptions& thread_pool_options =
        tensorflow::thread::ThreadPoolOptions());

Import

#include "tensorflow_serving/servables/tensorflow/multi_inference.h"

I/O Contract

Inputs

Name Type Required Description
session Session* Yes Active TensorFlow session for model execution
meta_graph_def const MetaGraphDef* Yes MetaGraphDef containing signature definitions
servable_version absl::optional<int64_t> No Version to set on response ModelSpecs
run_options RunOptions Yes Session run options forwarded to Session::Run
request MultiInferenceRequest Yes Request with shared input and multiple inference tasks
thread_pool_options thread::ThreadPoolOptions No Optional inter-/intra-op thread pools for session execution

Outputs

Name Type Description
response MultiInferenceResponse* Per-task inference results with classification or regression outcomes
return Status OK on success; InvalidArgument for invalid task configurations
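
For orientation, the response mirrors the request's task order: each task yields one result carrying a model spec and either a classification or a regression outcome. A sketch of the shape in protobuf text format (all values below are illustrative placeholders, not real output):

results {
  model_spec { name: "my_model" signature_name: "classify_examples" version { value: 1 } }
  classification_result {
    classifications { classes { label: "positive" score: 0.87 } }  # illustrative
  }
}
results {
  model_spec { name: "my_model" signature_name: "regress_examples" version { value: 1 } }
  regression_result {
    regressions { value: 3.2 }  # illustrative
  }
}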

Usage Examples

Running Multi-Inference

// Assumes `bundle` is a SavedModelBundle already loaded (e.g. via LoadSavedModel)
// and `request` has been populated with a shared input and inference tasks.
RunOptions run_options;
MultiInferenceRequest request;
MultiInferenceResponse response;
Status status = RunMultiInference(
    run_options, bundle.meta_graph_def,
    /*servable_version=*/1, bundle.session.get(),
    request, &response);
if (!status.ok()) {
  LOG(ERROR) << "Multi-inference failed: " << status;
}
