
Implementation:TensorFlow Serving Multi Inference

From Leeroopedia
Knowledge Sources
Domains Model Serving, Multi Inference
Last Updated 2026-02-13 00:00 GMT

Overview

Implements multi-inference execution for standard TensorFlow SavedModel sessions, allowing multiple classification and regression tasks to share a single Session::Run call.

Description

The Multi Inference module provides the TensorFlowMultiInferenceRunner class and the convenience RunMultiInference function for executing multiple inference tasks against a standard TensorFlow session (not TFRT). This is the classic TensorFlow implementation of the MultiInference API.

TensorFlowMultiInferenceRunner takes a Session pointer, a MetaGraphDef, optional servable version, and thread pool options at construction. Its Infer method processes a MultiInferenceRequest by:

1. Pre-processing: Validates that all tasks reference the same model name with unique signature names. For each task, looks up the SignatureDef from the MetaGraphDef and delegates to PreProcessClassification or PreProcessRegression to extract input and output tensor names.

2. Execution: Collects all unique input and output tensor names across tasks, then performs a single PerformOneShotTensorComputation call that serializes the shared input, runs the session once with all needed tensors, and records example counts.

3. Post-processing: Routes each task's outputs through PostProcessClassificationResult or PostProcessRegressionResult and populates the response with model specs.

The key optimization is that all tasks share a single Session::Run call, avoiding redundant computation for shared subgraphs.
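As an illustrative sketch (not taken from the source), a MultiInferenceRequest that pairs a classification task with a regression task over one shared input could look like this in protobuf text format. The model name, signature names, and feature key are placeholders:

tasks {
  model_spec {
    name: "my_model"
    signature_name: "classify_examples"   # placeholder signature
  }
  method_name: "tensorflow/serving/classify"
}
tasks {
  model_spec {
    name: "my_model"                      # must match the first task's model name
    signature_name: "regress_examples"    # must be unique across tasks
  }
  method_name: "tensorflow/serving/regress"
}
# Shared input: serialized once and fed to a single Session::Run.
input {
  example_list {
    examples {
      features {
        feature {
          key: "x"
          value { float_list { value: 1.0 } }
        }
      }
    }
  }
}

Because both tasks name the same model and the input is shared, pre-processing accepts the request and execution collapses into one session run covering both signatures.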

Usage

Use this module for multi-inference requests against standard TensorFlow sessions (SavedModelBundle). For TFRT-based models, use the tfrt_multi_inference module instead. This is used by the standard serving pipeline when multiple classification/regression operations are needed on the same input.

Code Reference

Source Location

  • Repository: Tensorflow_Serving
  • Files:
    • tensorflow_serving/servables/tensorflow/multi_inference.h (lines 1-76)
    • tensorflow_serving/servables/tensorflow/multi_inference.cc (lines 1-142)

Signature

class TensorFlowMultiInferenceRunner {
 public:
  TensorFlowMultiInferenceRunner(
      Session* session, const MetaGraphDef* meta_graph_def,
      absl::optional<int64_t> servable_version,
      const thread::ThreadPoolOptions& thread_pool_options =
          thread::ThreadPoolOptions());

  Status Infer(const RunOptions& run_options,
               const MultiInferenceRequest& request,
               MultiInferenceResponse* response);
};

Status RunMultiInference(
    const RunOptions& run_options, const MetaGraphDef& meta_graph_def,
    const absl::optional<int64_t>& servable_version, Session* session,
    const MultiInferenceRequest& request, MultiInferenceResponse* response,
    const tensorflow::thread::ThreadPoolOptions& thread_pool_options =
        tensorflow::thread::ThreadPoolOptions());

Import

#include "tensorflow_serving/servables/tensorflow/multi_inference.h"

I/O Contract

Inputs

Name Type Required Description
session Session* Yes Active TensorFlow session for model execution
meta_graph_def const MetaGraphDef* Yes MetaGraphDef containing signature definitions
servable_version absl::optional<int64_t> No Version to set on response ModelSpecs
run_options RunOptions Yes Session run options forwarded to Session::Run
request MultiInferenceRequest Yes Request with shared input and multiple inference tasks
thread_pool_options thread::ThreadPoolOptions No Optional inter-/intra-op thread pools for session execution

Outputs

Name Type Description
response MultiInferenceResponse* Per-task inference results with classification or regression outcomes
return Status OK on success; InvalidArgument for invalid task configurations
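
For orientation, the response mirrors the request's task order: each task yields one result carrying a model spec and either a classification or a regression outcome. A sketch of the shape in protobuf text format (all values below are illustrative placeholders, not real output):

results {
  model_spec { name: "my_model" signature_name: "classify_examples" version { value: 1 } }
  classification_result {
    classifications { classes { label: "positive" score: 0.87 } }  # illustrative
  }
}
results {
  model_spec { name: "my_model" signature_name: "regress_examples" version { value: 1 } }
  regression_result {
    regressions { value: 3.2 }  # illustrative
  }
}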

Usage Examples

Running Multi-Inference

// Assumes `bundle` is a SavedModelBundle already loaded (e.g. via LoadSavedModel)
// and `request` has been populated with a shared input and inference tasks.
RunOptions run_options;
MultiInferenceRequest request;
MultiInferenceResponse response;
Status status = RunMultiInference(
    run_options, bundle.meta_graph_def,
    /*servable_version=*/1, bundle.session.get(),
    request, &response);
if (!status.ok()) {
  LOG(ERROR) << "Multi-inference failed: " << status;
}
