Implementation: TensorFlow Serving Multi Inference
| Knowledge Sources | |
|---|---|
| Domains | Model Serving, Multi Inference |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Implements multi-inference execution for standard TensorFlow SavedModel sessions, allowing multiple classification and regression tasks to share a single Session::Run call.
Description
The Multi Inference module provides the TensorFlowMultiInferenceRunner class and the convenience RunMultiInference function for executing multiple inference tasks against a standard TensorFlow session (not TFRT). This is the classic TensorFlow implementation of the MultiInference API.
TensorFlowMultiInferenceRunner takes a Session pointer, a MetaGraphDef, optional servable version, and thread pool options at construction. Its Infer method processes a MultiInferenceRequest by:
1. Pre-processing: Validates that all tasks reference the same model name with unique signature names. For each task, looks up the SignatureDef from the MetaGraphDef and delegates to PreProcessClassification or PreProcessRegression to extract input and output tensor names.
2. Execution: Collects all unique input and output tensor names across tasks, then performs a single PerformOneShotTensorComputation call that serializes the shared input, runs the session once with all needed tensors, and records example counts.
3. Post-processing: Routes each task's outputs through PostProcessClassificationResult or PostProcessRegressionResult and populates the response with model specs.
The key optimization is that all tasks share a single Session::Run call, avoiding redundant computation for shared subgraphs.
Usage
Use this module for multi-inference requests against standard TensorFlow sessions (SavedModelBundle). For TFRT-based models, use the tfrt_multi_inference module instead. This is used by the standard serving pipeline when multiple classification/regression operations are needed on the same input.
Code Reference
Source Location
- Repository: Tensorflow_Serving
- Files:
  - tensorflow_serving/servables/tensorflow/multi_inference.h (lines 1-76)
  - tensorflow_serving/servables/tensorflow/multi_inference.cc (lines 1-142)
Signature
class TensorFlowMultiInferenceRunner {
public:
TensorFlowMultiInferenceRunner(
Session* session, const MetaGraphDef* meta_graph_def,
absl::optional<int64_t> servable_version,
const thread::ThreadPoolOptions& thread_pool_options =
thread::ThreadPoolOptions());
Status Infer(const RunOptions& run_options,
const MultiInferenceRequest& request,
MultiInferenceResponse* response);
};
Status RunMultiInference(
const RunOptions& run_options, const MetaGraphDef& meta_graph_def,
const absl::optional<int64_t>& servable_version, Session* session,
const MultiInferenceRequest& request, MultiInferenceResponse* response,
const tensorflow::thread::ThreadPoolOptions& thread_pool_options =
tensorflow::thread::ThreadPoolOptions());
Import
#include "tensorflow_serving/servables/tensorflow/multi_inference.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| session | Session* | Yes | Active TensorFlow session for model execution |
| meta_graph_def | const MetaGraphDef* | Yes | MetaGraphDef containing signature definitions |
| servable_version | absl::optional&lt;int64_t&gt; | No | Version to set on response ModelSpecs |
| request | MultiInferenceRequest | Yes | Request with shared input and multiple inference tasks |
Outputs
| Name | Type | Description |
|---|---|---|
| response | MultiInferenceResponse* | Per-task inference results with classification or regression outcomes |
| return | Status | OK on success; InvalidArgument for invalid task configurations |
Usage Examples
Running Multi-Inference
RunOptions run_options;
MultiInferenceRequest request;
// Each task targets the same model but a distinct signature.
// Model and signature names here are placeholders.
InferenceTask* task = request.add_tasks();
task->mutable_model_spec()->set_name("my_model");
task->mutable_model_spec()->set_signature_name("classify_signature");
task->set_method_name("tensorflow/serving/classify");
MultiInferenceResponse response;
Status status = RunMultiInference(
    run_options, bundle.meta_graph_def,
    /*servable_version=*/1, bundle.session.get(),
    request, &response);