Implementation:Tensorflow Serving Remote Predict Op Kernel

Knowledge Sources	Tensorflow_Serving
Domains	TensorFlow Ops, Remote Inference
Last Updated	2026-02-13 00:00 GMT

Overview

A TensorFlow async op kernel that performs remote model inference by sending PredictRequest RPCs to a TensorFlow Serving instance and converting the response back into output tensors.

Description

RemotePredictOp<PredictionServiceStubType> is a templated AsyncOpKernel that lets a TensorFlow graph call a remote TensorFlow Serving instance for inference. The constructor reads the op attributes (target_address, model_name, model_version, max_rpc_deadline_millis, fail_op_on_rpc_error, signature_name) and creates a prediction service stub via PredictionServiceStubType::Create().

ComputeAsync() reads the input tensor aliases and input tensors from the op's inputs and constructs a PredictRequest protobuf, populating the model spec, the input tensors serialized as TensorProto, and the output filters. It then creates an RPC with the configured deadline and sends it asynchronously through the prediction service stub. The flag remote_predict_op_use_tensor_content controls whether input tensors are serialized with AsProtoTensorContent (compact binary) or AsProtoField (field-by-field).

The PostProcessResponse() callback writes status_code and status_error_message as output tensors, then deserializes the entry for each output tensor alias in the response's outputs map back into a Tensor object. If fail_op_on_rpc_error is false, an RPC failure does not fail the op: the output tensors are left empty and the error is reported through the status outputs.
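The response handling described above can be modelled without any TensorFlow dependency. The following is a minimal sketch, not the kernel's code: RpcStatus, PostProcessResult, and PostProcess are hypothetical stand-ins, serialized tensors are represented as plain strings, and failing the op when a requested alias is missing from the response is an assumption made for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for absl::Status; code 0 means OK.
struct RpcStatus {
  int code;
  std::string message;
};

// Hypothetical summary of what the callback leaves behind.
struct PostProcessResult {
  bool op_failed;                    // true if the op itself reports an error
  std::string op_error;              // error attached to the failed op, if any
  int status_code;                   // models the status_code output
  std::string status_error_message;  // models the status_error_message output
  std::vector<std::string> outputs;  // models output_tensors (strings here)
};

PostProcessResult PostProcess(
    const RpcStatus& rpc_status, bool fail_op_on_rpc_error,
    const std::vector<std::string>& output_aliases,
    const std::map<std::string, std::string>& response_outputs) {
  PostProcessResult result;
  result.op_failed = false;
  // The RPC status is always surfaced through the two status outputs.
  result.status_code = rpc_status.code;
  result.status_error_message = rpc_status.message;

  if (rpc_status.code != 0) {
    // On RPC failure the outputs stay empty; the flag decides whether
    // the op itself fails or only reports the error.
    if (fail_op_on_rpc_error) {
      result.op_failed = true;
      result.op_error = rpc_status.message;
    }
    return result;
  }

  // On success, look up each requested alias in the response's outputs map.
  for (const std::string& alias : output_aliases) {
    auto it = response_outputs.find(alias);
    if (it == response_outputs.end()) {
      // Assumption: a missing alias is treated as an op error.
      result.op_failed = true;
      result.op_error = "missing output alias: " + alias;
      result.outputs.clear();
      return result;
    }
    result.outputs.push_back(it->second);
  }
  return result;
}
```

With fail_op_on_rpc_error set to false, a caller of this sketch would check status_code before reading outputs, which is the pattern the two status outputs are meant to support.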

Usage

Use this op kernel within a TensorFlow graph that needs to call out to a remote TensorFlow Serving model server for inference, enabling model composition and distributed inference pipelines.

Code Reference

Source Location

Repository: Tensorflow_Serving
File: tensorflow_serving/experimental/tensorflow/ops/remote_predict/kernels/remote_predict_op_kernel.h
Lines: 1-213

Signature

template <typename PredictionServiceStubType>
class RemotePredictOp : public AsyncOpKernel {
 public:
  explicit RemotePredictOp(OpKernelConstruction* context);
  void ComputeAsync(OpKernelContext* context, DoneCallback done) override;
  void PostProcessResponse(OpKernelContext* context, PredictResponse* response,
                           const absl::Status& rpc_status,
                           bool fail_op_on_rpc_error,
                           TTypes<const tstring>::Flat output_tensor_aliases,
                           DoneCallback rpc_done);
};
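Because the kernel is templated on the stub type, a fake stub can be substituted for the real one, for example in tests. The sketch below illustrates that pattern with self-contained stand-ins. Only the static Create() call is named on this page; FakeStatus, FakeRequest, FakeResponse, the Predict() method and its callback shape, and MiniRemotePredict are all hypothetical and are not the actual interface expected by RemotePredictOp.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal status type standing in for absl::Status.
struct FakeStatus {
  int code;
  std::string message;
  FakeStatus() : code(0) {}
  FakeStatus(int c, std::string m) : code(c), message(std::move(m)) {}
  bool ok() const { return code == 0; }
};

struct FakeRequest { std::string model_name; };
struct FakeResponse { std::string payload; };

// Hypothetical in-process stub that answers without any network traffic.
class FakePredictionServiceStub {
 public:
  static FakeStatus Create(const std::string& target_address,
                           std::unique_ptr<FakePredictionServiceStub>* stub) {
    if (target_address.empty()) {
      return FakeStatus(3, "target_address must not be empty");
    }
    stub->reset(new FakePredictionServiceStub(target_address));
    return FakeStatus();
  }

  // Assumed shape: fill the response, then invoke the completion callback.
  void Predict(const FakeRequest& request, FakeResponse* response,
               std::function<void(const FakeStatus&)> done) const {
    response->payload = request.model_name + "@" + target_address_;
    done(FakeStatus());
  }

 private:
  explicit FakePredictionServiceStub(std::string address)
      : target_address_(std::move(address)) {}
  std::string target_address_;
};

// Kernel-shaped class templated on the stub type, mirroring the signature above.
template <typename StubType>
class MiniRemotePredict {
 public:
  using Done = std::function<void(const FakeStatus&, const FakeResponse&)>;

  explicit MiniRemotePredict(const std::string& target_address) {
    create_status_ = StubType::Create(target_address, &stub_);
  }

  void ComputeAsync(const FakeRequest& request, Done done) {
    if (!create_status_.ok()) {
      done(create_status_, FakeResponse());
      return;
    }
    // The response must outlive this call, so the callback owns it.
    auto response = std::make_shared<FakeResponse>();
    stub_->Predict(request, response.get(),
                   [response, done](const FakeStatus& status) {
                     done(status, *response);
                   });
  }

 private:
  std::unique_ptr<StubType> stub_;
  FakeStatus create_status_;
};
```

Instantiating MiniRemotePredict<FakePredictionServiceStub> exercises the full construct, send, and callback path in process, which is the main benefit of keeping the stub type a template parameter.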

Import

#include "tensorflow_serving/experimental/tensorflow/ops/remote_predict/kernels/remote_predict_op_kernel.h"

I/O Contract

Inputs

Name	Type	Required	Description
input_tensor_aliases	`Tensor (string)`	Yes	Names/aliases for the input tensors
input_tensors	`OpInputList`	Yes	The actual input tensors to send to the remote model
output_tensor_aliases	`Tensor (string)`	Yes	Names/aliases for the desired output tensors
target_address	`string (attr)`	Yes	Address of the remote TensorFlow Serving instance
model_name	`string (attr)`	Yes	Name of the model to invoke
model_version	`int64 (attr)`	No	Model version; -1 means use the latest
max_rpc_deadline_millis	`int64 (attr)`	No	RPC deadline in milliseconds
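fail_op_on_rpc_error	`bool (attr)`	No	Whether to fail the op on RPC errors; if false, errors are reported through status_code and status_error_message and the output tensors are empty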
signature_name	`string (attr)`	No	The signature def name; defaults to "serving_default"
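To show how the inputs and attributes in the table fit together, here is a minimal sketch of assembling a request from them. FakeTensor, FakePredictRequest, and BuildRequest are hypothetical stand-ins for TensorProto, PredictRequest, and the kernel's request-building logic. Leaving the version unset for negative values is an assumption based on the documented meaning of -1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a serialized tensor (TensorProto in the kernel).
struct FakeTensor {
  std::vector<float> values;
};

// Hypothetical stand-in for the PredictRequest fields the kernel populates.
struct FakePredictRequest {
  std::string model_name;
  bool has_model_version = false;  // false means "use the latest version"
  std::int64_t model_version = 0;
  std::string signature_name;
  std::map<std::string, FakeTensor> inputs;  // alias -> tensor
  std::vector<std::string> output_filter;    // requested output aliases
};

FakePredictRequest BuildRequest(
    const std::string& model_name, std::int64_t model_version,
    const std::string& signature_name,
    const std::vector<std::string>& input_aliases,
    const std::vector<FakeTensor>& input_tensors,
    const std::vector<std::string>& output_aliases) {
  // Each alias names exactly one input tensor, so the lists must line up.
  assert(input_aliases.size() == input_tensors.size());

  FakePredictRequest request;
  request.model_name = model_name;
  request.signature_name = signature_name;
  // Assumption: a negative version (the documented -1) leaves it unset.
  if (model_version >= 0) {
    request.has_model_version = true;
    request.model_version = model_version;
  }
  // Pair input_tensor_aliases with input_tensors by position.
  for (std::size_t i = 0; i < input_aliases.size(); ++i) {
    request.inputs[input_aliases[i]] = input_tensors[i];
  }
  // output_tensor_aliases become the request's output filter.
  request.output_filter = output_aliases;
  return request;
}
```

Pairing by position means the order of input_tensor_aliases must match the order of input_tensors; the sketch asserts only that the two lists have equal length.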

Outputs

Name	Type	Description
status_code	`Tensor (int32)`	The RPC status code (0 for OK)
status_error_message	`Tensor (string)`	The RPC error message (empty on success)
output_tensors	`OpOutputList`	The output tensors from the remote prediction

Usage Examples

Using RemotePredictOp in a Graph (via Python)

// C++ kernel is typically invoked via the Python wrapper:
// remote_predict_ops.run(
//     input_tensor_alias=["input"],
//     input_tensors=[my_tensor],
//     output_tensor_alias=["output"],
//     target_address="localhost:8500",
//     model_name="my_model")

Related Pages

Principle:Tensorflow_Serving_Remote_Predict_Op
