
Implementation:Tensorflow Serving Remote Predict Op Kernel

From Leeroopedia
Knowledge Sources
Domains: TensorFlow Ops, Remote Inference
Last Updated: 2026-02-13 00:00 GMT

Overview

A TensorFlow async op kernel that performs remote model inference by sending PredictRequest RPCs to a TensorFlow Serving instance and converting the response back into output tensors.

Description

RemotePredictOp<PredictionServiceStubType> is a templated AsyncOpKernel that lets a TensorFlow graph call a remote TensorFlow Serving instance for inference.

The constructor reads the op attributes (target_address, model_name, model_version, max_rpc_deadline_millis, fail_op_on_rpc_error, signature_name) and creates a prediction service stub via PredictionServiceStubType::Create().

ComputeAsync() reads the input tensor aliases and input tensors from the op's inputs, then builds a PredictRequest protobuf containing the model spec, the input tensors serialized as TensorProto, and the output filters. It creates an RPC with the configured deadline and sends the request asynchronously through the prediction service stub. The flag remote_predict_op_use_tensor_content controls whether input tensors are serialized with AsProtoTensorContent (compact binary) or AsProtoField (field-by-field).

The PostProcessResponse() callback handles the PredictResponse: it exposes the RPC status as the status_code and status_error_message output tensors, then deserializes the tensor for each output alias from the response's outputs map back into a Tensor object. If fail_op_on_rpc_error is false, an RPC failure produces empty output tensors, and the error status remains available through the status outputs.
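The request-building step described above can be sketched without any TensorFlow dependency. This is a minimal illustration, not the kernel's code: SketchTensorProto, SketchPredictRequest, SerializeTensor, and BuildRequest are hypothetical stand-ins, tensors are reduced to flat float vectors, and the alias/tensor count check is an assumption about how mismatched inputs would be rejected.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for TensorProto: exactly one of the two fields is
// filled, depending on the chosen encoding.
struct SketchTensorProto {
  std::vector<float> float_val;  // field-by-field encoding
  std::string tensor_content;    // compact binary encoding
};

// Hypothetical stand-in for PredictRequest.
struct SketchPredictRequest {
  std::string model_name;
  bool has_model_version = false;  // false means "use the latest version"
  std::int64_t model_version = 0;
  std::string signature_name;
  std::map<std::string, SketchTensorProto> inputs;
  std::vector<std::string> output_filter;
};

// Picks the encoding the way the remote_predict_op_use_tensor_content flag
// is described as doing: raw bytes when true, repeated values when false.
inline SketchTensorProto SerializeTensor(const std::vector<float>& values,
                                         bool use_tensor_content) {
  SketchTensorProto proto;
  if (use_tensor_content) {
    proto.tensor_content.resize(values.size() * sizeof(float));
    if (!values.empty()) {
      std::memcpy(&proto.tensor_content[0], values.data(),
                  proto.tensor_content.size());
    }
  } else {
    proto.float_val = values;
  }
  return proto;
}

inline SketchPredictRequest BuildRequest(
    const std::string& model_name, std::int64_t model_version,
    const std::string& signature_name,
    const std::vector<std::string>& input_aliases,
    const std::vector<std::vector<float> >& input_tensors,
    const std::vector<std::string>& output_aliases,
    bool use_tensor_content) {
  // Assumption: every alias must pair with exactly one input tensor.
  if (input_aliases.size() != input_tensors.size()) {
    throw std::invalid_argument("alias/tensor count mismatch");
  }
  SketchPredictRequest request;
  request.model_name = model_name;
  // A negative version (such as -1) leaves the version unset.
  if (model_version >= 0) {
    request.has_model_version = true;
    request.model_version = model_version;
  }
  request.signature_name = signature_name;
  for (std::size_t i = 0; i < input_aliases.size(); ++i) {
    request.inputs[input_aliases[i]] =
        SerializeTensor(input_tensors[i], use_tensor_content);
  }
  request.output_filter = output_aliases;
  return request;
}
```

Calling BuildRequest with model_version set to -1 leaves has_model_version false, which models the "use the latest" behavior listed in the I/O contract.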

Usage

Use this op kernel within a TensorFlow graph that needs to call out to a remote TensorFlow Serving model server for inference, enabling model composition and distributed inference pipelines.

Code Reference

Source Location

  • Repository: Tensorflow_Serving
  • File: tensorflow_serving/experimental/tensorflow/ops/remote_predict/kernels/remote_predict_op_kernel.h
  • Lines: 1-213

Signature

template <typename PredictionServiceStubType>
class RemotePredictOp : public AsyncOpKernel {
 public:
  explicit RemotePredictOp(OpKernelConstruction* context);
  void ComputeAsync(OpKernelContext* context, DoneCallback done) override;
  void PostProcessResponse(OpKernelContext* context, PredictResponse* response,
                           const absl::Status& rpc_status,
                           bool fail_op_on_rpc_error,
                           TTypes<const tstring>::Flat output_tensor_aliases,
                           DoneCallback rpc_done);
};
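The template parameter can be illustrated with a simplified analogue. In the sketch below, FakeStub, SketchRequest, SketchResponse, and RemotePredictSketch are all hypothetical, and the Create() and Predict() signatures are invented for illustration rather than taken from the real stub type. It only shows the general pattern: the kernel obtains its stub from StubType::Create() and finishes its work inside a completion callback. One plausible benefit of this design is that a fake stub can be substituted in tests.

```cpp
#include <functional>
#include <memory>
#include <string>

// Hypothetical request/response stand-ins, not the real protobufs.
struct SketchRequest { std::string model_name; };
struct SketchResponse { std::string payload; };

// A fake stub offering the two pieces this sketch needs from a stub type:
// a static Create() factory and an asynchronous Predict().
class FakeStub {
 public:
  static std::unique_ptr<FakeStub> Create(const std::string& target_address) {
    return std::unique_ptr<FakeStub>(new FakeStub(target_address));
  }

  // Completes immediately; a real stub would invoke `done` when the RPC ends.
  void Predict(const SketchRequest& request, SketchResponse* response,
               std::function<void(bool)> done) {
    response->payload = request.model_name + "@" + target_address_;
    done(true);
  }

 private:
  explicit FakeStub(const std::string& target_address)
      : target_address_(target_address) {}
  std::string target_address_;
};

// Simplified analogue of RemotePredictOp<PredictionServiceStubType>.
template <typename StubType>
class RemotePredictSketch {
 public:
  RemotePredictSketch(const std::string& target_address,
                      const std::string& model_name)
      : model_name_(model_name), stub_(StubType::Create(target_address)) {}

  // Mirrors the shape of ComputeAsync: start the call, finish in a callback.
  void ComputeAsync(std::function<void(const std::string&)> done) {
    // The response must outlive this function, so the callback co-owns it.
    std::shared_ptr<SketchResponse> response =
        std::make_shared<SketchResponse>();
    SketchRequest request;
    request.model_name = model_name_;
    stub_->Predict(request, response.get(), [response, done](bool ok) {
      done(ok ? response->payload : std::string());
    });
  }

 private:
  std::string model_name_;
  std::unique_ptr<StubType> stub_;
};
```

Instantiating RemotePredictSketch<FakeStub> and calling ComputeAsync with a lambda exercises the whole path without any network access.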

Import

#include "tensorflow_serving/experimental/tensorflow/ops/remote_predict/kernels/remote_predict_op_kernel.h"

I/O Contract

Inputs

Name | Type | Required | Description
input_tensor_aliases | Tensor (string) | Yes | Names/aliases for the input tensors
input_tensors | OpInputList | Yes | The actual input tensors to send to the remote model
output_tensor_aliases | Tensor (string) | Yes | Names/aliases for the desired output tensors
target_address | string (attr) | Yes | Address of the remote TensorFlow Serving instance
model_name | string (attr) | Yes | Name of the model to invoke
model_version | int64 (attr) | No | Model version; -1 means use the latest
max_rpc_deadline_millis | int64 (attr) | No | RPC deadline in milliseconds
fail_op_on_rpc_error | bool (attr) | No | Whether to fail the op on RPC errors
signature_name | string (attr) | No | The signature def name; defaults to "serving_default"

Outputs

Name | Type | Description
status_code | Tensor (int32) | The RPC status code (0 for OK)
status_error_message | Tensor (string) | The RPC error message (empty on success)
output_tensors | OpOutputList | The output tensors from the remote prediction
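How the three outputs relate to fail_op_on_rpc_error can be sketched as follows. This is an illustration under stated assumptions rather than the kernel's logic: SketchStatus, SketchOutputs, and PostProcess are hypothetical names, tensors are reduced to flat float vectors, and treating a missing output alias as an op failure is an assumption.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the RPC status; code 0 means OK.
struct SketchStatus {
  int code;
  std::string message;
};

// Hypothetical stand-in for what the op exposes after post-processing.
struct SketchOutputs {
  bool op_failed = false;  // true when the op itself reports an error
  int status_code = 0;
  std::string status_error_message;
  std::vector<std::vector<float> > output_tensors;  // one entry per alias
};

inline SketchOutputs PostProcess(
    const SketchStatus& rpc_status,
    const std::map<std::string, std::vector<float> >& response_outputs,
    const std::vector<std::string>& output_aliases,
    bool fail_op_on_rpc_error) {
  SketchOutputs out;
  out.status_code = rpc_status.code;
  out.status_error_message = rpc_status.message;

  if (rpc_status.code != 0) {
    if (fail_op_on_rpc_error) {
      // Strict mode: the RPC error becomes an op error.
      out.op_failed = true;
      return out;
    }
    // Lenient mode: the op succeeds with empty tensors, and the error is
    // visible only through the two status outputs.
    out.output_tensors.assign(output_aliases.size(), std::vector<float>());
    return out;
  }

  // Success: look up each requested alias in the response's outputs map.
  for (std::size_t i = 0; i < output_aliases.size(); ++i) {
    std::map<std::string, std::vector<float> >::const_iterator it =
        response_outputs.find(output_aliases[i]);
    if (it == response_outputs.end()) {
      // Assumption: a missing alias is treated as an op failure.
      out.op_failed = true;
      out.output_tensors.clear();
      return out;
    }
    out.output_tensors.push_back(it->second);
  }
  return out;
}
```

In lenient mode a caller would check status_code before using output_tensors, because the empty tensors alone do not distinguish a failed RPC from an empty prediction.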

Usage Examples

Using RemotePredictOp in a Graph (via Python)

// C++ kernel is typically invoked via the Python wrapper:
// remote_predict_ops.run(
//     input_tensor_alias=["input"],
//     input_tensors=[my_tensor],
//     output_tensor_alias=["output"],
//     target_address="localhost:8500",
//     model_name="my_model")
