Implementation:Tensorflow Serving Remote Predict Op Kernel
| Knowledge Sources | |
|---|---|
| Domains | TensorFlow Ops, Remote Inference |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
A TensorFlow async op kernel that performs remote model inference by sending PredictRequest RPCs to a TensorFlow Serving instance and converting the response back into output tensors.
Description
RemotePredictOp<PredictionServiceStubType> is a templated AsyncOpKernel that lets a TensorFlow graph call a remote TensorFlow Serving instance for inference. The constructor reads the op attributes (target_address, model_name, model_version, max_rpc_deadline_millis, fail_op_on_rpc_error, signature_name) and creates a prediction service stub via PredictionServiceStubType::Create().
ComputeAsync() reads the input tensor aliases and input tensors from the op's inputs, builds a PredictRequest protobuf (model spec, input tensors serialized as TensorProto, and output filters), creates an RPC with the configured deadline, and sends it asynchronously through the prediction service stub. The flag remote_predict_op_use_tensor_content controls whether input tensors are serialized with AsProtoTensorContent (compact binary) or AsProtoField (field-by-field).
The PostProcessResponse() callback writes the RPC status to the status_code and status_error_message outputs, then deserializes the tensor for each output alias from the response's outputs map back into a Tensor object. If fail_op_on_rpc_error is false, an RPC failure does not fail the op: the output tensors are left empty and the error is reported through the two status outputs.
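The request-building step described above can be sketched with plain C++ stand-ins. This is a minimal illustration only: FakeTensor, FakePredictRequest, and BuildRequest are hypothetical names that do not exist in TensorFlow Serving, and the real kernel fills a PredictRequest protobuf with TensorProto values instead. The length check and the handling of a negative model_version are assumptions made for the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a serialized tensor (TensorProto in the real op).
struct FakeTensor {
  std::vector<float> values;
};

// Hypothetical stand-in for tensorflow::serving::PredictRequest.
struct FakePredictRequest {
  std::string model_name;
  bool has_model_version = false;  // false => server chooses the version
  std::int64_t model_version = 0;
  std::string signature_name;
  std::map<std::string, FakeTensor> inputs;  // alias -> tensor
  std::vector<std::string> output_filter;    // aliases requested back
};

// Pairs the i-th alias with the i-th tensor, mirroring the parallel
// input_tensor_aliases / input_tensors inputs of the op.
// Returns false (leaving *request untouched) if the lists differ in length;
// that check is an assumption of this sketch.
bool BuildRequest(const std::string& model_name, std::int64_t model_version,
                  const std::string& signature_name,
                  const std::vector<std::string>& input_aliases,
                  const std::vector<FakeTensor>& input_tensors,
                  const std::vector<std::string>& output_aliases,
                  FakePredictRequest* request) {
  if (input_aliases.size() != input_tensors.size()) return false;
  FakePredictRequest built;
  built.model_name = model_name;
  // Assumption: a negative version means "use the latest", so leave it unset.
  if (model_version >= 0) {
    built.has_model_version = true;
    built.model_version = model_version;
  }
  built.signature_name = signature_name;
  for (std::size_t i = 0; i < input_aliases.size(); ++i) {
    built.inputs[input_aliases[i]] = input_tensors[i];
  }
  built.output_filter = output_aliases;
  *request = built;
  return true;
}
```

Keeping aliases and tensors as parallel lists, then merging them into a map keyed by alias, matches the shape of a PredictRequest, whose inputs are addressed by name rather than by position.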
Usage
Use this op kernel within a TensorFlow graph that needs to call out to a remote TensorFlow Serving model server for inference, enabling model composition and distributed inference pipelines.
Code Reference
Source Location
- Repository: Tensorflow_Serving
- File: tensorflow_serving/experimental/tensorflow/ops/remote_predict/kernels/remote_predict_op_kernel.h
- Lines: 1-213
Signature
template <typename PredictionServiceStubType>
class RemotePredictOp : public AsyncOpKernel {
 public:
  explicit RemotePredictOp(OpKernelConstruction* context);
  void ComputeAsync(OpKernelContext* context, DoneCallback done) override;
  void PostProcessResponse(OpKernelContext* context, PredictResponse* response,
                           const absl::Status& rpc_status,
                           bool fail_op_on_rpc_error,
                           TTypes<const tstring>::Flat output_tensor_aliases,
                           DoneCallback rpc_done);
};
Import
#include "tensorflow_serving/experimental/tensorflow/ops/remote_predict/kernels/remote_predict_op_kernel.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input_tensor_aliases | Tensor (string) | Yes | Names/aliases for the input tensors |
| input_tensors | OpInputList | Yes | The actual input tensors to send to the remote model |
| output_tensor_aliases | Tensor (string) | Yes | Names/aliases for the desired output tensors |
| target_address | string (attr) | Yes | Address of the remote TensorFlow Serving instance |
| model_name | string (attr) | Yes | Name of the model to invoke |
| model_version | int64 (attr) | No | Model version; -1 means use the latest |
| max_rpc_deadline_millis | int64 (attr) | No | RPC deadline in milliseconds |
| fail_op_on_rpc_error | bool (attr) | No | Whether to fail the op on RPC errors |
| signature_name | string (attr) | No | The signature def name; defaults to "serving_default" |
Outputs
| Name | Type | Description |
|---|---|---|
| status_code | Tensor (int32) | The RPC status code (0 for OK) |
| status_error_message | Tensor (string) | The RPC error message (empty on success) |
| output_tensors | OpOutputList | The output tensors from the remote prediction |
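How the three outputs relate to fail_op_on_rpc_error can be sketched as follows. This is an illustration under stated assumptions, not the kernel's code: RpcStatus, FakeResponse, OpOutputs, and PostProcess are hypothetical stand-ins for absl::Status, PredictResponse, the op's output slots, and PostProcessResponse(). Treating a missing output alias as an op failure is also an assumption of the sketch.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for absl::Status (0 means OK).
struct RpcStatus {
  int code = 0;
  std::string message;
  bool ok() const { return code == 0; }
};

// Hypothetical stand-in for PredictResponse: alias -> output tensor values.
struct FakeResponse {
  std::map<std::string, std::vector<float>> outputs;
};

// Hypothetical view of what the op would emit.
struct OpOutputs {
  bool op_failed = false;  // true => the op itself reports an error
  int status_code = 0;
  std::string status_error_message;
  std::vector<std::vector<float>> output_tensors;
};

OpOutputs PostProcess(const FakeResponse& response, const RpcStatus& rpc_status,
                      bool fail_op_on_rpc_error,
                      const std::vector<std::string>& output_aliases) {
  OpOutputs out;
  // The status outputs are populated whether or not the RPC succeeded.
  out.status_code = rpc_status.code;
  out.status_error_message = rpc_status.message;
  if (!rpc_status.ok()) {
    if (fail_op_on_rpc_error) {
      out.op_failed = true;  // hard failure: the error propagates to the graph
      return out;
    }
    // Soft failure: one empty placeholder per requested output alias.
    out.output_tensors.assign(output_aliases.size(), std::vector<float>());
    return out;
  }
  for (const std::string& alias : output_aliases) {
    auto it = response.outputs.find(alias);
    if (it == response.outputs.end()) {
      out.op_failed = true;  // assumption: a missing alias fails the op
      return out;
    }
    out.output_tensors.push_back(it->second);
  }
  return out;
}
```

With fail_op_on_rpc_error set to false, a caller is expected to inspect status_code before trusting output_tensors, since the placeholders carry no data.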
Usage Examples
Using RemotePredictOp in a Graph (via Python)
// The C++ kernel is typically invoked via the Python wrapper:
// remote_predict_ops.run(
//     input_tensor_alias=["input"],
//     input_tensors=[my_tensor],
//     output_tensor_alias=["output"],
//     target_address="localhost:8500",
//     model_name="my_model")