Principle:Tensorflow Serving REST Inference Execution
| Knowledge Sources | |
|---|---|
| Domains | Inference, Networking |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
An inference execution pipeline that processes REST predict requests by parsing JSON, running the TensorFlow session, and serializing results back to JSON.
Description
REST inference execution is the complete request-processing path, from HTTP request arrival to response generation. The pipeline runs these steps in order:
- ProcessPredictRequest() receives the parsed URL info and raw JSON body
- FillPredictRequestFromJson() converts JSON to PredictRequest proto
- TensorflowPredictor::Predict() resolves the model via ServerCore and runs inference
- RunPredict() calls session->Run() with the input tensors
- MakeJsonFromTensors() converts output TensorProtos back to JSON
- Response is returned to the HTTP handler
The Classify and Regress endpoints follow the same execution path but use their own request and response JSON formats.
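The JSON conversion at both ends of the pipeline can be sketched in plain Python. This is a simplified illustration, not the C++ logic of FillPredictRequestFromJson() or MakeJsonFromTensors(). It assumes the documented REST body shapes (row format under "instances", columnar format under "inputs"); the helper names parse_predict_json and make_json_response, and the use of an empty string as the key for an unnamed input, are hypothetical.

```python
import json


def parse_predict_json(body: str):
    """Return (fmt, named_inputs) for a predict request body.

    fmt is "row" when the body uses "instances" and "columnar" when it uses
    "inputs". named_inputs maps an input name to its batch of values.
    """
    doc = json.loads(body)
    if ("instances" in doc) == ("inputs" in doc):
        raise ValueError('body needs exactly one of "instances" or "inputs"')

    if "inputs" in doc:
        cols = doc["inputs"]
        # A bare value stands for a single unnamed input (keyed "" here).
        return "columnar", cols if isinstance(cols, dict) else {"": cols}

    rows = doc["instances"]
    if not isinstance(rows, list) or not rows:
        raise ValueError('"instances" must be a non-empty list')
    if isinstance(rows[0], dict):
        # Named inputs: transpose a list of objects into per-input columns.
        return "row", {name: [row[name] for row in rows] for name in rows[0]}
    return "row", {"": rows}


def make_json_response(fmt: str, outputs: dict) -> str:
    """Serialize named output batches in the format the request used."""
    single = next(iter(outputs.values())) if len(outputs) == 1 else None
    if fmt == "columnar":
        return json.dumps({"outputs": outputs if single is None else single})
    if single is not None:
        return json.dumps({"predictions": single})
    # Several outputs in row format: emit one object per batch element.
    names = list(outputs)
    batch_size = len(outputs[names[0]])
    rows = [{n: outputs[n][i] for n in names} for i in range(batch_size)]
    return json.dumps({"predictions": rows})
```

The response mirrors the request format, which is why the parser returns the detected format alongside the inputs.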
Usage
Use the REST predict endpoint when a model served by TensorFlow Serving must be reachable over HTTP. The URL pattern is POST /v1/models/{name}:predict (or :classify, :regress). An optional /versions/{version} segment after the model name selects a specific version.
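A client call against this URL pattern might look like the sketch below, which uses only the Python standard library. The host, port (8501 is a commonly used REST port, assumed here), and model name are placeholders, and the helper names build_predict_request and send_request are hypothetical. The body uses the row format ("instances"), so the sketch covers :predict only.

```python
import json
import urllib.request


def build_predict_request(host, port, model_name, instances, version=None):
    """Build a POST request for /v1/models/{name}[/versions/{version}]:predict."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        path += f"/versions/{version}"
    url = f"http://{host}:{port}{path}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(request, timeout=10.0):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, send_request(build_predict_request("localhost", 8501, "my_model", [[1.0, 2.0]])) would post one instance to a hypothetical model named my_model on a local server.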
Theoretical Basis
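# Abstract REST inference pipeline (NOT the real implementation).
# All helpers and server_core are placeholders for the C++ components above.
def process_predict(model_name, version, json_body):
    # Parse the JSON body; the parser also reports the request format (row or columnar).
    request_proto, request_format = json_to_predict_request(json_body)
    # Resolve the servable for this model name and version.
    servable = server_core.get_servable(model_name, version)
    session = servable.session
    # Map request inputs to feed tensors and look up the output tensor names.
    inputs = extract_feed_dict(request_proto, servable.metagraph)
    output_names = extract_output_names(servable.metagraph)
    outputs = session.run(output_names, feed_dict=inputs)
    # Serialize the outputs in the same format the request used.
    response_json = tensors_to_json(outputs, format=request_format)
    return response_json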
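To make the abstract flow above concrete, here is a small runnable toy version with stand-ins for the server components. Everything in it is hypothetical: FakeSession computes y = 0.5 * x + 2 instead of running a graph, FakeServerCore is a dictionary keyed by (name, version), and the input and output names "x" and "y" are invented. It assumes that omitting the version selects the highest loaded version.

```python
import json


class FakeSession:
    """Stand-in for a TensorFlow session: computes y = 0.5 * x + 2."""

    def run(self, output_names, feed_dict):
        xs = feed_dict["x"]
        return {name: [0.5 * x + 2 for x in xs] for name in output_names}


class FakeServerCore:
    """Stand-in for ServerCore: maps (model name, version) to a session."""

    def __init__(self):
        self._servables = {}

    def add(self, name, version, session):
        self._servables[(name, version)] = session

    def get_servable(self, name, version=None):
        versions = [v for (n, v) in self._servables if n == name]
        if not versions:
            raise KeyError(f"model {name!r} not found")
        # Assumption: no explicit version means the highest loaded one.
        chosen = max(versions) if version is None else version
        if (name, chosen) not in self._servables:
            raise KeyError(f"version {chosen} of model {name!r} not found")
        return self._servables[(name, chosen)]


def process_predict(core, model_name, version, json_body):
    """Parse JSON, resolve the model, run it, and serialize the result."""
    doc = json.loads(json_body)
    session = core.get_servable(model_name, version)
    outputs = session.run(["y"], feed_dict={"x": doc["instances"]})
    return json.dumps({"predictions": outputs["y"]})
```

With a FakeSession registered as version 1 of a model, calling process_predict(core, name, None, '{"instances": [1.0, 2.0, 5.0]}') returns the string {"predictions": [2.5, 3.0, 4.5]}.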