Implementation:Kserve Kserve Prediction Request Protocol
| Knowledge Sources | |
|---|---|
| Domains | API_Design, Model_Serving, Inference |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete REST and gRPC client patterns for sending prediction requests to KServe InferenceService endpoints using V1 and V2 protocols.
Description
KServe provides sample clients demonstrating both V1 REST and gRPC prediction protocols. The gRPC client uses TensorFlow Serving's PredictionServiceStub to send protobuf-encoded requests. The REST client uses standard HTTP POST with JSON payloads. URL paths are generated by the PredictPath() function in KServe's constants package.
Usage
Use REST for simple JSON-based predictions. Use gRPC for high-throughput, low-latency scenarios or when working with binary tensor data. The gRPC client requires the tensorflow-serving-api package.
Code Reference
Source Location
- Repository: kserve
- File: docs/samples/v1beta1/tensorflow/grpc_client.py, Lines 1-72 (gRPC client)
- File: docs/samples/v1beta1/tensorflow/input.json, Lines 1-10 (REST input format)
- File: pkg/constants/constants.go, Lines 687-695 (PredictPath function)
Signature
gRPC Client
def predict(host: str, port: int, hostname: str, model: str,
            signature_name: str, input_path: str) -> None:
    """Send a gRPC prediction request to a KServe InferenceService.

    Args:
        host: Ingress gateway IP address
        port: Ingress gateway port (default 80)
        hostname: Knative DNS override (used for ssl_target_name_override)
        model: Model name in the InferenceService
        signature_name: TensorFlow SavedModel signature (default "serving_default")
        input_path: Path to JSON input file
    """
PredictPath (Go)
// PredictPath generates the prediction URL path for a model
func PredictPath(name string, protocol InferenceServiceProtocol) string
// V1: /v1/models/<name>:predict
// V2: /v2/models/<name>/infer
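The path mapping in the comments above can be sketched in Python. This is an illustrative mirror of the documented V1 and V2 paths, not the Go implementation; the `Protocol` enum and `predict_path` name are hypothetical, and treating V1 as the fallback is an assumption.

```python
from enum import Enum


class Protocol(Enum):
    """Hypothetical stand-in for KServe's InferenceServiceProtocol values."""
    V1 = "v1"
    V2 = "v2"


def predict_path(name: str, protocol: Protocol) -> str:
    """Return the prediction URL path for a model under the given protocol."""
    if protocol is Protocol.V2:
        return f"/v2/models/{name}/infer"
    # Assumption: anything that is not V2 uses the V1 path.
    return f"/v1/models/{name}:predict"


print(predict_path("flower-sample", Protocol.V1))
print(predict_path("flower-sample", Protocol.V2))
```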
Import
from tensorflow_serving.apis import prediction_service_pb2_grpc
from tensorflow_serving.apis import predict_pb2
import grpc
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| host | string | Yes | Ingress gateway IP address |
| port | int | No | Ingress gateway port (defaults to 80) |
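| hostname | string | Yes (gRPC) | InferenceService hostname used to route the request at the ingress (sent as the Host header for REST, as a hostname override for gRPC) |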
| model | string | Yes | Model name |
| input_data | JSON/protobuf | Yes | Prediction input data |
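For the REST V1 case, `input_data` is a JSON body with an `instances` list. The following is a minimal sketch of building such a body from raw bytes, assuming the `image_bytes` and `key` field names used by the flower sample; the `build_v1_payload` helper is hypothetical and the four input bytes are a placeholder, not a real image.

```python
import base64
import json


def build_v1_payload(image_bytes: bytes, key: str) -> str:
    """Encode raw bytes into a V1 "instances" JSON body."""
    # Binary values are base64-encoded and wrapped in a {"b64": ...} object.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    instance = {"image_bytes": {"b64": encoded}, "key": key}
    return json.dumps({"instances": [instance]})


# Placeholder bytes for illustration only.
body = build_v1_payload(b"\x89PNG", "1")
print(body)
```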
Outputs
| Name | Type | Description |
|---|---|---|
| predictions (V1) | JSON | {"predictions": [...]} — model output values |
| outputs (V2) | JSON | {"outputs": [...]} — tensor-typed output values |
| PredictResponse (gRPC) | protobuf | TensorFlow Serving PredictResponse |
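A client that talks to both REST protocols can branch on the top-level key of the response. The sketch below assumes only the two JSON shapes listed in the table; the `extract_outputs` helper and the sample response values are hypothetical.

```python
import json
from typing import Any, List


def extract_outputs(body: str) -> List[Any]:
    """Return the output list from a V1 or V2 JSON response body."""
    data = json.loads(body)
    if "predictions" in data:  # V1 response
        return data["predictions"]
    if "outputs" in data:  # V2 response
        return data["outputs"]
    raise ValueError("response has neither 'predictions' nor 'outputs'")


# Made-up responses for illustration only.
v1_body = '{"predictions": [[0.1, 0.9]]}'
v2_body = (
    '{"outputs": [{"name": "scores", "shape": [1, 2], '
    '"datatype": "FP32", "data": [0.1, 0.9]}]}'
)

print(extract_outputs(v1_body))
print(extract_outputs(v2_body)[0]["data"])
```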
Usage Examples
REST V1 Prediction
# Determine ingress endpoint
INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway \
  -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
SERVICE_HOSTNAME=$(kubectl get inferenceservice flower-sample \
  -o jsonpath='{.status.url}' | cut -d "/" -f 3)

# Send prediction
curl -v -H "Host: ${SERVICE_HOSTNAME}" \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/flower-sample:predict" \
  -d '{"instances": [{"image_bytes": {"b64": "iVBOR..."}, "key": "1"}]}'
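The same REST call can be issued from Python. The following is a rough equivalent of the curl command above using only the standard library; the `build_v1_request` helper, the ingress address, the hostname, and the base64 string are all placeholders chosen for illustration. The request is only constructed here, since sending it needs a reachable ingress.

```python
import json
import urllib.request


def build_v1_request(ingress_host, ingress_port, service_hostname, model, instances):
    """Build a V1 predict POST request routed by the Host header."""
    url = f"http://{ingress_host}:{ingress_port}/v1/models/{model}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Host": service_hostname, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint and payload values.
req = build_v1_request(
    "203.0.113.10",
    80,
    "flower-sample.default.example.com",
    "flower-sample",
    [{"image_bytes": {"b64": "iVBORw=="}, "key": "1"}],
)
print(req.get_method(), req.full_url)

# To send against a live cluster:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.load(resp))
```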
gRPC Prediction
import grpc
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Placeholder values; substitute the ingress address and the InferenceService hostname
host, port = "127.0.0.1", 80
hostname = "flower-sample.default.example.com"

# Connect to ingress
channel = grpc.insecure_channel(f"{host}:{port}")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build request
request = predict_pb2.PredictRequest()
request.model_spec.name = "flower-sample"
request.model_spec.signature_name = "serving_default"
# Populate request.inputs with tensors matching the model signature before sending

# Send with host header override and a 30-second timeout
metadata = [("host", hostname)]
response = stub.Predict(request, timeout=30.0, metadata=metadata)
print(response)