Implementation:Kserve Kserve Prediction Request Protocol

Knowledge Sources	KServe KServe V1 Protocol
Domains	API_Design, Model_Serving, Inference
Last Updated	2026-02-13 00:00 GMT

Overview

Concrete REST and gRPC client patterns for sending prediction requests to KServe InferenceService endpoints using V1 and V2 protocols.

Description

KServe provides sample clients for both transports: a V1 REST client and a gRPC client. The gRPC client uses TensorFlow Serving's PredictionServiceStub to send protobuf-encoded PredictRequest messages. The REST client sends an HTTP POST with a JSON payload. The REST URL path for each protocol is defined by the PredictPath() function in KServe's constants package (pkg/constants).
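The JSON payload shape depends on the protocol. The V1 body wraps row-oriented "instances". The V2 body, assumed here to follow the Open Inference Protocol layout, lists named, typed tensors under "inputs". This is a minimal sketch. The function names v1_body and v2_body and the tensor name "input-0" are hypothetical.

```python
import json
from typing import Any


def v1_body(instances: list[Any]) -> str:
    """V1 (TensorFlow Serving-style) body: a list of row-oriented instances."""
    return json.dumps({"instances": instances})


def v2_body(name: str, shape: list[int], datatype: str, data: list[Any]) -> str:
    """V2 body (assumed Open Inference Protocol layout): named, typed tensors."""
    tensor = {"name": name, "shape": shape, "datatype": datatype, "data": data}
    return json.dumps({"inputs": [tensor]})


if __name__ == "__main__":
    # Hypothetical 4-feature input, shown in both protocol shapes
    print(v1_body([[6.8, 2.8, 4.8, 1.4]]))
    print(v2_body("input-0", [1, 4], "FP32", [6.8, 2.8, 4.8, 1.4]))
```

V2 sends the data flat and states the shape and datatype separately. V1 relies on the nesting of the JSON itself.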

Usage

Use REST for simple JSON-based predictions. Use gRPC for high-throughput, low-latency scenarios or when working with binary tensor data. The gRPC client requires the tensorflow-serving-api package.
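Base64 encoding is one concrete cost behind this guidance. Binary data sent over V1 REST has to be base64-encoded inside JSON, which is the {"b64": ...} form used in input.json. Base64 inflates the data by one third before any JSON overhead is added. gRPC protobuf fields, by contrast, carry raw bytes. The sketch below uses random bytes as a hypothetical stand-in for an image.

```python
import base64
import os

raw = os.urandom(300_000)          # hypothetical 300 KB binary payload
encoded = base64.b64encode(raw)    # what a JSON {"b64": ...} field would carry
print(len(raw), len(encoded))      # 300000 400000
print(len(encoded) / len(raw))     # 1.3333333333333333
```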

Code Reference

Source Location

Repository: kserve
File: docs/samples/v1beta1/tensorflow/grpc_client.py, Lines 1-72 (gRPC client)
File: docs/samples/v1beta1/tensorflow/input.json, Lines 1-10 (REST input format)
File: pkg/constants/constants.go, Lines 687-695 (PredictPath function)

Signature

gRPC Client

def predict(host: str, port: int, hostname: str, model: str,
            signature_name: str, input_path: str) -> None:
    """Send a gRPC prediction request to a KServe InferenceService.

    Args:
        host: Ingress gateway IP address
        port: Ingress gateway port (default 80)
        hostname: Knative DNS override (used for ssl_target_name_override)
        model: Model name in the InferenceService
        signature_name: TensorFlow SavedModel signature (default "serving_default")
        input_path: Path to JSON input file
    """

PredictPath (Go)

// PredictPath generates the prediction URL path for a model
func PredictPath(name string, protocol InferenceServiceProtocol) string
// V1: /v1/models/<name>:predict
// V2: /v2/models/<name>/infer
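The Python sketch below mirrors the two URL layouts documented for PredictPath(), which is useful when building REST URLs on the client side. Protocol and predict_path are hypothetical names, not KServe's API. Raising an error on an unknown protocol is a choice made by this sketch.

```python
from enum import Enum


class Protocol(Enum):
    # Hypothetical stand-in for KServe's Go InferenceServiceProtocol type
    V1 = "v1"
    V2 = "v2"


def predict_path(name: str, protocol: Protocol) -> str:
    """Return the REST path for a model, following the documented V1/V2 layouts."""
    if protocol is Protocol.V1:
        return f"/v1/models/{name}:predict"
    if protocol is Protocol.V2:
        return f"/v2/models/{name}/infer"
    raise ValueError(f"unsupported protocol: {protocol}")


print(predict_path("flower-sample", Protocol.V1))  # /v1/models/flower-sample:predict
print(predict_path("flower-sample", Protocol.V2))  # /v2/models/flower-sample/infer
```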

Import

from tensorflow_serving.apis import prediction_service_pb2_grpc
from tensorflow_serving.apis import predict_pb2
import grpc

I/O Contract

Inputs

Name	Type	Required	Description
host	string	Yes	Ingress gateway IP address
port	int	Yes	Ingress gateway port (default 80)
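hostname	string	Yes (gRPC)	Knative host header for routing
model	string	Yes	Model name
signature_name	string	No (gRPC)	TensorFlow SavedModel signature (default "serving_default")
input_data	JSON/protobuf	Yes	Prediction input data (the gRPC client reads it from the JSON file at input_path)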
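For the flower sample, each V1 instance has the shape shown in input.json. The binary image goes in a {"b64": ...} wrapper, which is TensorFlow Serving's REST convention for raw bytes. The sketch below builds one instance from image bytes. The function name make_instance is hypothetical. The bytes are only the 8-byte PNG signature, not a real image.

```python
import base64
import json


def make_instance(image: bytes, key: str) -> dict:
    """Build one V1 instance shaped like input.json.

    The image is base64-encoded and wrapped as {"b64": ...} so it can travel in JSON.
    """
    return {"image_bytes": {"b64": base64.b64encode(image).decode("ascii")}, "key": key}


png_signature = b"\x89PNG\r\n\x1a\n"  # placeholder bytes, not a full image
payload = {"instances": [make_instance(png_signature, "1")]}
print(json.dumps(payload))
```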

Outputs

Name	Type	Description
predictions (V1)	JSON	{"predictions": [...]} — model output values
outputs (V2)	JSON	{"outputs": [...]} — tensor-typed output values
PredictResponse (gRPC)	protobuf	TensorFlow Serving PredictResponse
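A REST client that may talk to either protocol can branch on the top-level key listed in the table above. This is a minimal sketch. The helper name extract_outputs and the sample response bodies are hypothetical.

```python
import json
from typing import Any


def extract_outputs(body: str) -> list[Any]:
    """Return model outputs from a V1 or V2 REST response body.

    V1 responses carry {"predictions": [...]}; V2 responses carry
    {"outputs": [...]}, where each entry is a named, typed tensor.
    """
    doc = json.loads(body)
    if "predictions" in doc:
        return doc["predictions"]
    if "outputs" in doc:
        return doc["outputs"]
    raise KeyError("response has neither 'predictions' nor 'outputs'")


# Hypothetical response bodies for each protocol
print(extract_outputs('{"predictions": [[0.1, 0.9]]}'))
print(extract_outputs('{"outputs": [{"name": "out", "shape": [1], "datatype": "INT64", "data": [1]}]}'))
```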

Usage Examples

REST V1 Prediction

# Determine ingress endpoint
INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway \
  -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
SERVICE_HOSTNAME=$(kubectl get inferenceservice flower-sample \
  -o jsonpath='{.status.url}' | cut -d "/" -f 3)

# Send prediction
curl -v -H "Host: ${SERVICE_HOSTNAME}" \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/flower-sample:predict \
  -d '{"instances": [{"image_bytes": {"b64": "iVBOR..."}, "key": "1"}]}'
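The same request can be built with only the Python standard library. The service hostname is taken from the InferenceService's status.url, which matches what `cut -d "/" -f 3` extracts. It is then sent as the Host header so the ingress gateway can route the request. This is a sketch under assumptions. The helper name build_predict_request, the IP address and the status URL are hypothetical placeholders, and the request is built but not sent.

```python
import json
import urllib.request
from urllib.parse import urlsplit


def build_predict_request(ingress_host: str, ingress_port: int, status_url: str,
                          model: str, instances: list) -> urllib.request.Request:
    """Build (but do not send) a V1 predict request routed via the Host header."""
    # Same result as `cut -d "/" -f 3` on .status.url: the host[:port] part
    service_hostname = urlsplit(status_url).netloc
    url = f"http://{ingress_host}:{ingress_port}/v1/models/{model}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Host": service_hostname, "Content-Type": "application/json"},
        method="POST",  # explicit POST, as implied by curl -d
    )


req = build_predict_request("203.0.113.10", 80,
                            "http://flower-sample.default.example.com",
                            "flower-sample", [{"key": "1"}])
# To send it: urllib.request.urlopen(req, timeout=30).read()
```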

gRPC Prediction

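import base64
import json

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Fill these in from the kubectl commands in the REST example
host = "<INGRESS_HOST>"
port = 80
hostname = "<SERVICE_HOSTNAME>"

# Connect to the ingress gateway over plaintext gRPC
channel = grpc.insecure_channel(f"{host}:{port}")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Read the V1-style input file; gRPC sends the image as raw bytes, not base64
with open("input.json") as f:
    instance = json.load(f)["instances"][0]
image = base64.b64decode(instance["image_bytes"]["b64"])

# Build request: one string tensor per named input, batch size 1
request = predict_pb2.PredictRequest()
request.model_spec.name = "flower-sample"
request.model_spec.signature_name = "serving_default"
request.inputs["image_bytes"].CopyFrom(tf.make_tensor_proto([image]))
request.inputs["key"].CopyFrom(tf.make_tensor_proto([instance["key"]]))

# Send with a host header so the ingress routes to the InferenceService (30 s timeout)
metadata = [("host", hostname)]
response = stub.Predict(request, 30.0, metadata=metadata)
print(response)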
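To build a gRPC request from the V1 JSON input, the row-format instances have to be regrouped into one list per input name. Any {"b64": ...} wrappers also have to be decoded, because protobuf carries bytes directly. The standard-library sketch below shows that conversion. The helper name instances_to_columns is hypothetical. Each resulting list could then be passed to tf.make_tensor_proto and copied into request.inputs[name].

```python
import base64
from typing import Any


def instances_to_columns(instances: list[dict]) -> dict[str, list[Any]]:
    """Regroup V1 row-format instances into one list per input name.

    Values wrapped as {"b64": ...} are decoded to raw bytes, since gRPC
    sends binary tensor data directly instead of base64 text.
    """
    columns: dict[str, list[Any]] = {}
    for instance in instances:
        for name, value in instance.items():
            if isinstance(value, dict) and set(value) == {"b64"}:
                value = base64.b64decode(value["b64"])
            columns.setdefault(name, []).append(value)
    return columns


cols = instances_to_columns([{"image_bytes": {"b64": "aGk="}, "key": "1"}])
print(cols)  # {'image_bytes': [b'hi'], 'key': ['1']}
```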

Related Pages

Implements Principle

Principle:Kserve_Kserve_Prediction_Protocol
