Implementation:Kserve Kserve Prediction Request Protocol

From Leeroopedia
Knowledge Sources
Domains API_Design, Model_Serving, Inference
Last Updated 2026-02-13 00:00 GMT

Overview

Concrete REST and gRPC client patterns for sending prediction requests to KServe InferenceService endpoints using V1 and V2 protocols.

Description

KServe provides sample clients for both the V1 REST and the gRPC prediction protocols. The gRPC client uses TensorFlow Serving's PredictionServiceStub to send protobuf-encoded requests. The REST client sends an HTTP POST with a JSON payload. REST URL paths are generated by the PredictPath() function in KServe's constants package.

Usage

Use REST for simple JSON-based predictions. Use gRPC for high-throughput, low-latency scenarios or when working with binary tensor data. The gRPC client requires the tensorflow-serving-api package.

Code Reference

Source Location

  • Repository: kserve
  • File: docs/samples/v1beta1/tensorflow/grpc_client.py, Lines 1-72 (gRPC client)
  • File: docs/samples/v1beta1/tensorflow/input.json, Lines 1-10 (REST input format)
  • File: pkg/constants/constants.go, Lines 687-695 (PredictPath function)

Signature

gRPC Client

def predict(host: str, port: int, hostname: str, model: str,
            signature_name: str, input_path: str) -> None:
    """Send a gRPC prediction request to a KServe InferenceService.

    Args:
        host: Ingress gateway IP address
        port: Ingress gateway port (default 80)
        hostname: Knative DNS override (used for ssl_target_name_override)
        model: Model name in the InferenceService
        signature_name: TensorFlow SavedModel signature (default "serving_default")
        input_path: Path to JSON input file
    """

PredictPath (Go)

// PredictPath generates the prediction URL path for a model
func PredictPath(name string, protocol InferenceServiceProtocol) string
// V1: /v1/models/<name>:predict
// V2: /v2/models/<name>/infer
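
Client code that cannot import the Go package can mirror the path scheme from the comments above in a few lines of Python. This is a minimal sketch, not KServe code: the helper name predict_path and the protocol strings "v1" and "v2" are assumptions made for illustration.

```python
def predict_path(name: str, protocol: str = "v1") -> str:
    """Return the REST prediction path for a model (hypothetical helper)."""
    if protocol == "v1":
        # V1 uses a verb suffix on the model resource
        return f"/v1/models/{name}:predict"
    if protocol == "v2":
        # V2 uses a dedicated /infer sub-resource
        return f"/v2/models/{name}/infer"
    raise ValueError(f"unsupported protocol: {protocol!r}")


print(predict_path("flower-sample", "v1"))
print(predict_path("flower-sample", "v2"))
```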

Import

from tensorflow_serving.apis import prediction_service_pb2_grpc
from tensorflow_serving.apis import predict_pb2
import grpc

I/O Contract

Inputs

Name | Type | Required | Description
host | string | Yes | Ingress gateway IP address
port | int | Yes | Ingress gateway port (default 80)
hostname | string | Yes (gRPC) | Knative host header for routing
model | string | Yes | Model name
input_data | JSON/protobuf | Yes | Prediction input data
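
For the JSON case, the two REST protocols expect differently shaped bodies: V1 wraps the inputs in an "instances" list, while V2 sends a list of named, typed tensors under "inputs". The sketch below builds both. The helper names, the tensor name "input-0", and the FP32 datatype are placeholders chosen for illustration; the real names and types depend on the deployed model.

```python
import json


def v1_body(instances: list) -> str:
    """Serialize a V1 request body: {"instances": [...]}."""
    return json.dumps({"instances": instances})


def v2_body(name: str, shape: list, datatype: str, data: list) -> str:
    """Serialize a V2 request body with a single input tensor."""
    tensor = {"name": name, "shape": shape, "datatype": datatype, "data": data}
    return json.dumps({"inputs": [tensor]})


# Placeholder values; substitute the inputs your model expects
print(v1_body([[6.8, 2.8, 4.8, 1.4]]))
print(v2_body("input-0", [1, 4], "FP32", [6.8, 2.8, 4.8, 1.4]))
```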

Outputs

Name | Type | Description
predictions (V1) | JSON | {"predictions": [...]}: model output values
outputs (V2) | JSON | {"outputs": [...]}: tensor-typed output values
PredictResponse (gRPC) | protobuf | TensorFlow Serving PredictResponse
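
A client that talks to both REST protocols can tell the responses apart by their top-level key. The following is a rough sketch under that assumption; parse_rest_response is a hypothetical helper, and it assumes each V2 output tensor carries "name" and "data" fields.

```python
import json


def parse_rest_response(body: str):
    """Return (protocol, payload) for a V1 or V2 REST response body."""
    doc = json.loads(body)
    if "predictions" in doc:
        # V1: a plain list of model outputs
        return "v1", doc["predictions"]
    if "outputs" in doc:
        # V2: a list of typed tensors; index their data by tensor name
        return "v2", {t["name"]: t["data"] for t in doc["outputs"]}
    raise ValueError("response has neither 'predictions' nor 'outputs'")


print(parse_rest_response('{"predictions": [1, 0]}'))
```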

Usage Examples

REST V1 Prediction

# Determine ingress endpoint
INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway \
  -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
SERVICE_HOSTNAME=$(kubectl get inferenceservice flower-sample \
  -o jsonpath='{.status.url}' | cut -d "/" -f 3)

# Send prediction
curl -v -H "Host: ${SERVICE_HOSTNAME}" \
  -H "Content-Type: application/json" \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/flower-sample:predict" \
  -d '{"instances": [{"image_bytes": {"b64": "iVBOR..."}, "key": "1"}]}'
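
The same request can be prepared from Python with only the standard library. This is a sketch, not part of the KServe samples: build_v1_request is a hypothetical helper, and the host, port, and hostname values in the usage lines are placeholders standing in for the shell variables above.

```python
import json
import urllib.request


def build_v1_request(ingress_host, ingress_port, service_hostname, model, instances):
    """Build (but do not send) a V1 REST prediction request."""
    url = f"http://{ingress_host}:{ingress_port}/v1/models/{model}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # The Host header routes the request to the InferenceService
        headers={"Host": service_hostname, "Content-Type": "application/json"},
    )


# Placeholder values for illustration only
req = build_v1_request("203.0.113.10", 80, "flower-sample.default.example.com",
                       "flower-sample", [{"key": "1"}])
print(req.get_method(), req.full_url)
```

To send it, pass the request to urllib.request.urlopen(req, timeout=30) and read the JSON body from the returned response.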

gRPC Prediction

from tensorflow_serving.apis import prediction_service_pb2_grpc
from tensorflow_serving.apis import predict_pb2
import grpc

# host, port, and hostname are the predict() arguments described under Signature

# Connect to the ingress gateway; hostname overrides the target name so the
# request is routed to the right InferenceService
options = (("grpc.ssl_target_name_override", hostname),)
channel = grpc.insecure_channel(f"{host}:{port}", options=options)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build request
request = predict_pb2.PredictRequest()
request.model_spec.name = "flower-sample"
request.model_spec.signature_name = "serving_default"
# Populate request.inputs with the model's input tensors before sending

# Send with a 30-second timeout
response = stub.Predict(request, timeout=30.0)
print(response)
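
Before a gRPC request can carry data, the JSON input file has to be read and its base64 payload decoded. The sketch below shows that step using only the standard library. It assumes the file has the same shape as the REST V1 payload shown earlier (an "instances" list whose entries hold "image_bytes.b64" and "key"); load_first_instance is a hypothetical helper name.

```python
import base64
import json


def load_first_instance(input_path: str):
    """Read the input file and return (raw_image_bytes, key) for instance 0."""
    with open(input_path) as f:
        doc = json.load(f)
    instance = doc["instances"][0]
    # The image travels as base64 text in JSON; gRPC needs the raw bytes
    image = base64.b64decode(instance["image_bytes"]["b64"])
    return image, instance["key"]
```

The returned values still need to be converted to tensors and copied into request.inputs, for example with TensorFlow's make_tensor_proto, under the input names the model's signature defines.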

Related Pages

Implements Principle
