Implementation:Kserve Kserve TorchServe gRPC Client

Knowledge Sources	Kserve_Kserve
Domains	gRPC, TorchServe
Last Updated	2026-02-13 00:00 GMT

Overview

Concrete tool, provided as KServe sample code, for calling TorchServe's inference and management gRPC APIs: running predictions, registering models, and managing the model lifecycle.

Description

This module implements a gRPC client for TorchServe with the following functions:

get_inference_stub() -- Creates a gRPC channel to the inference API and returns an InferenceAPIsServiceStub for making prediction and health check calls.
get_management_stub() -- Creates a gRPC channel to the management API and returns a ManagementAPIsServiceStub for model registration and lifecycle operations.
infer() -- Reads a binary input file and sends a PredictionsRequest to the inference stub, printing the decoded prediction result.
ping() -- Sends a Ping health check through the inference stub and prints the returned TorchServeHealthResponse.
register() -- Registers a model with TorchServe by submitting a RegisterModelRequest that contains the MAR file URL, the initial number of workers, and the model name. If the model's MAR file appears in the provided set of available MAR files, that file is used; otherwise the URL falls back to the TorchServe S3 location.
unregister() -- Unregisters a model from TorchServe by name.
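The MAR-file selection performed by register() can be sketched as a pure function. This is a minimal sketch, not the sample's code: the helper name resolve_mar_url, the "<model_name>.mar" naming convention, and the S3 base URL are all assumptions.

```python
from typing import Optional

# Assumed public TorchServe model-zoo location; not confirmed by the sample code.
DEFAULT_S3_BASE = "https://torchserve.s3.amazonaws.com/mar_files"


def resolve_mar_url(model_name: str, mar_set_str: Optional[str],
                    s3_base: str = DEFAULT_S3_BASE) -> str:
    """Hypothetical helper mirroring register()'s fallback logic.

    Assumes the MAR file for a model is named "<model_name>.mar".
    """
    # Parse the comma-separated list of available MAR filenames,
    # stripping stray whitespace; None or "" yields an empty set.
    mar_set = {name.strip() for name in mar_set_str.split(",")} if mar_set_str else set()
    marfile = f"{model_name}.mar"
    if marfile in mar_set:
        # The MAR file is already available, so refer to it by filename.
        return marfile
    # Otherwise fall back to the (assumed) S3 URL for the same model.
    return f"{s3_base}/{marfile}"
```

For example, resolve_mar_url("mnist", "mnist.mar,resnet.mar") returns "mnist.mar", while resolve_mar_url("mnist", None) falls back to the S3 URL. Passing None, as the Basic Usage example does, therefore always selects the remote MAR file.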

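The __main__ block parses CLI arguments for host, port, hostname, model name, API name, and input path. The API name selects either infer or ping; register() and unregister() are not exposed on the command line and must be called from Python.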
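The CLI described above can be approximated with argparse. This is a hypothetical reconstruction: the flag names come from the CLI Usage examples on this page, but the defaults (localhost, port 80, api_name infer) and the extra validation are assumptions, not the sample's actual behavior.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the sample's CLI; defaults are assumptions."""
    parser = argparse.ArgumentParser(description="TorchServe gRPC client (sketch)")
    parser.add_argument("--host", default="localhost", help="Ingress host name or IP address")
    parser.add_argument("--port", type=int, default=80, help="Ingress port number")
    parser.add_argument("--hostname", default="",
                        help="Service host name for the SSL target name override")
    parser.add_argument("--model", help="TorchServe model name")
    parser.add_argument("--api_name", choices=["infer", "ping"], default="infer",
                        help="Which inference API to call")
    parser.add_argument("--input_path", help="Path to the binary input file (infer only)")
    return parser


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # infer needs a model name and an input file; ping needs neither.
    if args.api_name == "infer" and (not args.model or not args.input_path):
        parser.error("--api_name infer requires --model and --input_path")
    return args
```

Restricting --api_name with choices makes argparse reject any value other than infer or ping before any gRPC channel is opened. parser.error() exits with status 2.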

Usage

Use this script as a gRPC client for TorchServe models deployed on KServe. It supports inference requests, health checks, model registration, and model unregistration.

Code Reference

Source Location

Repository: Kserve_Kserve
File: docs/samples/v1beta1/torchserve/v1/torchserve_grpc_client.py
Lines: 1-153

Signature

def get_inference_stub(host, port, hostname):
    ...

def get_management_stub(host, port, hostname):
    ...

def infer(stub, model_name, model_input):
    ...

def ping(stub):
    ...

def register(stub, model_name, mar_set_str):
    ...

def unregister(stub, model_name):
    ...

Import

from torchserve_grpc_client import get_inference_stub, get_management_stub, infer, ping, register, unregister

I/O Contract

Inputs

get_inference_stub()

Name	Type	Required	Description
host	str	Yes	Ingress host name or IP address
port	int	Yes	Ingress port number
hostname	str	Yes	Service host name for gRPC SSL target name override

get_management_stub()

Name	Type	Required	Description
host	str	Yes	Ingress host name or IP address
port	int	Yes	Ingress port number
hostname	str	Yes	Service host name for gRPC SSL target name override
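Both stub factories take the same three inputs. The sketch below shows how they might map onto a gRPC target address and channel options. The helper name channel_args is hypothetical, and so is the choice to omit the override when hostname is empty. "grpc.ssl_target_name_override" is a standard gRPC channel argument.

```python
from typing import List, Tuple


def channel_args(host: str, port: int, hostname: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Hypothetical helper: build the target and options for a gRPC channel."""
    # The target is the ingress address, e.g. "localhost:80".
    target = f"{host}:{port}"
    # Carry the service host name (e.g. "torchserve.default.example.com") as the
    # SSL target name override; skipped here when no hostname is given.
    options = [("grpc.ssl_target_name_override", hostname)] if hostname else []
    return target, options
```

With grpcio installed, the pair would be passed as grpc.insecure_channel(target, options=options), and the resulting channel would be wrapped in InferenceAPIsServiceStub or ManagementAPIsServiceStub. Whether the sample uses an insecure or a secure channel is an assumption here.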

infer()

Name	Type	Required	Description
stub	InferenceAPIsServiceStub	Yes	The gRPC inference stub
model_name	str	Yes	Name of the TorchServe model to query
model_input	str	Yes	File path to the binary input data
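infer() reads model_input as raw bytes rather than parsing it. The sketch below collects the request fields that infer() would send. It assumes that PredictionsRequest has a model_name string and an input map of bytes, and that the file contents go under the key "data". The helper name build_predictions_fields is hypothetical.

```python
from pathlib import Path
from typing import Dict, Union


def build_predictions_fields(model_name: str,
                             model_input: Union[str, Path]) -> Dict[str, object]:
    """Hypothetical helper: gather the fields of a PredictionsRequest.

    Assumes an `input` map of bytes with the payload stored under "data".
    """
    # Read the file as raw bytes; no JSON parsing or text decoding happens here.
    data = Path(model_input).read_bytes()
    return {"model_name": model_name, "input": {"data": data}}
```

With the generated TorchServe modules, the fields would be passed along the lines of stub.Predictions(inference_pb2.PredictionsRequest(**fields)). The module name inference_pb2 is assumed. The response's prediction bytes are then decoded as UTF-8 before printing.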

ping()

Name	Type	Required	Description
stub	InferenceAPIsServiceStub	Yes	The gRPC inference stub for the health check

register()

Name	Type	Required	Description
stub	ManagementAPIsServiceStub	Yes	The gRPC management stub
model_name	str	Yes	Name of the model to register
mar_set_str	str	No	Comma-separated string of available MAR filenames

unregister()

Name	Type	Required	Description
stub	ManagementAPIsServiceStub	Yes	The gRPC management stub
model_name	str	Yes	Name of the model to unregister

Outputs

get_inference_stub()

Name	Type	Description
stub	InferenceAPIsServiceStub	gRPC stub for making inference calls

get_management_stub()

Name	Type	Description
stub	ManagementAPIsServiceStub	gRPC stub for making management calls

infer()

Name	Type	Description
(none)	None	Prints the prediction result to stdout; exits with code 1 on gRPC error

ping()

Name	Type	Description
(none)	None	Prints the health response to stdout; exits with code 1 on gRPC error
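Both output contracts follow the same pattern: print on success, exit with code 1 on a gRPC error. The sketch below expresses that pattern as a reusable wrapper. The name run_or_exit is hypothetical. With grpcio, error_type would be grpc.RpcError, which is a real class. Exceptions of any other type still propagate.

```python
import sys
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def run_or_exit(rpc: Callable[[], T], error_type: Type[BaseException]) -> T:
    """Hypothetical wrapper: return the RPC result, or exit(1) on an RPC error."""
    try:
        return rpc()
    except error_type as err:
        # Report the failure on stderr and terminate with status 1.
        print(f"RPC failed: {err}", file=sys.stderr)
        sys.exit(1)
```

For example, print(run_or_exit(lambda: stub.Predictions(request), grpc.RpcError).prediction.decode("utf-8")) would mirror the infer() contract. Wrapping the call itself, rather than only the decoding step, means connection and server errors also produce exit code 1.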

Usage Examples

Basic Usage

from torchserve_grpc_client import get_inference_stub, get_management_stub, infer, ping, register

# Create stubs
inference_stub = get_inference_stub("localhost", 80, "torchserve.default.example.com")
management_stub = get_management_stub("localhost", 80, "torchserve.default.example.com")

# Health check
ping(inference_stub)

# Register a model
register(management_stub, "mnist", None)

# Run inference
infer(inference_stub, "mnist", "test_data/mnist_input.json")

CLI Usage

# Run inference via command line:
# python torchserve_grpc_client.py \
#     --host localhost \
#     --port 80 \
#     --hostname torchserve.default.example.com \
#     --model mnist \
#     --api_name infer \
#     --input_path mnist.json

# Run health check:
# python torchserve_grpc_client.py \
#     --host localhost \
#     --port 80 \
#     --hostname torchserve.default.example.com \
#     --api_name ping

Related Pages

Environment:Kserve_Kserve_Python_Runtime
