Implementation:Kserve Kserve MMS Prediction Routing
| Knowledge Sources | |
|---|---|
| Domains | Model_Serving, API_Design, Inference |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
How KServe generates per-model prediction URL paths in multi-model serving (MMS), with curl-based patterns for calling each model's endpoint.
Description
The PredictPath() function in pkg/constants/constants.go builds the prediction URL path for a given model name and protocol. In multi-model serving (MMS) mode, each TrainedModel's name fills the model-name segment of the path. All models share the same host, the endpoint of the parent InferenceService, so requests to different models differ only in their path.
Usage
Send requests to a per-model prediction path only after the agent has loaded the corresponding TrainedModel.
Code Reference
Source Location
- Repository: kserve
- File: pkg/constants/constants.go, Lines 687-695
- File: docs/samples/multimodelserving/sklearn/README.md, Lines 146-152
Signature
// PredictPath generates the prediction URL path for a model
func PredictPath(name string, protocol InferenceServiceProtocol) string
// V1: returns "/v1/models/<name>:predict"
// V2: returns "/v2/models/<name>/infer"
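The mapping described in the signature comments can be sketched as a standalone program. This is a minimal stand-in, not the KServe source: the names `inferenceProtocol`, `protocolV1`, `protocolV2`, and `predictPath` are local and hypothetical, and returning an empty string for any other protocol is an assumption of this sketch.

```go
package main

import "fmt"

// inferenceProtocol is a local stand-in for constants.InferenceServiceProtocol
// (hypothetical; defined here so the example compiles on its own).
type inferenceProtocol string

const (
	protocolV1 inferenceProtocol = "v1"
	protocolV2 inferenceProtocol = "v2"
)

// predictPath reproduces the path shapes listed in the signature comments:
// V1 -> /v1/models/<name>:predict, V2 -> /v2/models/<name>/infer.
func predictPath(name string, protocol inferenceProtocol) string {
	switch protocol {
	case protocolV1:
		return fmt.Sprintf("/v1/models/%s:predict", name)
	case protocolV2:
		return fmt.Sprintf("/v2/models/%s/infer", name)
	default:
		// Assumption of this sketch: other protocols yield an empty path.
		return ""
	}
}

func main() {
	// In MMS mode each TrainedModel name becomes the model segment of the path.
	for _, model := range []string{"model1-sklearn", "model2-sklearn"} {
		fmt.Println(predictPath(model, protocolV1))
		fmt.Println(predictPath(model, protocolV2))
	}
}
```

Code that already depends on KServe would call constants.PredictPath directly rather than a local copy like this one.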
Import
import "github.com/kserve/kserve/pkg/constants"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | TrainedModel name |
| protocol | InferenceServiceProtocol | Yes | V1 or V2 |
| Host header | string | Yes | Parent InferenceService hostname (set on the HTTP request, not passed to PredictPath) |
Outputs
| Name | Type | Description |
|---|---|---|
| URL path | string | /v1/models/<name>:predict or /v2/models/<name>/infer |
| Prediction response | JSON | Model-specific prediction output (HTTP response body, not returned by PredictPath) |
Usage Examples
Predict Against Multiple Models
# INGRESS_HOST and INGRESS_PORT must already be set to the cluster's ingress gateway address.
SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris-example \
  -o jsonpath='{.status.url}' | cut -d "/" -f 3)
# Predict model 1
curl -v -H "Host: ${SERVICE_HOSTNAME}" \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/model1-sklearn:predict \
  -d '{"instances": [[6.8, 2.8, 4.8, 1.4]]}'
# Response: {"predictions": [1]}
# Predict model 2 (same host, different model name in the path)
curl -v -H "Host: ${SERVICE_HOSTNAME}" \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/model2-sklearn:predict \
  -d '{"instances": [[6.8, 2.8, 4.8, 1.4]]}'
# Response: {"predictions": [1]}
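For callers that prefer Go over curl, the same requests could be built roughly as follows. This is a sketch under stated assumptions: `newPredictRequest` and `v1Request` are hypothetical helpers, the ingress address and service hostname in `main` are placeholders, and setting the Content-Type header explicitly is a choice of this sketch. The program only constructs and prints the requests; sending them needs a reachable ingress.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// v1Request models the V1 request body used in the curl examples.
type v1Request struct {
	Instances [][]float64 `json:"instances"`
}

// newPredictRequest (hypothetical helper) builds a V1 predict request for one
// TrainedModel. ingress is "host:port" of the ingress gateway; serviceHost is
// the parent InferenceService hostname sent as the Host header.
func newPredictRequest(ingress, serviceHost, model string, instances [][]float64) (*http.Request, error) {
	body, err := json.Marshal(v1Request{Instances: instances})
	if err != nil {
		return nil, err
	}
	url := fmt.Sprintf("http://%s/v1/models/%s:predict", ingress, model)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// Setting req.Host overrides the Host header, like curl -H "Host: ...".
	req.Host = serviceHost
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Placeholder values; substitute the real ingress address and hostname.
	ingress := "203.0.113.10:80"
	serviceHost := "sklearn-iris-example.default.example.com"
	input := [][]float64{{6.8, 2.8, 4.8, 1.4}}

	for _, model := range []string{"model1-sklearn", "model2-sklearn"} {
		req, err := newPredictRequest(ingress, serviceHost, model, input)
		if err != nil {
			panic(err)
		}
		fmt.Println(req.Method, req.URL.String(), "Host:", req.Host)
		// To send: resp, err := http.DefaultClient.Do(req)
	}
}
```

Only the model segment of the path changes between the two requests; the ingress address and Host header stay the same, which matches the shared-host behavior described above.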
Related Pages
Implements Principle