Principle:Kserve Kserve Multi Model Prediction
| Knowledge Sources | |
|---|---|
| Domains | Model_Serving, API_Design, Inference |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
A routing pattern that serves multiple models from a single InferenceService host, using per-model URL paths to direct requests to the correct loaded model.
Description
In multi-model serving, each TrainedModel gets its own prediction endpoint under the shared InferenceService host. The URL path includes the TrainedModel name, and the model server routes the request to the correct loaded model.
URL pattern: /v1/models/<trained-model-name>:predict (V1) or /v2/models/<trained-model-name>/infer (V2).
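A minimal sketch of how the two path shapes above could be built from a TrainedModel name. The helper name `predict_path` and its string protocol argument are illustrative assumptions, not part of any KServe client library.

```python
def predict_path(trained_model_name: str, protocol: str = "v1") -> str:
    """Return the per-model prediction path for the given protocol version.

    Hypothetical helper: it only mirrors the URL pattern described above.
    """
    if protocol == "v1":
        return f"/v1/models/{trained_model_name}:predict"
    if protocol == "v2":
        return f"/v2/models/{trained_model_name}/infer"
    raise ValueError(f"unsupported protocol: {protocol!r}")


# Same model name, two protocol versions.
print(predict_path("model1-sklearn", "v1"))
print(predict_path("model1-sklearn", "v2"))
```

Raising on an unknown protocol is a design choice of this sketch; it makes a mistyped protocol fail loudly instead of producing an empty path.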
Usage
Send prediction requests to a model's endpoint once its TrainedModel has been loaded. Each model is independently addressable even though all of them share the same pod.
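As a rough illustration, the sketch below builds (but does not send) one V1 predict request per TrainedModel using only the Python standard library. The ingress address, the `default` namespace in the host name, the helper name, and the sample input row are all assumptions made for the example; the `{"instances": [...]}` body follows the V1 request format.

```python
import json
import urllib.request


def build_v1_predict_request(base_url, host, model_name, instances):
    """Build (but do not send) a V1 predict request for one TrainedModel."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/models/{model_name}:predict",
        data=body,
        headers={"Host": host, "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values: replace with the real ingress address and namespace.
BASE_URL = "http://localhost:8080"
HOST = "sklearn-iris-example.default.example.com"

# Both models share one host; only the path differs.
for name in ("model1-sklearn", "model2-sklearn"):
    req = build_v1_predict_request(BASE_URL, HOST, name, [[6.8, 2.8, 4.8, 1.4]])
    print(req.get_method(), req.full_url)
    # To send against a running cluster: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body without a cluster available.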
Theoretical Basis
# MMS prediction routing (NOT implementation code)
Host: sklearn-iris-example.<namespace>.example.com
Per-model endpoints:
  /v1/models/model1-sklearn:predict → model1-sklearn in shared pod
  /v1/models/model2-sklearn:predict → model2-sklearn in shared pod
URL generation: PredictPath(trainedModelName, protocol)
  V1: /v1/models/<name>:predict
  V2: /v2/models/<name>/infer
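The routing idea above can be sketched as a small path-to-model lookup. This is an illustrative toy, not the KServe model server: the function `resolve_model`, the regular expressions, and the lambda stand-ins for loaded models are all assumptions made for the example.

```python
import re

# Patterns for the two path shapes listed above; each captures the model name.
_V1 = re.compile(r"^/v1/models/(?P<name>[^/:]+):predict$")
_V2 = re.compile(r"^/v2/models/(?P<name>[^/]+)/infer$")


def resolve_model(path, loaded_models):
    """Return (protocol, model) for a prediction path.

    Raises KeyError if the named model is not loaded, and ValueError if the
    path is not a prediction path at all.
    """
    for protocol, pattern in (("v1", _V1), ("v2", _V2)):
        match = pattern.match(path)
        if match:
            name = match.group("name")
            if name not in loaded_models:
                raise KeyError(f"model not loaded: {name}")
            return protocol, loaded_models[name]
    raise ValueError(f"not a prediction path: {path}")


# Hypothetical stand-ins for models loaded into the shared pod.
loaded = {
    "model1-sklearn": lambda rows: ["m1"] * len(rows),
    "model2-sklearn": lambda rows: ["m2"] * len(rows),
}

protocol, model = resolve_model("/v1/models/model2-sklearn:predict", loaded)
print(protocol, model([[6.8, 2.8, 4.8, 1.4]]))
```

Distinguishing "model not loaded" from "not a prediction path" lets a caller map the two cases to different error responses.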