Principle:Kserve Kserve Multi Model Prediction

Knowledge Sources	KServe Multi-Model Serving
Domains	Model_Serving, API_Design, Inference
Last Updated	2026-02-13 00:00 GMT

Overview

A routing pattern that serves multiple models from a single InferenceService host, using per-model URL paths to direct requests to the correct loaded model.

Description

In multi-model serving, each TrainedModel gets its own prediction endpoint under the shared InferenceService host. The URL path includes the TrainedModel name, and the model server routes the request to the correct loaded model.

URL pattern: /v1/models/<trained-model-name>:predict for the V1 inference protocol, or /v2/models/<trained-model-name>/infer for the V2 inference protocol.
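The URL pattern can be sketched as a small path builder. This is a minimal illustration in Python, not KServe's own code: the function name predict_path and the "v1"/"v2" string values are assumptions chosen for this example; only the two path templates come from the pattern described here.

```python
def predict_path(trained_model_name: str, protocol: str = "v1") -> str:
    """Build the per-model prediction path for a TrainedModel.

    Hypothetical helper for illustration; only the path templates
    are taken from the documented URL pattern.
    """
    if not trained_model_name:
        raise ValueError("trained_model_name must be non-empty")

    protocol = protocol.lower()
    if protocol == "v1":
        # V1 protocol: the verb is appended after a colon.
        return f"/v1/models/{trained_model_name}:predict"
    if protocol == "v2":
        # V2 protocol: the verb is a trailing path segment.
        return f"/v2/models/{trained_model_name}/infer"
    raise ValueError(f"unsupported protocol: {protocol!r}")
```

Rejecting unknown protocol values early keeps a typo such as "v3" from silently producing a path that no model server would recognize.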

Usage

Send prediction requests to a per-model endpoint only after the corresponding TrainedModel has been loaded. Each model is independently addressable even though all of them share the same pod.
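A client-side request for one model might be assembled as follows. This is a sketch under stated assumptions: the ingress address http://localhost:8080, the namespace "default" in the Host header, and the helper name build_v1_predict_request are all hypothetical. The {"instances": [...]} body follows the V1 inference protocol. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_v1_predict_request(base_url, host, model_name, instances):
    """Construct (but do not send) a V1 predict request for one TrainedModel.

    base_url and host are placeholders for the cluster ingress address and
    the InferenceService hostname; both are assumptions of this sketch.
    """
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Host": host, "Content-Type": "application/json"},
    )


# Two models behind the same host differ only in the URL path.
shared_host = "sklearn-iris-example.default.example.com"  # assumed namespace
req1 = build_v1_predict_request(
    "http://localhost:8080", shared_host, "model1-sklearn", [[6.8, 2.8, 4.8, 1.4]]
)
req2 = build_v1_predict_request(
    "http://localhost:8080", shared_host, "model2-sklearn", [[6.8, 2.8, 4.8, 1.4]]
)
```

Passing either request to urllib.request.urlopen would send it; that step is left out because it requires a reachable cluster.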

Theoretical Basis

# MMS prediction routing (NOT implementation code)
Host: sklearn-iris-example.<namespace>.example.com

Per-model endpoints:
  /v1/models/model1-sklearn:predict  → model1-sklearn in shared pod
  /v1/models/model2-sklearn:predict  → model2-sklearn in shared pod

URL generation: PredictPath(trainedModelName, protocol)
  V1: /v1/models/<name>:predict
  V2: /v2/models/<name>/infer
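
The reverse direction, resolving an incoming path to a loaded model, can be sketched in the same spirit. This is an illustration of the routing idea only and does not reflect how KServe's model server is implemented: the function resolve_model, the regular expressions, and the loaded_models dictionary are hypothetical.

```python
import re

# One pattern per protocol; the model name may not contain "/" or ":".
_V1_PATH = re.compile(r"^/v1/models/(?P<name>[^/:]+):predict$")
_V2_PATH = re.compile(r"^/v2/models/(?P<name>[^/:]+)/infer$")


def resolve_model(path, loaded_models):
    """Return the loaded model that a prediction path refers to.

    loaded_models maps TrainedModel names to model objects (hypothetical
    in-memory registry standing in for the shared pod's loaded models).
    """
    for pattern in (_V1_PATH, _V2_PATH):
        match = pattern.match(path)
        if match:
            name = match.group("name")
            if name not in loaded_models:
                # The path is well-formed, but that model is not loaded.
                raise LookupError(f"model not loaded: {name}")
            return loaded_models[name]
    raise ValueError(f"not a prediction path: {path}")
```

Separating "malformed path" from "model not loaded" mirrors the distinction a client cares about: the first is a request error, the second can resolve itself once the TrainedModel finishes loading.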

Related Pages

Implemented By

Implementation:Kserve_Kserve_MMS_Prediction_Routing
