Principle:Kserve Kserve Multi Model Prediction

From Leeroopedia
Knowledge Sources
Domains: Model_Serving, API_Design, Inference
Last Updated: 2026-02-13 00:00 GMT

Overview

A routing pattern that serves multiple models from a single InferenceService host, using per-model URL paths to direct requests to the correct loaded model.

Description

In multi-model serving, each TrainedModel gets its own prediction endpoint under the shared InferenceService host. The URL path includes the TrainedModel name, and the model server routes the request to the correct loaded model.

URL pattern: /v1/models/<trained-model-name>:predict for the V1 protocol, or /v2/models/<trained-model-name>/infer for the V2 protocol.

Usage

Send prediction requests to a model's endpoint only after its TrainedModel has been loaded. Each model is independently addressable through its own URL path, even though all of the models share the same pod and the same InferenceService host.
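As a minimal sketch of what such requests might look like, the Python below builds (but does not send) one V1 predict request per model. The helper name build_v1_predict_request, the ingress address http://localhost:8080, the "default" namespace, and the four-feature input row are illustrative assumptions, not values from this page. The {"instances": [...]} body follows the V1 protocol's request format.

```python
import json
from urllib.request import Request


def build_v1_predict_request(ingress_url, service_host, model_name, instances):
    """Build (without sending) a V1 predict request for one TrainedModel.

    Hypothetical helper for illustration; not part of KServe.
    """
    # The TrainedModel name in the path selects the model inside the shared pod.
    url = f"{ingress_url.rstrip('/')}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    headers = {
        # Every model shares the same InferenceService host.
        "Host": service_host,
        "Content-Type": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")


# Assumed values: replace with your own ingress address and namespace.
INGRESS_URL = "http://localhost:8080"
SERVICE_HOST = "sklearn-iris-example.default.example.com"
SAMPLE_ROWS = [[6.8, 2.8, 4.8, 1.4]]  # illustrative input only

requests_by_model = {
    name: build_v1_predict_request(INGRESS_URL, SERVICE_HOST, name, SAMPLE_ROWS)
    for name in ("model1-sklearn", "model2-sklearn")
}

for name, req in requests_by_model.items():
    print(name, req.get_method(), req.full_url)
```

Both requests carry the same Host header and differ only in the URL path. Passing a request to urllib.request.urlopen would send it to a live cluster.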

Theoretical Basis

# Multi-model serving (MMS) prediction routing (NOT implementation code)
Host: sklearn-iris-example.<namespace>.example.com

Per-model endpoints:
  /v1/models/model1-sklearn:predict  → model1-sklearn in shared pod
  /v1/models/model2-sklearn:predict  → model2-sklearn in shared pod

URL generation: PredictPath(trainedModelName, protocol)
  V1: /v1/models/<name>:predict
  V2: /v2/models/<name>/infer
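The URL generation rule above can be sketched in Python as follows. The function predict_path is a hypothetical stand-in that only mirrors the PredictPath(trainedModelName, protocol) pseudocode. Its signature and the "v1"/"v2" protocol strings are assumptions made for illustration, not the KServe implementation.

```python
def predict_path(trained_model_name: str, protocol: str) -> str:
    """Return the per-model prediction path for the given protocol.

    Illustrative sketch only; mirrors the two URL patterns listed above.
    """
    if protocol == "v1":
        return f"/v1/models/{trained_model_name}:predict"
    if protocol == "v2":
        return f"/v2/models/{trained_model_name}/infer"
    # Only the two protocols named on this page are handled.
    raise ValueError(f"unsupported protocol: {protocol!r}")


for model in ("model1-sklearn", "model2-sklearn"):
    print(predict_path(model, "v1"), predict_path(model, "v2"))
```

Keeping path construction in one function means clients never hand-assemble URLs, so a model name maps to exactly one endpoint per protocol.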

Related Pages

Implemented By
