Implementation:Kserve Kserve MLServer Runtime
| Knowledge Sources | |
|---|---|
| Domains | Kubernetes, Model Serving |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
A concrete ClusterServingRuntime, provided by the KServe project, for multi-framework inference with Seldon MLServer.
Description
This file defines a ClusterServingRuntime named kserve-mlserver that uses Seldon's MLServer as a multi-framework inference server. It supports sklearn (versions 0 and 1), xgboost (versions 1 and 2), and lightgbm (versions 3 and 4), each at priority 3, and mlflow (versions 1 and 2) at priority 1, all with autoSelect enabled. The runtime supports only the v2 inference protocol, sets the model implementation class from the {{.Labels.modelClass}} template, and serves HTTP on port 8080 and gRPC on port 9000. It runs with a non-root security context and resource limits of 1 CPU and 2Gi of memory.
Usage
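This ClusterServingRuntime is cluster-scoped, so it is available in every namespace, and it is eligible for auto-selection for sklearn, xgboost, lightgbm, and mlflow models. It declares priority 3 for sklearn, xgboost, and lightgbm, and priority 1 for mlflow; KServe uses these values to choose between runtimes when more than one auto-selectable runtime supports the same model format. Users create InferenceService resources with the appropriate model format, and KServe can select this runtime automatically when the requested protocol is v2, the only protocol the runtime lists.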
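The matching step behind auto-selection can be sketched roughly as follows. This is an illustration, not KServe's implementation: it assumes a runtime is a candidate when the format name matches, the version matches if one is requested, the requested protocol is listed under protocolVersions, and autoSelect is true. The names SupportedFormat and is_candidate are hypothetical, and priority-based tie-breaking between several candidate runtimes is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SupportedFormat:
    name: str
    version: str
    auto_select: bool
    priority: int


# Entries copied from the kserve-mlserver manifest.
MLSERVER_FORMATS = [
    SupportedFormat("sklearn", "0", True, 3),
    SupportedFormat("sklearn", "1", True, 3),
    SupportedFormat("xgboost", "1", True, 3),
    SupportedFormat("xgboost", "2", True, 3),
    SupportedFormat("lightgbm", "3", True, 3),
    SupportedFormat("lightgbm", "4", True, 3),
    SupportedFormat("mlflow", "1", True, 1),
    SupportedFormat("mlflow", "2", True, 1),
]
MLSERVER_PROTOCOLS = ["v2"]


def is_candidate(model_format: str, protocol: str, version: Optional[str] = None) -> bool:
    """Return True if this runtime could be auto-selected for the request (assumed rule)."""
    if protocol not in MLSERVER_PROTOCOLS:
        return False
    for fmt in MLSERVER_FORMATS:
        if not fmt.auto_select or fmt.name != model_format:
            continue
        # A request without a version matches any listed version of the format.
        if version is None or fmt.version == version:
            return True
    return False


print(is_candidate("sklearn", "v2"))  # listed format, listed protocol
print(is_candidate("sklearn", "v1"))  # v1 is not listed by this runtime
```

Under these assumptions, a request for a listed format is only a candidate when it also asks for the v2 protocol.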
Code Reference
Source Location
- Repository: Kserve_Kserve
- File: config/runtimes/kserve-mlserver.yaml
- Lines: 1-70
Signature
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: kserve-mlserver
spec:
  annotations:
    prometheus.kserve.io/port: '8080'
    prometheus.kserve.io/path: "/metrics"
  supportedModelFormats:
    - name: sklearn
      version: "0"
      autoSelect: true
      priority: 3
    - name: sklearn
      version: "1"
      autoSelect: true
      priority: 3
    - name: xgboost
      version: "1"
      autoSelect: true
      priority: 3
    - name: xgboost
      version: "2"
      autoSelect: true
      priority: 3
    - name: lightgbm
      version: "3"
      autoSelect: true
      priority: 3
    - name: lightgbm
      version: "4"
      autoSelect: true
      priority: 3
    - name: mlflow
      version: "1"
      autoSelect: true
      priority: 1
    - name: mlflow
      version: "2"
      autoSelect: true
      priority: 1
  protocolVersions:
    - v2
  containers:
    - name: kserve-container
      image: mlserver:replace
      env:
        - name: "MLSERVER_MODEL_IMPLEMENTATION"
          value: "{{.Labels.modelClass}}"
        - name: "MLSERVER_HTTP_PORT"
          value: "8080"
        - name: "MLSERVER_GRPC_PORT"
          value: "9000"
        - name: "MODELS_DIR"
          value: "/mnt/models"
Import
kubectl apply -f config/runtimes/kserve-mlserver.yaml
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| {{.Labels.modelClass}} | template variable | Yes | Model implementation class, read from the modelClass label of the InferenceService and passed to MLServer as MLSERVER_MODEL_IMPLEMENTATION |
| Model artifacts | storage URI | Yes | Model files at /mnt/models (provided by KServe storage initializer) |
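
How the {{.Labels.modelClass}} value of MLSERVER_MODEL_IMPLEMENTATION resolves can be illustrated with a small sketch. The placeholder uses Go template syntax; the Python below only mimics the substitution for this one pattern and is not how KServe renders it. The function name render_env_value is hypothetical, and the label value mlserver_sklearn.SKLearnModel is an assumed example of what the modelClass label might contain.

```python
import re

# Matches placeholders of the form {{.Labels.<key>}}.
PLACEHOLDER = re.compile(r"\{\{\s*\.Labels\.([A-Za-z0-9_.-]+)\s*\}\}")


def render_env_value(template: str, labels: dict) -> str:
    """Replace each {{.Labels.<key>}} with the matching label value."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in labels:
            # A missing label would leave the env var without a usable value.
            raise KeyError(f"label {key!r} is not set")
        return labels[key]

    return PLACEHOLDER.sub(substitute, template)


# Assumed example label; the real value depends on the model being served.
labels = {"modelClass": "mlserver_sklearn.SKLearnModel"}
rendered = render_env_value("{{.Labels.modelClass}}", labels)
print(rendered)  # mlserver_sklearn.SKLearnModel
```

The sketch raises an error when the label is absent, which mirrors why the input is marked as required in the table above.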
Outputs
| Name | Type | Description |
|---|---|---|
| ClusterServingRuntime | Custom Resource | MLServer runtime available cluster-wide for sklearn, xgboost, lightgbm, and mlflow models |
| HTTP inference endpoint | TCP port 8080 | V2 inference protocol HTTP endpoint |
| gRPC inference endpoint | TCP port 9000 | V2 inference protocol gRPC endpoint |
| Prometheus metrics | HTTP port 8080 /metrics | Model serving metrics endpoint |
Usage Examples
Apply the runtime
kubectl apply -f config/runtimes/kserve-mlserver.yaml
Create an sklearn InferenceService
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      # This runtime lists only v2 under protocolVersions, so request v2 explicitly.
      protocolVersion: v2
      storageUri: "gs://my-bucket/sklearn/iris"
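
Send a V2 inference request

Once the sklearn-iris service is ready, a request to the HTTP endpoint might be built as in the sketch below. It assumes the standard V2 (Open Inference Protocol) path /v2/models/<name>/infer, that the model is exposed under the InferenceService name, and a placeholder host, http://sklearn-iris.example.com, that must be replaced with the real service URL. The input name input-0, the FP32 datatype, and the sample rows are illustrative only, and build_infer_request is a hypothetical helper.

```python
import json
import urllib.request


def build_infer_request(base_url: str, model_name: str, rows: list) -> urllib.request.Request:
    """Build (but do not send) a V2 inference request for a 2-D batch of rows."""
    payload = {
        "inputs": [
            {
                "name": "input-0",  # illustrative tensor name
                "shape": [len(rows), len(rows[0])],
                "datatype": "FP32",
                "data": rows,
            }
        ]
    }
    return urllib.request.Request(
        url=f"{base_url}/v2/models/{model_name}/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_infer_request(
    "http://sklearn-iris.example.com",  # placeholder host, not a real endpoint
    "sklearn-iris",
    [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]],
)
print(request.get_method(), request.full_url)

# To actually send it against a reachable service:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The request is only constructed here, so the sketch runs without a cluster; sending it requires network access to the deployed service.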