Implementation:Kserve Kserve MLServer Runtime

Knowledge Sources	Kserve_Kserve KServe Docs
Domains	Kubernetes, Model Serving
Last Updated	2026-02-13 00:00 GMT

Overview

Concrete ClusterServingRuntime, provided by the KServe project, that serves models with Seldon MLServer, a multi-framework inference server.

Description

This file defines a ClusterServingRuntime named kserve-mlserver that runs Seldon's MLServer as a multi-framework inference server. It supports sklearn (versions 0 and 1), xgboost (versions 1 and 2), and lightgbm (versions 3 and 4) at priority 3, and mlflow (versions 1 and 2) at priority 1. The runtime supports only the v2 inference protocol, sets the model implementation class through the {{.Labels.modelClass}} template, and serves HTTP on port 8080 and gRPC on port 9000. It runs with a non-root security context and resource limits of 1 CPU and 2Gi of memory.
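
As a sketch of how a request lines up with the declared formats, the InferenceService below asks for an xgboost version 2 model over the v2 protocol, which matches one of the supportedModelFormats entries, and overrides the runtime's default resources. The name xgboost-example, the storageUri, and the resource figures are hypothetical placeholders, and the override assumes (as in current KServe releases) that container fields set on the model take precedence over the runtime container's values.

```yaml
# Hypothetical example: name, bucket path, and resource values are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: xgboost-example
spec:
  predictor:
    model:
      modelFormat:
        name: xgboost
        version: "2"          # matches the xgboost "2" entry in supportedModelFormats
      protocolVersion: v2     # kserve-mlserver declares only the v2 protocol
      storageUri: "gs://my-bucket/xgboost/model"
      resources:              # assumed to override the runtime's 1 CPU / 2Gi defaults
        requests:
          cpu: "500m"
          memory: 1Gi
        limits:
          cpu: "2"
          memory: 4Gi
```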

Usage

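This ClusterServingRuntime is installed cluster-wide with autoSelect enabled for sklearn, xgboost, lightgbm, and mlflow models. KServe considers it only for an InferenceService whose model format matches one of these entries and whose protocol version is v2, the only protocol the runtime declares; because InferenceServices default to the v1 protocol, protocolVersion: v2 must be set explicitly. When several auto-selectable runtimes match, KServe chooses the one with the higher priority value, so the priority 3 declared for sklearn, xgboost, and lightgbm ranks this runtime above matching runtimes with lower values, while mlflow is declared at priority 1. Users create InferenceService resources with the appropriate model format and protocol version, and KServe selects this runtime automatically.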
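
A minimal sketch of an MLflow InferenceService that this runtime would be expected to pick up through auto-selection: the model format is mlflow and the protocol version is set to v2. The service name and storageUri are hypothetical.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: mlflow-example          # hypothetical name
spec:
  predictor:
    model:
      modelFormat:
        name: mlflow            # kserve-mlserver lists mlflow versions 1 and 2
      protocolVersion: v2       # the runtime supports only the v2 protocol
      storageUri: "gs://my-bucket/mlflow/model"   # hypothetical MLflow model directory
```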

Code Reference

Source Location

Repository: Kserve_Kserve
File: config/runtimes/kserve-mlserver.yaml
Lines: 1-70

Signature

apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: kserve-mlserver
spec:
  annotations:
    prometheus.kserve.io/port: '8080'
    prometheus.kserve.io/path: "/metrics"
  supportedModelFormats:
    - name: sklearn
      version: "0"
      autoSelect: true
      priority: 3
    - name: sklearn
      version: "1"
      autoSelect: true
      priority: 3
    - name: xgboost
      version: "1"
      autoSelect: true
      priority: 3
    - name: xgboost
      version: "2"
      autoSelect: true
      priority: 3
    - name: lightgbm
      version: "3"
      autoSelect: true
      priority: 3
    - name: lightgbm
      version: "4"
      autoSelect: true
      priority: 3
    - name: mlflow
      version: "1"
      autoSelect: true
      priority: 1
    - name: mlflow
      version: "2"
      autoSelect: true
      priority: 1
  protocolVersions:
    - v2
  containers:
    - name: kserve-container
      image: mlserver:replace
      env:
        - name: "MLSERVER_MODEL_IMPLEMENTATION"
          value: "{{.Labels.modelClass}}"
        - name: "MLSERVER_HTTP_PORT"
          value: "8080"
        - name: "MLSERVER_GRPC_PORT"
          value: "9000"
        - name: "MODELS_DIR"
          value: "/mnt/models"

Import

kubectl apply -f config/runtimes/kserve-mlserver.yaml

I/O Contract

Inputs

Name	Type	Required	Description
{{.Labels.modelClass}}	template variable	Yes	Model implementation class, taken from the modelClass label of the InferenceService and passed to MLServer as MLSERVER_MODEL_IMPLEMENTATION
Model artifacts	storage URI	Yes	Model files downloaded from the InferenceService's storageUri by the KServe storage initializer into /mnt/models (MODELS_DIR)
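
To illustrate the template input, the sketch below sets the modelClass label on the InferenceService explicitly. KServe substitutes the label value for {{.Labels.modelClass}}, so MLServer receives it through MLSERVER_MODEL_IMPLEMENTATION. mlserver_lightgbm.LightGBMModel is MLServer's LightGBM runtime class; the service name and storageUri are hypothetical.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: lightgbm-example                          # hypothetical name
  labels:
    modelClass: mlserver_lightgbm.LightGBMModel   # value injected via {{.Labels.modelClass}}
spec:
  predictor:
    model:
      modelFormat:
        name: lightgbm
        version: "4"                              # matches the lightgbm "4" entry
      protocolVersion: v2
      storageUri: "gs://my-bucket/lightgbm/model" # hypothetical; fetched into /mnt/models
```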

Outputs

Name	Type	Description
ClusterServingRuntime	Custom Resource	MLServer runtime available cluster-wide for sklearn, xgboost, lightgbm, and mlflow models
HTTP inference endpoint	TCP port 8080	V2 inference protocol HTTP endpoint
gRPC inference endpoint	TCP port 9000	V2 inference protocol gRPC endpoint
Prometheus metrics	HTTP port 8080 /metrics	Model serving metrics endpoint

Usage Examples

Apply the runtime

kubectl apply -f config/runtimes/kserve-mlserver.yaml

Create an sklearn InferenceService

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      protocolVersion: v2   # kserve-mlserver is only selected for v2 InferenceServices
      storageUri: "gs://my-bucket/sklearn/iris"
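
Auto-selection can also be bypassed by naming the runtime directly. This is a sketch that uses the v1beta1 model spec's runtime field, which references a ServingRuntime or ClusterServingRuntime by name; the service name and bucket path are hypothetical.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris-pinned        # hypothetical name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      protocolVersion: v2
      runtime: kserve-mlserver     # select this runtime by name instead of by priority
      storageUri: "gs://my-bucket/sklearn/iris"
```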
