Implementation:Kserve Kserve MLServer Runtime

From Leeroopedia
Domains: Kubernetes, Model Serving
Last Updated: 2026-02-13 00:00 GMT

Overview

A concrete ClusterServingRuntime, provided by the KServe project, that serves models with Seldon MLServer, a multi-framework inference server.

Description

This file defines a ClusterServingRuntime named kserve-mlserver that uses Seldon's MLServer as a multi-framework inference server. It declares sklearn (versions 0 and 1), xgboost (versions 1 and 2), and lightgbm (versions 3 and 4) at priority 3, and mlflow (versions 1 and 2) at priority 1. The runtime supports only the v2 inference protocol, sets MLServer's model implementation class from the {{.Labels.modelClass}} template through the MLSERVER_MODEL_IMPLEMENTATION environment variable, and serves HTTP on port 8080 and gRPC on port 9000. The container also runs with a non-root security context and resource limits of 1 CPU and 2Gi of memory (these settings are omitted from the Signature excerpt).
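KServe fills in Go-template references such as {{.Labels.modelClass}} in the runtime's container spec using the InferenceService's metadata. The Python sketch below imitates only that label substitution so its effect on MLSERVER_MODEL_IMPLEMENTATION is visible. It is not KServe's implementation: the label value mlserver_sklearn.SKLearnModel (MLServer's scikit-learn runtime class) is an assumed example, the regex handles only .Labels lookups, and a missing label simply raises KeyError here rather than modeling how KServe's renderer behaves.

```python
import re

# Assumed InferenceService labels; "modelClass" holds an MLServer implementation class path.
ISVC_LABELS = {"modelClass": "mlserver_sklearn.SKLearnModel"}

# The env entries as they appear in the kserve-mlserver container spec.
RUNTIME_ENV = [
    {"name": "MLSERVER_MODEL_IMPLEMENTATION", "value": "{{.Labels.modelClass}}"},
    {"name": "MLSERVER_HTTP_PORT", "value": "8080"},
    {"name": "MLSERVER_GRPC_PORT", "value": "9000"},
    {"name": "MODELS_DIR", "value": "/mnt/models"},
]

# Matches Go-template label references such as {{.Labels.modelClass}}.
LABEL_REF = re.compile(r"\{\{\s*\.Labels\.(\w+)\s*\}\}")


def render_env(env, labels):
    """Return a copy of env with {{.Labels.<key>}} references substituted.

    Simplified stand-in for Go text/template: only label lookups are handled,
    and a missing label raises KeyError.
    """
    rendered = []
    for var in env:
        # A function replacement avoids backslash interpretation in label values.
        value = LABEL_REF.sub(lambda m: labels[m.group(1)], var["value"])
        rendered.append({"name": var["name"], "value": value})
    return rendered


if __name__ == "__main__":
    for var in render_env(RUNTIME_ENV, ISVC_LABELS):
        print(f'{var["name"]}={var["value"]}')
```

Only the template-bearing variable changes; the port and directory values pass through untouched.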

Usage

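This ClusterServingRuntime is applied cluster-wide and is eligible for auto-selection for sklearn, xgboost, lightgbm, and mlflow models. Because it declares only the v2 protocol, KServe considers it only for InferenceServices that set protocolVersion: v2. When more than one runtime supports the requested format and protocol, KServe chooses the one with the higher priority value, so the priority 3 entries for sklearn, xgboost, and lightgbm compete with whatever priorities other installed runtimes declare for those formats, while the priority 1 mlflow entries only matter if another runtime also claims mlflow. Users create InferenceService resources with the appropriate model format, and KServe selects this runtime automatically unless the InferenceService names a runtime explicitly in spec.predictor.model.runtime.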
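The selection rules described above can be sketched as a small Python function. This is a simplified model, not KServe's controller code: it assumes a runtime qualifies only if it lists the requested protocol and has an autoSelect entry for the format, treats a missing version as "any version", and breaks ties by list order. The second runtime, hypothetical-sklearn-runtime, is invented purely for comparison.

```python
from dataclasses import dataclass, field


@dataclass
class ModelFormat:
    name: str
    version: str
    auto_select: bool
    priority: int


@dataclass
class Runtime:
    name: str
    protocols: list
    formats: list = field(default_factory=list)


def select_runtime(runtimes, fmt_name, protocol, fmt_version=None):
    """Return the name of the highest-priority matching runtime, or None.

    Simplified model: the runtime must list the protocol and have an
    autoSelect entry for the format; fmt_version=None matches any version.
    Ties keep the runtime that appears first in the list.
    """
    best_name, best_priority = None, 0
    for rt in runtimes:
        if protocol not in rt.protocols:
            continue
        for f in rt.formats:
            if f.name != fmt_name or not f.auto_select:
                continue
            if fmt_version is not None and f.version != fmt_version:
                continue
            # Higher priority value wins.
            if f.priority > best_priority:
                best_name, best_priority = rt.name, f.priority
    return best_name


# Entries copied from the kserve-mlserver supportedModelFormats list.
MLSERVER = Runtime(
    name="kserve-mlserver",
    protocols=["v2"],
    formats=[
        ModelFormat(fmt, ver, True, prio)
        for fmt, ver, prio in [
            ("sklearn", "0", 3), ("sklearn", "1", 3),
            ("xgboost", "1", 3), ("xgboost", "2", 3),
            ("lightgbm", "3", 3), ("lightgbm", "4", 3),
            ("mlflow", "1", 1), ("mlflow", "2", 1),
        ]
    ],
)

# Hypothetical competing runtime, invented for this comparison only.
OTHER = Runtime(
    name="hypothetical-sklearn-runtime",
    protocols=["v1", "v2"],
    formats=[ModelFormat("sklearn", "1", True, 1)],
)

if __name__ == "__main__":
    runtimes = [MLSERVER, OTHER]
    print(select_runtime(runtimes, "sklearn", "v1"))
    print(select_runtime(runtimes, "sklearn", "v2"))
    print(select_runtime(runtimes, "mlflow", "v2", "2"))
```

With these inputs, a v1 sklearn request resolves to the hypothetical runtime because kserve-mlserver does not declare v1, while a v2 sklearn request resolves to kserve-mlserver because 3 is greater than 1.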

Code Reference

Source Location

config/runtimes/kserve-mlserver.yaml in the KServe repository

Signature

apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: kserve-mlserver
spec:
  annotations:
    prometheus.kserve.io/port: '8080'
    prometheus.kserve.io/path: "/metrics"
  supportedModelFormats:
    - name: sklearn
      version: "0"
      autoSelect: true
      priority: 3
    - name: sklearn
      version: "1"
      autoSelect: true
      priority: 3
    - name: xgboost
      version: "1"
      autoSelect: true
      priority: 3
    - name: xgboost
      version: "2"
      autoSelect: true
      priority: 3
    - name: lightgbm
      version: "3"
      autoSelect: true
      priority: 3
    - name: lightgbm
      version: "4"
      autoSelect: true
      priority: 3
    - name: mlflow
      version: "1"
      autoSelect: true
      priority: 1
    - name: mlflow
      version: "2"
      autoSelect: true
      priority: 1
  protocolVersions:
    - v2
  containers:
    - name: kserve-container
      image: mlserver:replace
      env:
        - name: "MLSERVER_MODEL_IMPLEMENTATION"
          value: "{{.Labels.modelClass}}"
        - name: "MLSERVER_HTTP_PORT"
          value: "8080"
        - name: "MLSERVER_GRPC_PORT"
          value: "9000"
        - name: "MODELS_DIR"
          value: "/mnt/models"

Import

kubectl apply -f config/runtimes/kserve-mlserver.yaml

I/O Contract

Inputs

Name | Type | Required | Description
{{.Labels.modelClass}} | Template variable | Yes | Model implementation class, filled in from the InferenceService labels
Model artifacts | Storage URI | Yes | Model files mounted at /mnt/models (provided by the KServe storage initializer)

Outputs

Name | Type | Description
ClusterServingRuntime | Custom resource | MLServer runtime available cluster-wide for sklearn, xgboost, lightgbm, and mlflow models
HTTP inference endpoint | TCP port 8080 | V2 inference protocol over HTTP
gRPC inference endpoint | TCP port 9000 | V2 inference protocol over gRPC
Prometheus metrics | HTTP port 8080, path /metrics | Model serving metrics endpoint
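The addresses in the Outputs table can be gathered in one place. This Python sketch assumes direct access to the container on its configured ports (for example through kubectl port-forward); requests routed through a cluster ingress would use a different host, port, and Host header. The readiness paths follow the V2 (Open Inference Protocol) health API, and the metrics path follows the prometheus.kserve.io annotations in the spec. The helper names v2_endpoints and is_ready are illustrative, not part of KServe or MLServer.

```python
import urllib.request

HTTP_PORT = 8080  # MLSERVER_HTTP_PORT in the runtime spec
GRPC_PORT = 9000  # MLSERVER_GRPC_PORT in the runtime spec


def v2_endpoints(host, model_name):
    """Return the client-facing addresses described in the Outputs table."""
    base = f"http://{host}:{HTTP_PORT}"
    return {
        "server_ready": f"{base}/v2/health/ready",
        "model_ready": f"{base}/v2/models/{model_name}/ready",
        "infer": f"{base}/v2/models/{model_name}/infer",
        "metrics": f"{base}/metrics",
        "grpc_target": f"{host}:{GRPC_PORT}",
    }


def is_ready(url, timeout=2.0):
    """Return True if a readiness URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (error statuses), and socket timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    for label, address in v2_endpoints("localhost", "sklearn-iris").items():
        print(f"{label:12} {address}")
```

Polling is_ready on the model_ready URL before sending traffic avoids requests that arrive while MLServer is still loading the model.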

Usage Examples

Apply the runtime

kubectl apply -f config/runtimes/kserve-mlserver.yaml

Create an sklearn InferenceService

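apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      # kserve-mlserver only declares the v2 protocol, so request it explicitly.
      protocolVersion: v2
      # Optional: pin this runtime instead of relying on priority-based auto-selection.
      runtime: kserve-mlserver
      storageUri: "gs://my-bucket/sklearn/iris"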
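Once sklearn-iris is ready, a client can send a V2 inference request to the HTTP endpoint. The sketch below uses only the Python standard library to build the request; it assumes the model is exposed under the InferenceService name, a port-forward to localhost:8080, and an input tensor named input-0 (a hypothetical name; adjust it to your model's metadata). Tensor data is flattened in row-major order, as the V2 protocol specifies.

```python
import json
import urllib.request


def build_infer_request(base_url, model_name, rows, input_name="input-0"):
    """Build (but do not send) a V2 /infer request with one FP32 input tensor.

    input_name defaults to a hypothetical tensor name; rows is a list of
    equal-length feature lists, e.g. iris measurements.
    """
    payload = {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(rows), len(rows[0])],
                "datatype": "FP32",
                # V2 tensors carry their data flattened in row-major order.
                "data": [value for row in rows for value in row],
            }
        ]
    }
    return urllib.request.Request(
        url=f"{base_url}/v2/models/{model_name}/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def infer(request, timeout=10.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Assumes: kubectl port-forward to the predictor on port 8080.
    req = build_infer_request(
        "http://localhost:8080",
        "sklearn-iris",
        [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]],
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To send it, call infer(req) against a reachable endpoint; the predictions come back in the response's outputs list.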
