Implementation:Kserve Kserve OpenVINO Runtime

Knowledge Sources	Kserve_Kserve KServe Docs
Domains	Kubernetes, Model Serving
Last Updated	2026-02-13 00:00 GMT

Overview

A concrete ClusterServingRuntime, provided by the KServe project, that serves models with Intel OpenVINO Model Server.

Description

This file defines a ClusterServingRuntime named kserve-openvino that serves models with Intel's OpenVINO Model Server, which is optimized for inference on Intel CPUs and GPUs. It supports four model formats: openvino (version 10, priority 1), onnx (version 1, priority 2), tensorflow (version 1, priority 3), and huggingface (no priority set). The runtime supports the v1, v2, and grpc-v2 protocols and serves REST on port 8080 and gRPC on port 9000. The container runs as non-root user 5000 under a strict security context, with resource limits of 1 CPU and 2Gi of memory. The resource is managed by Helm: it carries the Helm release annotations and the app.kubernetes.io/managed-by: Helm label.
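The security context and resource limits mentioned above can be sketched as a container fragment. This is an illustrative sketch, not a copy of the source file: the user ID and the limits come from the description, while the fields marked as assumptions are typical of a strict Kubernetes security context and may differ from what the file actually sets.

```yaml
# Illustrative fragment only; field values marked "assumption" are not confirmed by this page.
spec:
  containers:
  - name: kserve-container
    resources:
      limits:
        cpu: "1"          # from the description: limit of 1 CPU
        memory: 2Gi       # from the description: limit of 2Gi memory
    securityContext:
      runAsNonRoot: true  # from the description: runs as non-root
      runAsUser: 5000     # from the description: user 5000
      allowPrivilegeEscalation: false  # assumption: common in strict contexts
      capabilities:
        drop:
        - ALL             # assumption: common in strict contexts
```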

Usage

This ClusterServingRuntime is cluster-scoped, so it is available to InferenceService resources in every namespace. KServe auto-selects it for openvino models (priority 1), onnx models (priority 2), and tensorflow models (priority 3); huggingface is also marked autoSelect but carries no priority. The runtime is particularly useful for deployments targeting Intel hardware acceleration. Users create an InferenceService with the appropriate model format, and KServe selects a runtime based on format and priority.
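When more than one installed runtime advertises the same model format, auto-selection may not choose this one. A minimal sketch of naming the runtime explicitly through the predictor's runtime field follows; the resource name and storage URI are hypothetical placeholders.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: onnx-example                 # hypothetical name
spec:
  predictor:
    model:
      modelFormat:
        name: onnx                   # a format listed in supportedModelFormats
      runtime: kserve-openvino       # name the runtime instead of relying on auto-selection
      storageUri: "gs://my-bucket/onnx/model"   # hypothetical model location
```

Setting runtime skips the priority comparison, which is worth considering for the huggingface format in particular, since this runtime assigns it no priority.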

Code Reference

Source Location

Repository: Kserve_Kserve
File: config/runtimes/kserve-openvino.yaml
Lines: 1-57

Signature

apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  annotations:
    meta.helm.sh/release-name: kserve
    meta.helm.sh/release-namespace: kserve
  labels:
    app.kubernetes.io/managed-by: Helm
  name: kserve-openvino
spec:
  annotations:
    prometheus.kserve.io/path: /metrics
    prometheus.kserve.io/port: "8080"
  containers:
  - args:
    - --model_name={{.Name}}
    - --model_path=/mnt/models
    - --port=9000
    - --rest_port=8080
    - --file_system_poll_wait_seconds=0
    image: openvino/model_server:replace
    name: kserve-container
  protocolVersions:
  - v1
  - v2
  - grpc-v2
  supportedModelFormats:
  - autoSelect: true
    name: openvino
    priority: 1
    version: "10"
  - autoSelect: true
    name: onnx
    priority: 2
    version: "1"
  - autoSelect: true
    name: tensorflow
    priority: 3
    version: "1"
  - autoSelect: true
    name: huggingface

Import

kubectl apply -f config/runtimes/kserve-openvino.yaml

I/O Contract

Inputs

Name	Type	Required	Description
{{.Name}}	template variable	Yes	Model name injected by KServe at runtime
Model artifacts	storage URI	Yes	Model files at /mnt/models (provided by KServe storage initializer)

Outputs

Name	Type	Description
ClusterServingRuntime	Custom Resource	OpenVINO runtime available cluster-wide for openvino, onnx, tensorflow, and huggingface models
REST inference endpoint	TCP port 8080	V1/V2 inference protocol REST endpoint
gRPC inference endpoint	TCP port 9000	grpc-v2 (V2 inference protocol over gRPC) endpoint
Prometheus metrics	HTTP port 8080 /metrics	Model serving metrics endpoint

Usage Examples

Apply the runtime

kubectl apply -f config/runtimes/kserve-openvino.yaml

Create an OpenVINO InferenceService

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: openvino-resnet
spec:
  predictor:
    model:
      modelFormat:
        name: openvino
      storageUri: "gs://my-bucket/openvino/resnet"
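Because the runtime lists v1, v2, and grpc-v2 under protocolVersions, a service can also state which protocol it expects. The variant below is a sketch that assumes the protocolVersion field on the predictor model; the resource name is hypothetical and the storage URI reuses the placeholder from the example above.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: openvino-resnet-v2           # hypothetical name
spec:
  predictor:
    model:
      modelFormat:
        name: openvino
      protocolVersion: v2            # must be one of the runtime's protocolVersions
      storageUri: "gs://my-bucket/openvino/resnet"   # placeholder bucket from the example above
```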
