

Implementation:Kserve Kserve OpenVINO Runtime

From Leeroopedia
Knowledge Sources
Domains: Kubernetes, Model Serving
Last Updated: 2026-02-13 00:00 GMT

Overview

A concrete ClusterServingRuntime, provided by the KServe project, that serves models with Intel OpenVINO Model Server.

Description

This file defines a ClusterServingRuntime named kserve-openvino that uses Intel's OpenVINO Model Server to serve models with inference optimized for Intel CPUs and GPUs. It declares four model formats, all with autoSelect enabled: openvino (version 10, priority 1), onnx (version 1, priority 2), tensorflow (version 1, priority 3), and huggingface (no version or priority set). The runtime supports the v1, v2, and grpc-v2 protocols and serves REST on port 8080 and gRPC on port 9000. Its container runs as non-root user 5000 under a strict security context, with resource limits of 1 CPU and 2Gi memory. The resource is managed by Helm: it carries the meta.helm.sh/release-name and meta.helm.sh/release-namespace annotations (both set to kserve) and the app.kubernetes.io/managed-by: Helm label.
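The security context and resource limits described above do not appear in the signature excerpt on this page. The fragment below is a hedged sketch of how they could be expressed with standard Kubernetes container fields; only runAsUser: 5000 and the 1 CPU / 2Gi limits come from the description, while the remaining securityContext fields are assumptions about what "strict" means and are not copied from the upstream file.

```yaml
# Hypothetical sketch, not the upstream file: container-level fields
# matching the description (non-root user 5000, 1 CPU / 2Gi limits).
containers:
- name: kserve-container
  securityContext:
    runAsNonRoot: true
    runAsUser: 5000                  # from the description
    allowPrivilegeEscalation: false  # assumption: part of a "strict" context
    privileged: false                # assumption
    capabilities:
      drop:
      - ALL                          # assumption
  resources:
    limits:
      cpu: "1"                       # from the description
      memory: 2Gi                    # from the description
```

If only limits are set, Kubernetes defaults each request to its limit, so this sketch would also reserve 1 CPU and 2Gi per replica.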

Usage

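This ClusterServingRuntime is available cluster-wide and is a candidate for automatic selection for openvino, onnx, tensorflow, and huggingface models, since every supported format sets autoSelect: true. The priority values (1 for openvino, 2 for onnx, 3 for tensorflow) do not rank these formats against each other; KServe compares a format's priority with that of other runtimes supporting the same format, and the higher value wins. The huggingface entry sets no priority. The runtime is particularly useful for deployments targeting Intel hardware acceleration. Users create InferenceService resources that declare a model format, and KServe selects a matching runtime based on format and priority.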
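As an illustration of auto-selection, the manifest below declares only a model format and no runtime, so KServe picks among runtimes that list onnx with autoSelect: true; kserve-openvino competes with its onnx priority of 2. The InferenceService name and bucket path are hypothetical placeholders.

```yaml
# Hypothetical example: no runtime is named, so KServe auto-selects one
# from the runtimes whose supportedModelFormats include onnx.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: onnx-example                          # hypothetical name
spec:
  predictor:
    model:
      modelFormat:
        name: onnx                            # matched against supportedModelFormats
      storageUri: "gs://my-bucket/onnx/model" # hypothetical bucket
```

Whether kserve-openvino is actually chosen depends on which other runtimes in the cluster also declare onnx and at what priority.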

Code Reference

Source Location

Signature

apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  annotations:
    meta.helm.sh/release-name: kserve
    meta.helm.sh/release-namespace: kserve
  labels:
    app.kubernetes.io/managed-by: Helm
  name: kserve-openvino
spec:
  annotations:
    prometheus.kserve.io/path: /metrics
    prometheus.kserve.io/port: "8080"
  containers:
  - args:
    - --model_name={{.Name}}
    - --model_path=/mnt/models
    - --port=9000
    - --rest_port=8080
    - --file_system_poll_wait_seconds=0
    image: openvino/model_server:replace
    name: kserve-container
  protocolVersions:
  - v1
  - v2
  - grpc-v2
  supportedModelFormats:
  - autoSelect: true
    name: openvino
    priority: 1
    version: "10"
  - autoSelect: true
    name: onnx
    priority: 2
    version: "1"
  - autoSelect: true
    name: tensorflow
    priority: 3
    version: "1"
  - autoSelect: true
    name: huggingface

Import

kubectl apply -f config/runtimes/kserve-openvino.yaml

I/O Contract

Inputs

Name | Type | Required | Description
{{.Name}} | Template variable | Yes | Model name injected by KServe at runtime
Model artifacts | Storage URI | Yes | Model files at /mnt/models (provided by the KServe storage initializer)
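To make the template variable concrete: KServe substitutes {{.Name}} with the InferenceService name when it renders the container. The fragment below is illustrative only and assumes a hypothetical InferenceService named openvino-resnet; the other arguments are unchanged from the runtime definition, and /mnt/models is where the storage initializer places the downloaded artifacts.

```yaml
# Illustrative only: rendered container args for a hypothetical
# InferenceService named "openvino-resnet".
args:
- --model_name=openvino-resnet   # was --model_name={{.Name}}
- --model_path=/mnt/models       # populated from storageUri
- --port=9000                    # gRPC port
- --rest_port=8080               # REST port
- --file_system_poll_wait_seconds=0
```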

Outputs

Name | Type | Description
ClusterServingRuntime | Custom resource | OpenVINO runtime available cluster-wide for openvino, onnx, tensorflow, and huggingface models
REST inference endpoint | TCP port 8080 | V1/V2 inference protocol REST endpoint
gRPC inference endpoint | TCP port 9000 | V2 inference protocol over gRPC (grpc-v2)
Prometheus metrics | HTTP port 8080, path /metrics | Model serving metrics endpoint
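To reach the gRPC endpoint rather than REST, an InferenceService can request the grpc-v2 protocol and expose port 9000. The sketch below assumes the v1beta1 protocolVersion and ports fields and the Knative convention of naming an HTTP/2 cleartext port h2c; the name and bucket are hypothetical.

```yaml
# Hypothetical sketch: serve the OpenVINO model over grpc-v2 on port 9000.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: openvino-resnet-grpc                      # hypothetical name
spec:
  predictor:
    model:
      modelFormat:
        name: openvino
      protocolVersion: grpc-v2                    # must be in the runtime's protocolVersions
      storageUri: "gs://my-bucket/openvino/resnet" # hypothetical bucket
      ports:
      - name: h2c                                 # assumption: Knative HTTP/2 cleartext naming
        protocol: TCP
        containerPort: 9000                       # OVMS gRPC port (--port=9000)
```

In Knative (serverless) mode only one container port is exposed, so selecting 9000 likely gives up external access to the REST port; verify this against your KServe deployment mode.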

Usage Examples

Apply the runtime

kubectl apply -f config/runtimes/kserve-openvino.yaml

Create an OpenVINO InferenceService

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: openvino-resnet
spec:
  predictor:
    model:
      modelFormat:
        name: openvino
      storageUri: "gs://my-bucket/openvino/resnet"
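Auto-selection can be bypassed by naming the runtime explicitly, which is useful when another runtime would win on priority for the same format, or for the huggingface entry, which sets no priority. The sketch below assumes the v1beta1 runtime field on the model spec; the named runtime must still list the declared model format. The name and bucket are hypothetical.

```yaml
# Hypothetical example: pin a TensorFlow model to kserve-openvino.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: tf-on-openvino                              # hypothetical name
spec:
  predictor:
    model:
      modelFormat:
        name: tensorflow                            # listed by kserve-openvino (priority 3)
      runtime: kserve-openvino                      # skips priority-based selection
      storageUri: "gs://my-bucket/tensorflow/model" # hypothetical bucket
```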
