Implementation:Kserve Kserve OpenVINO Runtime
| Knowledge Sources | |
|---|---|
| Domains | Kubernetes, Model Serving |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
A concrete ClusterServingRuntime, provided by the KServe project, for serving inference with Intel OpenVINO Model Server.
Description
This file defines a ClusterServingRuntime named kserve-openvino that uses Intel's OpenVINO Model Server to serve models with inference optimized for Intel CPUs and GPUs. It declares four supported model formats: openvino (version 10, priority 1), onnx (version 1, priority 2), tensorflow (version 1, priority 3), and huggingface (no priority set). The runtime supports the v1, v2, and grpc-v2 protocols, serves REST on port 8080 and gRPC on port 9000, and runs as non-root user 5000 under a strict security context. The resource carries Helm release annotations and the app.kubernetes.io/managed-by: Helm label, and sets resource limits of 1 CPU and 2Gi of memory.
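The resource limits and non-root user mentioned above do not appear in the Signature excerpt further down. As a rough sketch only, assuming they are expressed through the standard Kubernetes container fields, they might look like the fragment below. Only the values stated in the description are filled in; the additional hardening implied by "strict security context" is not visible in the excerpt and is deliberately left out.

```yaml
# Sketch only: standard Kubernetes container fields, populated with the
# values quoted in the description. The exact layout in
# config/runtimes/kserve-openvino.yaml is an assumption.
containers:
- name: kserve-container
  resources:
    limits:
      cpu: "1"        # "1 CPU" per the description
      memory: 2Gi     # "2Gi memory" per the description
  securityContext:
    runAsNonRoot: true
    runAsUser: 5000   # non-root user 5000 per the description
```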
Usage
This ClusterServingRuntime is available cluster-wide and is eligible for auto-selection for openvino models (priority 1), onnx models (priority 2), and tensorflow models (priority 3). It is particularly useful for deployments targeting Intel hardware acceleration. Users create InferenceService resources with the appropriate model format, and KServe automatically selects this runtime based on format and priority.
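Auto-selection can be bypassed when several runtimes support the same format. The following is a minimal sketch, assuming an ONNX model, that pins this runtime through the `runtime` field of the predictor's model spec. The resource name and storage URI are hypothetical placeholders.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: onnx-example                 # hypothetical name
spec:
  predictor:
    model:
      modelFormat:
        name: onnx
      runtime: kserve-openvino       # pin this runtime instead of relying on auto-selection
      storageUri: "gs://my-bucket/onnx/example"   # hypothetical location
```

Pinning is mainly useful when another runtime in the cluster also declares the onnx format and would otherwise take precedence.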
Code Reference
Source Location
- Repository: Kserve_Kserve
- File: config/runtimes/kserve-openvino.yaml
- Lines: 1-57
Signature
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  annotations:
    meta.helm.sh/release-name: kserve
    meta.helm.sh/release-namespace: kserve
  labels:
    app.kubernetes.io/managed-by: Helm
  name: kserve-openvino
spec:
  annotations:
    prometheus.kserve.io/path: /metrics
    prometheus.kserve.io/port: "8080"
  containers:
  - args:
    - --model_name={{.Name}}
    - --model_path=/mnt/models
    - --port=9000
    - --rest_port=8080
    - --file_system_poll_wait_seconds=0
    image: openvino/model_server:replace
    name: kserve-container
  protocolVersions:
  - v1
  - v2
  - grpc-v2
  supportedModelFormats:
  - autoSelect: true
    name: openvino
    priority: 1
    version: "10"
  - autoSelect: true
    name: onnx
    priority: 2
    version: "1"
  - autoSelect: true
    name: tensorflow
    priority: 3
    version: "1"
  - autoSelect: true
    name: huggingface
Import
kubectl apply -f config/runtimes/kserve-openvino.yaml
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| {{.Name}} | template variable | Yes | Model name injected by KServe at runtime |
| Model artifacts | storage URI | Yes | Model files at /mnt/models (provided by KServe storage initializer) |
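To make the template input concrete, here is a sketch of how the container arguments might look once KServe has substituted the template variable. It assumes the variable resolves to the InferenceService name, taking openvino-resnet from the Usage Examples section as the example. The other arguments are copied unchanged from the runtime definition.

```yaml
# Sketch: container args after template substitution.
# Assumption: {{.Name}} resolves to the InferenceService name, here openvino-resnet.
args:
- --model_name=openvino-resnet
- --model_path=/mnt/models          # populated by the KServe storage initializer
- --port=9000                       # gRPC
- --rest_port=8080                  # REST
- --file_system_poll_wait_seconds=0
```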
Outputs
| Name | Type | Description |
|---|---|---|
| ClusterServingRuntime | Custom Resource | OpenVINO runtime available cluster-wide for openvino, onnx, tensorflow, and huggingface models |
| REST inference endpoint | TCP port 8080 | V1/V2 inference protocol REST endpoint |
| gRPC inference endpoint | TCP port 9000 | gRPC endpoint for the V2 inference protocol (grpc-v2) |
| Prometheus metrics | HTTP port 8080 /metrics | Model serving metrics endpoint |
Usage Examples
Apply the runtime
kubectl apply -f config/runtimes/kserve-openvino.yaml
Create an OpenVINO InferenceService
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: openvino-resnet
spec:
  predictor:
    model:
      modelFormat:
        name: openvino
      storageUri: "gs://my-bucket/openvino/resnet"