Implementation: KServe InferenceService CRD Spec
| Knowledge Sources | |
|---|---|
| Domains | MLOps, Kubernetes, Model_Serving |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete Go type definition for the KServe InferenceService custom resource, providing the declarative API for ML model serving on Kubernetes.
Description
The InferenceService CRD is defined in pkg/apis/serving/v1beta1/inference_service.go. It is the primary API object users interact with. The spec contains three optional component specs (predictor, transformer, explainer), each with framework-specific configurations, storage URIs, and Kubernetes resource requirements.
Usage
Use this API when writing InferenceService YAML manifests. The CRD supports shorthand framework names (e.g., tensorflow, sklearn) that the defaulting webhook converts to the canonical model field with modelFormat.
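To make the shorthand-to-canonical conversion concrete, here is a minimal, self-contained sketch of what the defaulting webhook does conceptually. The struct types below are simplified mirrors of the real v1beta1 definitions (which live in pkg/apis/serving/v1beta1), and `defaultPredictor` is a hypothetical helper, not the actual webhook code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified mirrors of the v1beta1 types, for illustration only.
type ModelFormat struct {
	Name string `json:"name"`
}

type ModelSpec struct {
	ModelFormat ModelFormat `json:"modelFormat"`
	StorageURI  string      `json:"storageUri,omitempty"`
}

type PredictorSpec struct {
	// Shorthand framework fields; at most one may be set by the user.
	TensorFlow *ModelSpec `json:"tensorflow,omitempty"`
	SKLearn    *ModelSpec `json:"sklearn,omitempty"`
	// Canonical form produced by the defaulting webhook.
	Model *ModelSpec `json:"model,omitempty"`
}

// defaultPredictor folds a shorthand framework field into the canonical
// Model field, setting modelFormat.name to the framework name.
func defaultPredictor(p *PredictorSpec) {
	switch {
	case p.TensorFlow != nil:
		p.Model = p.TensorFlow
		p.Model.ModelFormat = ModelFormat{Name: "tensorflow"}
		p.TensorFlow = nil
	case p.SKLearn != nil:
		p.Model = p.SKLearn
		p.Model.ModelFormat = ModelFormat{Name: "sklearn"}
		p.SKLearn = nil
	}
}

func main() {
	p := &PredictorSpec{
		TensorFlow: &ModelSpec{StorageURI: "gs://kfserving-examples/models/tensorflow/flowers"},
	}
	defaultPredictor(p)
	out, _ := json.Marshal(p)
	fmt.Println(string(out))
}
```

Running this prints the canonical form, with the `tensorflow` shorthand replaced by `model` plus `modelFormat: {name: tensorflow}` — the same shape shown in the canonical YAML example below.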
Code Reference
Source Location
- Repository: kserve
- File: pkg/apis/serving/v1beta1/inference_service.go, Lines 24-145
Signature
// InferenceService is the top-level type for KServe model serving
type InferenceService struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   InferenceServiceSpec   `json:"spec,omitempty"`
	Status InferenceServiceStatus `json:"status,omitempty"`
}
// InferenceServiceSpec defines the desired state
type InferenceServiceSpec struct {
	Predictor   PredictorSpec    `json:"predictor"`
	Transformer *TransformerSpec `json:"transformer,omitempty"`
	Explainer   *ExplainerSpec   `json:"explainer,omitempty"`
}
Import
import servingv1beta1 "github.com/kserve/kserve/pkg/apis/serving/v1beta1"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| spec.predictor | PredictorSpec | Yes | Model server configuration with framework, storageUri, and resources |
| spec.predictor.model.modelFormat.name | string | Yes (canonical) | Framework name: tensorflow, sklearn, xgboost, pytorch, huggingface, etc. |
| spec.predictor.model.storageUri | string | Yes | Model artifact URI: s3://, gs://, hf://, pvc:// |
| spec.predictor.serviceAccountName | string | No | ServiceAccount with storage credentials |
| spec.predictor.resources | ResourceRequirements | No | CPU/memory/GPU limits (defaults from ConfigMap) |
| spec.transformer | TransformerSpec | No | Optional pre/post-processing component |
| spec.explainer | ExplainerSpec | No | Optional model explainability component |
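The storageUri schemes listed in the contract above lend themselves to a simple admission check. The sketch below is a hypothetical helper illustrating that kind of validation; it covers only the four schemes in the table (the real KServe webhook accepts additional schemes such as https):

```go
package main

import (
	"fmt"
	"strings"
)

// Storage URI prefixes listed in the I/O contract above.
var supportedPrefixes = []string{"s3://", "gs://", "hf://", "pvc://"}

// validateStorageURI returns nil if the URI uses one of the listed
// schemes, and an error otherwise.
func validateStorageURI(uri string) error {
	for _, p := range supportedPrefixes {
		if strings.HasPrefix(uri, p) {
			return nil
		}
	}
	return fmt.Errorf("unsupported storageUri %q: expected one of %v", uri, supportedPrefixes)
}

func main() {
	fmt.Println(validateStorageURI("gs://kfserving-examples/models/sklearn/1.0/model")) // <nil>
	fmt.Println(validateStorageURI("ftp://example.com/model"))                          // prints an error
}
```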
Outputs
| Name | Type | Description |
|---|---|---|
| status.url | *apis.URL | External prediction endpoint URL |
| status.address | *duckv1.Addressable | Cluster-internal address |
| status.conditions | []Condition | Ready, PredictorReady, IngressReady conditions |
| status.components | map[ComponentType]ComponentStatusSpec | Per-component URLs and traffic info |
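Consumers of the status typically gate on the top-level Ready condition, the same signal `kubectl get inferenceservice` surfaces in its READY column. A minimal sketch, using simplified local mirrors of the condition types (the real ones embed Knative duck types):

```go
package main

import "fmt"

// Simplified mirror of a status condition, for illustration only.
type Condition struct {
	Type   string // e.g. "Ready", "PredictorReady", "IngressReady"
	Status string // "True", "False", or "Unknown"
}

type InferenceServiceStatus struct {
	Conditions []Condition
}

// isReady reports whether the top-level Ready condition is True.
func isReady(s InferenceServiceStatus) bool {
	for _, c := range s.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

func main() {
	status := InferenceServiceStatus{Conditions: []Condition{
		{Type: "PredictorReady", Status: "True"},
		{Type: "IngressReady", Status: "True"},
		{Type: "Ready", Status: "True"},
	}}
	fmt.Println(isReady(status)) // true
}
```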
Usage Examples
Basic TensorFlow InferenceService
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: flower-sample
spec:
  predictor:
    tensorflow:
      storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
HuggingFace LLM InferenceService
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-llama3
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      storageUri: "hf://meta-llama/Meta-Llama-3-8B-Instruct"
      resources:
        limits:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
    serviceAccountName: hfserviceacc
Canonical Form with ModelFormat
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"