Implementation: KServe InferenceService CRD Spec
| Knowledge Sources | |
|---|---|
| Domains | MLOps, Kubernetes, Model_Serving |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete Go type definition for the KServe InferenceService custom resource, providing the declarative API for ML model serving on Kubernetes.
Description
The InferenceService CRD is defined in pkg/apis/serving/v1beta1/inference_service.go. It is the primary API object users interact with. The spec contains three optional component specs (predictor, transformer, explainer), each with framework-specific configurations, storage URIs, and Kubernetes resource requirements.
Usage
Use this API when writing InferenceService YAML manifests. The CRD supports shorthand framework names (e.g., tensorflow, sklearn) that the defaulting webhook converts to the canonical model field with modelFormat.
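To make the shorthand-to-canonical conversion concrete, here is a minimal, self-contained sketch of what the defaulting webhook does conceptually. The struct types below are simplified mirrors of the real v1beta1 definitions (which live in pkg/apis/serving/v1beta1), and `defaultPredictor` is a hypothetical helper, not the actual webhook code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified mirrors of the v1beta1 types, for illustration only.
type ModelFormat struct {
	Name string `json:"name"`
}

type ModelSpec struct {
	ModelFormat ModelFormat `json:"modelFormat"`
	StorageURI  string      `json:"storageUri,omitempty"`
}

type PredictorSpec struct {
	// Shorthand framework fields; at most one may be set by the user.
	TensorFlow *ModelSpec `json:"tensorflow,omitempty"`
	SKLearn    *ModelSpec `json:"sklearn,omitempty"`
	// Canonical form produced by the defaulting webhook.
	Model *ModelSpec `json:"model,omitempty"`
}

// defaultPredictor folds a shorthand framework field into the canonical
// Model field, setting modelFormat.name to the framework name.
func defaultPredictor(p *PredictorSpec) {
	switch {
	case p.TensorFlow != nil:
		p.Model = p.TensorFlow
		p.Model.ModelFormat = ModelFormat{Name: "tensorflow"}
		p.TensorFlow = nil
	case p.SKLearn != nil:
		p.Model = p.SKLearn
		p.Model.ModelFormat = ModelFormat{Name: "sklearn"}
		p.SKLearn = nil
	}
}

func main() {
	p := &PredictorSpec{
		TensorFlow: &ModelSpec{StorageURI: "gs://kfserving-examples/models/tensorflow/flowers"},
	}
	defaultPredictor(p)
	out, _ := json.Marshal(p)
	fmt.Println(string(out))
}
```

Running this prints the canonical form, with the `tensorflow` shorthand replaced by `model` plus `modelFormat: {name: tensorflow}` — the same shape shown in the canonical YAML example below.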
Code Reference
Source Location
- Repository: kserve
- File: pkg/apis/serving/v1beta1/inference_service.go, Lines 24-145
Signature
// InferenceService is the top-level type for KServe model serving
type InferenceService struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   InferenceServiceSpec   `json:"spec,omitempty"`
	Status InferenceServiceStatus `json:"status,omitempty"`
}
// InferenceServiceSpec defines the desired state
type InferenceServiceSpec struct {
	Predictor   PredictorSpec    `json:"predictor"`
	Transformer *TransformerSpec `json:"transformer,omitempty"`
	Explainer   *ExplainerSpec   `json:"explainer,omitempty"`
}
Import
import servingv1beta1 "github.com/kserve/kserve/pkg/apis/serving/v1beta1"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| spec.predictor | PredictorSpec | Yes | Model server configuration with framework, storageUri, and resources |
| spec.predictor.model.modelFormat.name | string | Yes (canonical) | Framework name: tensorflow, sklearn, xgboost, pytorch, huggingface, etc. |
| spec.predictor.model.storageUri | string | Yes | Model artifact URI: s3://, gs://, hf://, pvc:// |
| spec.predictor.serviceAccountName | string | No | ServiceAccount with storage credentials |
| spec.predictor.resources | ResourceRequirements | No | CPU/memory/GPU limits (defaults from ConfigMap) |
| spec.transformer | TransformerSpec | No | Optional pre/post-processing component |
| spec.explainer | ExplainerSpec | No | Optional model explainability component |
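The storageUri schemes listed in the contract above lend themselves to a simple admission check. The sketch below is a hypothetical helper illustrating that kind of validation; it covers only the four schemes in the table (the real KServe webhook accepts additional schemes such as https):

```go
package main

import (
	"fmt"
	"strings"
)

// Storage URI prefixes listed in the I/O contract above.
var supportedPrefixes = []string{"s3://", "gs://", "hf://", "pvc://"}

// validateStorageURI returns nil if the URI uses one of the listed
// schemes, and an error otherwise.
func validateStorageURI(uri string) error {
	for _, p := range supportedPrefixes {
		if strings.HasPrefix(uri, p) {
			return nil
		}
	}
	return fmt.Errorf("unsupported storageUri %q: expected one of %v", uri, supportedPrefixes)
}

func main() {
	fmt.Println(validateStorageURI("gs://kfserving-examples/models/sklearn/1.0/model")) // <nil>
	fmt.Println(validateStorageURI("ftp://example.com/model"))                          // prints an error
}
```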
Outputs
| Name | Type | Description |
|---|---|---|
| status.url | *apis.URL | External prediction endpoint URL |
| status.address | *duckv1.Addressable | Cluster-internal address |
| status.conditions | []Condition | Ready, PredictorReady, IngressReady conditions |
| status.components | map[ComponentType]ComponentStatusSpec | Per-component URLs and traffic info |
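Consumers of the status typically gate on the top-level Ready condition, the same signal `kubectl get inferenceservice` surfaces in its READY column. A minimal sketch, using simplified local mirrors of the condition types (the real ones embed Knative duck types):

```go
package main

import "fmt"

// Simplified mirror of a status condition, for illustration only.
type Condition struct {
	Type   string // e.g. "Ready", "PredictorReady", "IngressReady"
	Status string // "True", "False", or "Unknown"
}

type InferenceServiceStatus struct {
	Conditions []Condition
}

// isReady reports whether the top-level Ready condition is True.
func isReady(s InferenceServiceStatus) bool {
	for _, c := range s.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

func main() {
	status := InferenceServiceStatus{Conditions: []Condition{
		{Type: "PredictorReady", Status: "True"},
		{Type: "IngressReady", Status: "True"},
		{Type: "Ready", Status: "True"},
	}}
	fmt.Println(isReady(status)) // true
}
```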
Usage Examples
Basic TensorFlow InferenceService
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: flower-sample
spec:
  predictor:
    tensorflow:
      storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
HuggingFace LLM InferenceService
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-llama3
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      storageUri: "hf://meta-llama/Meta-Llama-3-8B-Instruct"
      resources:
        limits:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
    serviceAccountName: hfserviceacc
Canonical Form with ModelFormat
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"