Implementation:KServe InferenceService CRD Spec

From Leeroopedia
Knowledge Sources
Domains MLOps, Kubernetes, Model_Serving
Last Updated 2026-02-13 00:00 GMT

Overview

Concrete Go type definition for the KServe InferenceService custom resource, providing the declarative API for ML model serving on Kubernetes.

Description

The InferenceService CRD is defined in pkg/apis/serving/v1beta1/inference_service.go. It is the primary API object users interact with. The spec contains three optional component specs (predictor, transformer, explainer), each with framework-specific configurations, storage URIs, and Kubernetes resource requirements.

Usage

Use this API when writing InferenceService YAML manifests. The CRD supports shorthand framework names (e.g., tensorflow, sklearn) that the defaulting webhook converts to the canonical model field with modelFormat.
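The conversion is mechanical. A shorthand manifest fragment such as:

```yaml
spec:
  predictor:
    sklearn:
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```

is rewritten by the defaulting webhook into the canonical form:

```yaml
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```

Both forms describe the same deployment; the canonical model/modelFormat form is generally preferred for new manifests.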

Code Reference

Source Location

  • Repository: kserve
  • File: pkg/apis/serving/v1beta1/inference_service.go, Lines 24-145

Signature

// InferenceService is the top-level type for KServe model serving
type InferenceService struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec              InferenceServiceSpec   `json:"spec,omitempty"`
    Status            InferenceServiceStatus `json:"status,omitempty"`
}

// InferenceServiceSpec defines the desired state
type InferenceServiceSpec struct {
    Predictor   PredictorSpec    `json:"predictor"`
    Transformer *TransformerSpec `json:"transformer,omitempty"`
    Explainer   *ExplainerSpec   `json:"explainer,omitempty"`
}

Import

import servingv1beta1 "github.com/kserve/kserve/pkg/apis/serving/v1beta1"
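For programmatic construction, the real types from the import above can be submitted with any Kubernetes client. The sketch below instead mirrors only the fields discussed on this page in simplified local structs, so it runs without the kserve module; the json tags match the signature shown above, but these mirrored types are an illustration, not the actual API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified local mirrors of the v1beta1 types, reduced to the fields
// covered on this page. The real definitions live in
// github.com/kserve/kserve/pkg/apis/serving/v1beta1.
type ModelFormat struct {
	Name string `json:"name"`
}

type ModelSpec struct {
	ModelFormat ModelFormat `json:"modelFormat"`
	StorageURI  string      `json:"storageUri,omitempty"`
}

type PredictorSpec struct {
	Model              *ModelSpec `json:"model,omitempty"`
	ServiceAccountName string     `json:"serviceAccountName,omitempty"`
}

type InferenceServiceSpec struct {
	Predictor PredictorSpec `json:"predictor"`
}

type Metadata struct {
	Name string `json:"name"`
}

type InferenceService struct {
	APIVersion string               `json:"apiVersion"`
	Kind       string               `json:"kind"`
	Metadata   Metadata             `json:"metadata"`
	Spec       InferenceServiceSpec `json:"spec"`
}

// buildManifest assembles a canonical-form sklearn InferenceService.
func buildManifest() InferenceService {
	return InferenceService{
		APIVersion: "serving.kserve.io/v1beta1",
		Kind:       "InferenceService",
		Metadata:   Metadata{Name: "sklearn-iris"},
		Spec: InferenceServiceSpec{
			Predictor: PredictorSpec{
				Model: &ModelSpec{
					ModelFormat: ModelFormat{Name: "sklearn"},
					StorageURI:  "gs://kfserving-examples/models/sklearn/1.0/model",
				},
			},
		},
	}
}

func main() {
	// Marshal to JSON to show the wire format the API server receives.
	out, err := json.MarshalIndent(buildManifest(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The printed JSON is field-for-field equivalent to the canonical YAML example later on this page.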

I/O Contract

Inputs

Name Type Required Description
spec.predictor PredictorSpec Yes Model server configuration with framework, storageUri, and resources
spec.predictor.model.modelFormat.name string Yes (canonical) Framework name: tensorflow, sklearn, xgboost, pytorch, huggingface, etc.
spec.predictor.model.storageUri string Yes Model artifact URI: s3://, gs://, hf://, pvc://
spec.predictor.serviceAccountName string No ServiceAccount with storage credentials
spec.predictor.resources ResourceRequirements No CPU/memory/GPU limits (defaults from ConfigMap)
spec.transformer TransformerSpec No Optional pre/post-processing component
spec.explainer ExplainerSpec No Optional model explainability component

Outputs

Name Type Description
status.url *apis.URL External prediction endpoint URL
status.address *duckv1.Addressable Cluster-internal address
status.conditions []Condition Ready, PredictorReady, IngressReady conditions
status.components map[ComponentType]ComponentStatusSpec Per-component URLs and traffic info

Usage Examples

Basic TensorFlow InferenceService

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: flower-sample
spec:
  predictor:
    tensorflow:
      storageUri: "gs://kfserving-examples/models/tensorflow/flowers"

HuggingFace LLM InferenceService

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-llama3
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      storageUri: "hf://meta-llama/Meta-Llama-3-8B-Instruct"
      resources:
        limits:
          cpu: "6"
          memory: 24Gi
          nvidia.com/gpu: "1"
    serviceAccountName: hfserviceacc

Canonical Form with ModelFormat

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"

