
Implementation:Kserve Kserve InferenceService Full CRD

From Leeroopedia
Knowledge Sources
Domains: Kubernetes, CRD, Model Serving
Last Updated: 2026-02-13 00:00 GMT

Overview

Concrete CRD definition for the InferenceService custom resource in the KServe serving API, providing full OpenAPI v3 schema validation.

Description

This file contains the auto-generated full CustomResourceDefinition for the InferenceService kind (short name: isvc), produced by controller-gen v0.19.0. The resource is namespaced and belongs to the serving.kserve.io API group at version v1beta1. It is KServe's core CRD: it defines the API through which users deploy, version, and manage ML model serving endpoints with traffic splitting, autoscaling, and canary rollouts. The CRD defines printer columns for URL, Ready status, Prev/Latest traffic percentages, the previous rolled-out and latest ready revision names, and Age. It also enables the status subresource and uses x-kubernetes-preserve-unknown-fields in its schema for extensibility.
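As a rough illustration of the autoscaling mentioned above, the fragment below sets replica bounds and a scaling target on the predictor. It is a minimal sketch: the resource name is hypothetical, and the field names (minReplicas, maxReplicas, scaleMetric, scaleTarget) are assumed from the v1beta1 component spec, so check them against the schema in the installed CRD. How they take effect depends on the deployment mode and autoscaler in use.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris-autoscaled   # hypothetical name
  namespace: default
spec:
  predictor:
    minReplicas: 1                # assumed field: lower bound on replicas
    maxReplicas: 3                # assumed field: upper bound on replicas
    scaleMetric: concurrency      # assumed field: metric the autoscaler watches
    scaleTarget: 5                # assumed field: target value for that metric
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```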

Usage

Apply this CRD during KServe installation to register the InferenceService API with the Kubernetes API server. Once registered, users can create InferenceService resources to deploy predictor, transformer, and explainer components for serving ML models, with full schema validation enforced by the API server.

Code Reference

Source Location

Signature

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.19.0
  name: inferenceservices.serving.kserve.io
spec:
  group: serving.kserve.io
  names:
    kind: InferenceService
    listKind: InferenceServiceList
    plural: inferenceservices
    shortNames:
      - isvc
    singular: inferenceservice
  scope: Namespaced
  versions:
    - name: v1beta1
      additionalPrinterColumns:
        - jsonPath: .status.url
          name: URL
          type: string
        - jsonPath: .status.conditions[?(@.type=='Ready')].status
          name: Ready
          type: string
        - jsonPath: .status.components.predictor.traffic[?(@.tag=='prev')].percent
          name: Prev
          type: integer
        - jsonPath: .status.components.predictor.traffic[?(@.latestRevision==true)].percent
          name: Latest
          type: integer
        - jsonPath: .status.components.predictor.traffic[?(@.tag=='prev')].revisionName
          name: PrevRolledoutRevision
          type: string
        - jsonPath: .status.components.predictor.traffic[?(@.latestRevision==true)].revisionName
          name: LatestReadyRevision
          type: string
        - jsonPath: .metadata.creationTimestamp
          name: Age
          type: date
      # This excerpt omits the version's schema (openAPIV3Schema), served, and storage fields
      subresources:
        status: {}

Import

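# Server-side apply avoids the annotation size limit that a CRD this large can exceed
kubectl apply --server-side -f config/crd/full/serving.kserve.io_inferenceservices.yaml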

I/O Contract

Inputs

Name | Type | Required | Description
apiVersion | string | Yes | Must be serving.kserve.io/v1beta1
kind | string | Yes | Must be InferenceService
metadata | ObjectMeta | Yes | Standard Kubernetes object metadata
spec | InferenceServiceSpec | Yes | Service specification defining predictor, transformer, and explainer components

Key spec fields:

Field | Type | Required | Description
spec.predictor | PredictorSpec | Yes (implicit) | Predictor component specification with full pod template support, including built-in framework support (sklearn, tensorflow, pytorch, xgboost, triton, etc.) and custom container definitions
spec.transformer | TransformerSpec | No | Optional pre/post-processing transformer component with full pod template support
spec.explainer | ExplainerSpec | No | Optional model explainability component with full pod template support

Each component (predictor, transformer, explainer) supports the full Kubernetes PodSpec including:

  • Container definitions with image, command, args, env, resources, ports, volume mounts
  • Affinity, tolerations, topology spread constraints
  • Init containers, service account, security context
  • Active deadline seconds, DNS policy, host networking
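The PodSpec support listed above can be sketched with a custom predictor and transformer. This is an illustrative fragment, not taken from the KServe repository: the resource name, service account, and image references are hypothetical, and the container name kserve-container is assumed to follow KServe's convention for the main serving container.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: custom-model                    # hypothetical name
  namespace: default
spec:
  predictor:
    # Pod-level fields sit alongside the container list
    serviceAccountName: model-sa        # hypothetical service account
    tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
    containers:
      - name: kserve-container          # assumed conventional container name
        image: example.com/custom-predictor:latest    # hypothetical image
        env:
          - name: LOG_LEVEL             # hypothetical variable
            value: info
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
  transformer:
    containers:
      - name: kserve-container
        image: example.com/custom-transformer:latest  # hypothetical image
```

Because the schema embeds the PodSpec, the API server should reject a misspelled or mistyped field in these blocks at admission time.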

Outputs

Name | Type | Description
status.url | string | Primary URL endpoint for the inference service
status.conditions | []Condition | Knative-style conditions including Ready status
status.components | ComponentStatuses | Per-component status with traffic routing, revision names, and URLs for predictor, transformer, and explainer
status.modelStatus | ModelStatus | Model-level status including active/target model state (Pending, Standby, Loading, Loaded, FailedToLoad), transition status, and failure info with reason enums (ModelLoadFailed, RuntimeUnhealthy, RuntimeDisabled, NoSupportingRuntime, RuntimeNotRecognized, InvalidPredictorSpec)
status.observedGeneration | int64 | The generation most recently observed by the controller
status.servingRuntimeName | string | Name of the serving runtime selected for this inference service
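To show how these outputs fit together, here is a hand-written status stanza for a healthy service. The values are illustrative rather than captured from a cluster. The traffic entry fields follow the printer-column JSONPaths in the CRD; the nested names under components and modelStatus (latestReadyRevision, transitionStatus, states) are assumptions to verify against the schema.

```yaml
status:
  url: http://sklearn-iris.default.example.com
  observedGeneration: 1
  conditions:
    - type: Ready
      status: "True"
  components:
    predictor:
      latestReadyRevision: sklearn-iris-predictor-001   # assumed field name
      traffic:
        - latestRevision: true
          percent: 100
          revisionName: sklearn-iris-predictor-001
  modelStatus:
    transitionStatus: UpToDate        # assumed field name and value
    states:                           # assumed nesting
      activeModelState: Loaded
      targetModelState: Loaded
```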

Usage Examples

Create an InferenceService

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  namespace: default
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
      resources:
        requests:
          cpu: 100m
          memory: 256Mi

Check Status with Traffic Split

kubectl get isvc sklearn-iris
# NAME          URL                                        READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION          AGE
# sklearn-iris  http://sklearn-iris.default.example.com    True           100                              sklearn-iris-predictor-001    5m
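The output above shows all traffic on the latest revision. One way to get a split is a canary update, sketched here under the assumptions that the predictor accepts canaryTrafficPercent and that the service runs in a mode where revisions carry traffic. The new storageUri is a hypothetical placeholder.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  namespace: default
spec:
  predictor:
    canaryTrafficPercent: 10          # assumed field: share of traffic sent to the new revision
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://example-bucket/models/sklearn/2.0/model   # hypothetical new model location
```

After such an update is reconciled, the PREV and LATEST columns would be expected to report the two traffic shares, with the previous revision named under PREVROLLEDOUTREVISION.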
