Implementation:Kserve Kserve LocalModelCache CRD

Knowledge Sources	Kserve_Kserve KServe Docs
Domains	Kubernetes, Model Caching
Last Updated	2026-02-13 00:00 GMT

Overview

Concrete CustomResourceDefinition for the LocalModelCache resource provided by the KServe project.

Description

This file defines the full CRD for the LocalModelCache custom resource, which manages the lifecycle of pre-cached ML models on cluster nodes. LocalModelCache is a cluster-scoped resource in the serving.kserve.io API group, served at version v1alpha1. Its spec has three required fields: modelSize, nodeGroups (at least one entry), and sourceModelUri (immutable once set). Its status tracks copy counts (available, failed, total), associated InferenceServices, and per-node download status. The CRD lets operators declare which models should be pre-cached on which node groups, which reduces cold-start latency for inference serving.

Usage

Apply this CRD to a Kubernetes cluster before creating any LocalModelCache resources. It is a prerequisite for KServe's local model caching subsystem, which pre-downloads model artifacts to designated node groups so that inference workloads can start without fetching models from remote storage at serve time.

Code Reference

Source Location

Repository: Kserve_Kserve
File: config/crd/full/localmodel/serving.kserve.io_localmodelcaches.yaml
Lines: 1-86

Signature

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.19.0
  name: localmodelcaches.serving.kserve.io
spec:
  group: serving.kserve.io
  names:
    kind: LocalModelCache
    listKind: LocalModelCacheList
    plural: localmodelcaches
    singular: localmodelcache
  scope: Cluster
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        properties:
          spec:
            properties:
              modelSize:
                x-kubernetes-int-or-string: true
              nodeGroups:
                items:
                  type: string
                minItems: 1
              sourceModelUri:
                type: string
            required:
            - modelSize
            - nodeGroups
            - sourceModelUri
          status:
            properties:
              copies:
                properties:
                  available:
                    type: integer
                  failed:
                    type: integer
                  total:
                    type: integer
              nodeStatus:
                additionalProperties:
                  enum:
                  - ""
                  - NodeNotReady
                  - NodeDownloadPending
                  - NodeDownloading
                  - NodeDownloaded
                  - NodeDownloadError

Import

kubectl apply -f config/crd/full/localmodel/serving.kserve.io_localmodelcaches.yaml

I/O Contract

Inputs

Name	Type	Required	Description
spec.modelSize	integer or string (Quantity)	Yes	The size of the model to be cached, used for storage capacity planning
spec.nodeGroups	array of strings	Yes	List of node groups (minimum 1) on which to cache the model
spec.sourceModelUri	string	Yes	The URI of the model source; immutable once set
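
The constraints in the table above can be sketched as client-side checks, for example to lint manifests before they are applied. This is only an illustration of the rules stated on this page, not the API server's validation logic: the function names validate_spec and check_update are hypothetical, and the excerpt does not show how immutability of sourceModelUri is actually enforced.

```python
from typing import Any


def validate_spec(spec: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a LocalModelCache spec (empty if none)."""
    errors: list[str] = []

    # All three fields are listed as required in the CRD schema.
    for field in ("modelSize", "nodeGroups", "sourceModelUri"):
        if field not in spec:
            errors.append(f"spec.{field} is required")

    # modelSize is int-or-string; bool is rejected because it subclasses int in Python.
    if "modelSize" in spec:
        size = spec["modelSize"]
        if isinstance(size, bool) or not isinstance(size, (int, str)):
            errors.append("spec.modelSize must be an integer or a string")

    # nodeGroups is an array of strings with minItems: 1.
    if "nodeGroups" in spec:
        groups = spec["nodeGroups"]
        if not isinstance(groups, list) or not all(isinstance(g, str) for g in groups):
            errors.append("spec.nodeGroups must be an array of strings")
        elif len(groups) < 1:
            errors.append("spec.nodeGroups must contain at least 1 item")

    if "sourceModelUri" in spec and not isinstance(spec["sourceModelUri"], str):
        errors.append("spec.sourceModelUri must be a string")

    return errors


def check_update(old_spec: dict[str, Any], new_spec: dict[str, Any]) -> list[str]:
    """Illustrate the immutability rule: sourceModelUri may not change after creation."""
    if old_spec.get("sourceModelUri") != new_spec.get("sourceModelUri"):
        return ["spec.sourceModelUri is immutable"]
    return []


if __name__ == "__main__":
    good = {
        "modelSize": "10Gi",
        "nodeGroups": ["gpu-workers"],
        "sourceModelUri": "gs://my-bucket/my-model",
    }
    print(validate_spec(good))  # no problems reported
    print(validate_spec({"modelSize": "10Gi", "nodeGroups": []}))
    print(check_update(good, {**good, "sourceModelUri": "gs://my-bucket/other-model"}))
```

Returning a list of messages rather than raising on the first problem lets a linter report every issue in a manifest at once.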

Outputs

Name	Type	Description
LocalModelCache CRD	CustomResourceDefinition	Registers the LocalModelCache resource type in the Kubernetes API server
status.copies	object	Tracks available, failed, and total copy counts across nodes
status.nodeStatus	map of string	Per-node download status; each value is one of "" (empty), NodeNotReady, NodeDownloadPending, NodeDownloading, NodeDownloaded, NodeDownloadError
status.inferenceServices	array	List of InferenceServices associated with this cached model
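
As a rough sketch of how a client might read status.nodeStatus, the following tallies nodes by state using the enum values from the CRD schema. The function name tally_node_status and the sample node names are hypothetical. The source does not state how the controller derives status.copies from per-node states, so this tally is only a client-side summary and should not be read as equivalent to those counts.

```python
from collections import Counter

# Enum values copied from the status.nodeStatus schema in the CRD excerpt.
NODE_STATES = (
    "",
    "NodeNotReady",
    "NodeDownloadPending",
    "NodeDownloading",
    "NodeDownloaded",
    "NodeDownloadError",
)


def tally_node_status(node_status: dict[str, str]) -> dict[str, int]:
    """Count how many nodes are in each state; reject values outside the enum."""
    unknown = sorted(set(node_status.values()) - set(NODE_STATES))
    if unknown:
        raise ValueError(f"unexpected node status values: {unknown}")
    counts = Counter(node_status.values())
    # Report every enum value, including those with zero nodes.
    return {state: counts.get(state, 0) for state in NODE_STATES}


if __name__ == "__main__":
    # Hypothetical status map; the node names are made up for illustration.
    sample = {
        "node-a": "NodeDownloaded",
        "node-b": "NodeDownloading",
        "node-c": "NodeDownloaded",
    }
    for state, count in tally_node_status(sample).items():
        print(f"{state or '(empty)'}: {count}")
```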

Usage Examples

Apply the CRD

kubectl apply -f config/crd/full/localmodel/serving.kserve.io_localmodelcaches.yaml

Create a LocalModelCache resource

apiVersion: serving.kserve.io/v1alpha1
kind: LocalModelCache
metadata:
  name: my-model-cache
spec:
  modelSize: "10Gi"
  nodeGroups:
    - gpu-workers
  sourceModelUri: "gs://my-bucket/my-model"
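
When many models need caching, the same manifest can be generated rather than written by hand. The sketch below builds the resource shown above as a Python dictionary and prints it as JSON, which kubectl accepts in addition to YAML. The helper name local_model_cache is hypothetical, and the values are the same placeholders used in the example above.

```python
import json


def local_model_cache(name, source_model_uri, model_size, node_groups):
    """Build a LocalModelCache manifest as a plain dictionary."""
    return {
        "apiVersion": "serving.kserve.io/v1alpha1",
        "kind": "LocalModelCache",
        # The resource is cluster-scoped, so metadata carries no namespace.
        "metadata": {"name": name},
        "spec": {
            "modelSize": model_size,
            "nodeGroups": list(node_groups),
            "sourceModelUri": source_model_uri,
        },
    }


if __name__ == "__main__":
    manifest = local_model_cache(
        name="my-model-cache",
        source_model_uri="gs://my-bucket/my-model",
        model_size="10Gi",
        node_groups=["gpu-workers"],
    )
    print(json.dumps(manifest, indent=2))
```

If the script is saved under a name of your choosing, its output can be piped to kubectl apply -f - once the CRD has been installed.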
