Implementation:Kserve Kserve LocalModelCache CRD
| Knowledge Sources | |
|---|---|
| Domains | Kubernetes, Model Caching |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete CustomResourceDefinition for the LocalModelCache resource provided by the KServe project.
Description
This file defines the full CRD for the LocalModelCache custom resource, which manages the lifecycle of pre-cached ML models on cluster nodes. It is a cluster-scoped v1alpha1 resource with spec fields for modelSize, nodeGroups, and sourceModelUri (which is immutable once set), and status fields that track copy counts (available, failed, total), associated InferenceServices, and per-node download status. This CRD enables operators to declaratively specify which models should be pre-cached on which node groups, reducing cold-start latency for inference serving.
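The immutability of sourceModelUri can be mirrored in client tooling before an update is sent to the API server. The following is a minimal sketch, not KServe's own enforcement mechanism (the schema excerpt on this page does not show how the rule is implemented); the function name `immutable_field_error` is hypothetical.

```python
from typing import Optional


def immutable_field_error(
    old_spec: dict, new_spec: dict, field: str = "sourceModelUri"
) -> Optional[str]:
    """Return an error message if `field` differs between two specs, else None.

    Hypothetical helper: it only compares the two dictionaries and does not
    talk to a cluster.
    """
    old_value = old_spec.get(field)
    new_value = new_spec.get(field)
    if old_value != new_value:
        return f"{field} is immutable: {old_value!r} -> {new_value!r}"
    return None
```

A caller would pass the spec currently stored in the cluster as `old_spec` and the proposed spec as `new_spec`, and refuse to submit the update when a message is returned.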
Usage
Apply this CRD to a Kubernetes cluster before creating any LocalModelCache resources. It is a prerequisite for KServe's local model caching subsystem, which pre-downloads model artifacts to designated node groups so that inference workloads can start without fetching models from remote storage at serve time.
Code Reference
Source Location
- Repository: Kserve_Kserve
- File: config/crd/full/localmodel/serving.kserve.io_localmodelcaches.yaml
- Lines: 1-86
Signature
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.19.0
  name: localmodelcaches.serving.kserve.io
spec:
  group: serving.kserve.io
  names:
    kind: LocalModelCache
    listKind: LocalModelCacheList
    plural: localmodelcaches
    singular: localmodelcache
  scope: Cluster
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        properties:
          spec:
            properties:
              modelSize:
                x-kubernetes-int-or-string: true
              nodeGroups:
                items:
                  type: string
                minItems: 1
              sourceModelUri:
                type: string
            required:
            - modelSize
            - nodeGroups
            - sourceModelUri
          status:
            properties:
              copies:
                properties:
                  available:
                    type: integer
                  failed:
                    type: integer
                  total:
                    type: integer
              nodeStatus:
                additionalProperties:
                  enum:
                  - ""
                  - NodeNotReady
                  - NodeDownloadPending
                  - NodeDownloading
                  - NodeDownloaded
                  - NodeDownloadError
Import
kubectl apply -f config/crd/full/localmodel/serving.kserve.io_localmodelcaches.yaml
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| spec.modelSize | integer or string (Quantity) | Yes | The size of the model to be cached, used for storage capacity planning |
| spec.nodeGroups | array of strings | Yes | List of node groups (minimum 1) on which to cache the model |
| spec.sourceModelUri | string | Yes | The URI of the model source; immutable once set |
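
The input constraints in the table above can be checked locally before a manifest is submitted. The sketch below mirrors only the constraints listed on this page (required fields, integer-or-string modelSize, at least one nodeGroups entry); it is not the API server's validation and does not parse Kubernetes Quantity syntax. The name `validate_spec` is hypothetical.

```python
from typing import Any

REQUIRED_FIELDS = ("modelSize", "nodeGroups", "sourceModelUri")


def validate_spec(spec: dict[str, Any]) -> list[str]:
    """Return a list of problems found in `spec`; an empty list means it passed."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in spec
    ]

    if "modelSize" in spec:
        size = spec["modelSize"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(size, bool) or not isinstance(size, (int, str)):
            problems.append("modelSize must be an integer or a string")

    if "nodeGroups" in spec:
        groups = spec["nodeGroups"]
        if not isinstance(groups, list) or not all(isinstance(g, str) for g in groups):
            problems.append("nodeGroups must be an array of strings")
        elif len(groups) < 1:
            problems.append("nodeGroups must contain at least 1 item")

    if "sourceModelUri" in spec and not isinstance(spec["sourceModelUri"], str):
        problems.append("sourceModelUri must be a string")

    return problems
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in one pass.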
Outputs
| Name | Type | Description |
|---|---|---|
| LocalModelCache CRD | CustomResourceDefinition | Registers the LocalModelCache resource type in the Kubernetes API server |
| status.copies | object | Tracks available, failed, and total copy counts across nodes |
| status.nodeStatus | map of string to string | Per-node download status; each value is one of NodeNotReady, NodeDownloadPending, NodeDownloading, NodeDownloaded, NodeDownloadError, or an empty string |
| status.inferenceServices | array | List of InferenceServices associated with this cached model |
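
As an illustration of consuming status.nodeStatus, the following sketch tallies how many nodes are in each state. It assumes the map has already been read from the resource (for example from JSON output) into a Python dictionary; the function names and the node names in the usage note are hypothetical, and the state values are copied from the enum in the Signature section.

```python
from collections import Counter

# Allowed values, copied from the nodeStatus enum in the Signature section.
NODE_STATES = {
    "",
    "NodeNotReady",
    "NodeDownloadPending",
    "NodeDownloading",
    "NodeDownloaded",
    "NodeDownloadError",
}


def summarize_node_status(node_status: dict[str, str]) -> Counter:
    """Count nodes per state, rejecting values outside the enum."""
    unknown = set(node_status.values()) - NODE_STATES
    if unknown:
        raise ValueError(f"unexpected node status values: {sorted(unknown)}")
    return Counter(node_status.values())


def all_nodes_downloaded(node_status: dict[str, str]) -> bool:
    """True only if at least one node is reported and every node is NodeDownloaded."""
    return bool(node_status) and all(
        state == "NodeDownloaded" for state in node_status.values()
    )
```

For a map such as `{"node-a": "NodeDownloaded", "node-b": "NodeDownloading"}`, `summarize_node_status` reports one node in each state and `all_nodes_downloaded` returns `False`.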
Usage Examples
Apply the CRD
kubectl apply -f config/crd/full/localmodel/serving.kserve.io_localmodelcaches.yaml
Create a LocalModelCache resource
apiVersion: serving.kserve.io/v1alpha1
kind: LocalModelCache
metadata:
  name: my-model-cache
spec:
  modelSize: "10Gi"
  nodeGroups:
    - gpu-workers
  sourceModelUri: "gs://my-bucket/my-model"
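
The same resource can be generated programmatically when many caches need to be declared. This is a minimal sketch using only the standard library; the helper name `build_local_model_cache` is hypothetical. It emits JSON, which `kubectl apply -f` accepts alongside YAML, and it sets no namespace because the resource is cluster-scoped.

```python
import json


def build_local_model_cache(
    name: str, source_model_uri: str, model_size: str, node_groups: list[str]
) -> dict:
    """Build a LocalModelCache manifest as a plain dictionary."""
    if not node_groups:
        # The schema requires at least one node group.
        raise ValueError("node_groups must contain at least one entry")
    return {
        "apiVersion": "serving.kserve.io/v1alpha1",
        "kind": "LocalModelCache",
        "metadata": {"name": name},  # cluster-scoped: no namespace field
        "spec": {
            "modelSize": model_size,
            "nodeGroups": list(node_groups),
            "sourceModelUri": source_model_uri,
        },
    }


if __name__ == "__main__":
    manifest = build_local_model_cache(
        name="my-model-cache",
        source_model_uri="gs://my-bucket/my-model",
        model_size="10Gi",
        node_groups=["gpu-workers"],
    )
    # Redirect this output to a file and pass it to `kubectl apply -f`.
    print(json.dumps(manifest, indent=2))
```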