Implementation:Kserve Kserve LocalModelCache CRD

From Leeroopedia
Knowledge Sources
Domains: Kubernetes, Model Caching
Last Updated: 2026-02-13 00:00 GMT

Overview

Concrete CustomResourceDefinition for the LocalModelCache resource provided by the KServe project.

Description

This file defines the full CRD for the LocalModelCache custom resource, which manages the lifecycle of pre-cached ML models on cluster nodes. LocalModelCache is a cluster-scoped v1alpha1 resource. Its spec has three required fields: modelSize, nodeGroups, and sourceModelUri (immutable once set). Its status tracks copy counts (available, failed, total), the associated InferenceServices, and the per-node download status. With this CRD, operators declare which models should be pre-cached on which node groups, which reduces cold-start latency for inference serving.

Usage

Apply this CRD to a Kubernetes cluster before creating any LocalModelCache resources. It is a prerequisite for KServe's local model caching subsystem, which pre-downloads model artifacts to designated node groups so that inference workloads can start without fetching models from remote storage at serve time.

Code Reference

Source Location

Signature

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.19.0
  name: localmodelcaches.serving.kserve.io
spec:
  group: serving.kserve.io
  names:
    kind: LocalModelCache
    listKind: LocalModelCacheList
    plural: localmodelcaches
    singular: localmodelcache
  scope: Cluster
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        properties:
          spec:
            properties:
              modelSize:
                x-kubernetes-int-or-string: true
              nodeGroups:
                items:
                  type: string
                minItems: 1
              sourceModelUri:
                type: string
            required:
            - modelSize
            - nodeGroups
            - sourceModelUri
          status:
            properties:
              copies:
                properties:
                  available:
                    type: integer
                  failed:
                    type: integer
                  total:
                    type: integer
              nodeStatus:
                additionalProperties:
                  enum:
                  - ""
                  - NodeNotReady
                  - NodeDownloadPending
                  - NodeDownloading
                  - NodeDownloaded
                  - NodeDownloadError

Import

kubectl apply -f config/crd/full/localmodel/serving.kserve.io_localmodelcaches.yaml

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| spec.modelSize | integer or string (Quantity) | Yes | Size of the model to be cached, used for storage capacity planning |
| spec.nodeGroups | array of strings | Yes | List of node groups (minimum 1) on which to cache the model |
| spec.sourceModelUri | string | Yes | URI of the model source; immutable once set |
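
The Signature excerpt does not show how the immutability of sourceModelUri is enforced. One common mechanism in Kubernetes CRDs is a CEL transition rule declared through x-kubernetes-validations. The fragment below is a minimal sketch under that assumption, not a copy of the KServe schema, and the message text is illustrative.

```yaml
# Hypothetical schema fragment: one way an "immutable once set" string
# field can be expressed in a CRD. Assumed, not taken from the KServe source.
sourceModelUri:
  type: string
  x-kubernetes-validations:
  # Transition rule: on update, the new value must equal the old one.
  - rule: self == oldSelf
    message: sourceModelUri is immutable  # illustrative message
```

With a rule of this kind, the API server would reject any update that changes the field, so pointing a cache at a different model means creating a new LocalModelCache resource.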

Outputs

| Name | Type | Description |
|---|---|---|
| LocalModelCache CRD | CustomResourceDefinition | Registers the LocalModelCache resource type in the Kubernetes API server |
| status.copies | object | Tracks available, failed, and total copy counts across nodes |
| status.nodeStatus | map of strings | Per-node download status: NodeNotReady, NodeDownloadPending, NodeDownloading, NodeDownloaded, NodeDownloadError, or an empty string |
| status.inferenceServices | array | List of InferenceServices associated with this cached model |
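
To make the status fields concrete, here is a sketch of what a populated status stanza might look like while a model is still being distributed. All values are hypothetical. The node names used as map keys and the name/namespace shape of the inferenceServices items are assumptions, since the excerpt does not show them.

```yaml
# Illustrative status stanza; every value and node name is hypothetical.
status:
  copies:
    available: 2
    failed: 1
    total: 4
  nodeStatus:
    # Keys are assumed to be node names; values come from the enum above.
    gpu-node-a: NodeDownloaded
    gpu-node-b: NodeDownloaded
    gpu-node-c: NodeDownloadError
    gpu-node-d: NodeDownloading
  inferenceServices:
  # Item shape (name/namespace) is an assumption, not shown in the excerpt.
  - name: my-isvc
    namespace: default
```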

Usage Examples

Apply the CRD

kubectl apply -f config/crd/full/localmodel/serving.kserve.io_localmodelcaches.yaml

Create a LocalModelCache resource

apiVersion: serving.kserve.io/v1alpha1
kind: LocalModelCache
metadata:
  name: my-model-cache
spec:
  modelSize: "10Gi"
  nodeGroups:
    - gpu-workers
  sourceModelUri: "gs://my-bucket/my-model"
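
Because modelSize is declared with x-kubernetes-int-or-string and nodeGroups is an array, a manifest like the following sketch should also pass schema validation. The resource name and the second node group are hypothetical, and treating the integer as a byte count assumes standard Kubernetes Quantity semantics.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: LocalModelCache
metadata:
  name: my-model-cache-multi   # hypothetical name
spec:
  # Integer form of the size; 10737418240 bytes equals 10Gi.
  modelSize: 10737418240
  nodeGroups:
    - gpu-workers
    - gpu-workers-spot         # hypothetical second node group
  sourceModelUri: "gs://my-bucket/my-model"
```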
