Principle:Kserve Kserve PVC Model Download
| Knowledge Sources | |
|---|---|
| Domains | Storage, MLOps, LLM_Serving |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
A pre-download pattern that stages large model weights onto a persistent volume using a Kubernetes Job before serving, enabling faster startup and multi-pod sharing.
Description
PVC Model Download solves the problem of slow model loading for large LLMs (100GB+ weights). Instead of each pod downloading the model on startup, a one-time Kubernetes Job downloads the model to a ReadWriteMany PVC. Multiple serving pods can then mount the same PVC and read the weights directly from the volume, with no per-pod download.
The KServe storage initializer image is reused as the download agent, supporting hf:// URIs for HuggingFace Hub models.
Usage
Use this for models larger than ~10GB, especially in multi-node/multi-replica deployments where each pod downloading the model independently would waste bandwidth and slow startup.
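The bandwidth argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name and the 600GB / 8-replica figures are assumptions chosen for the example, and it ignores pod restarts, which would repeat the per-pod download.

```python
def total_download_gb(model_gb: float, replicas: int, shared_pvc: bool) -> float:
    """Total data pulled from the model hub, in GB, for one rollout."""
    if shared_pvc:
        # One Job downloads once; every replica mounts the same volume.
        return model_gb
    # Without a shared volume, each replica downloads its own copy.
    return model_gb * replicas


per_pod = total_download_gb(600, 8, shared_pvc=False)
shared = total_download_gb(600, 8, shared_pvc=True)
print(f"per-pod download: {per_pod} GB, shared PVC: {shared} GB")
```

The gap grows linearly with the replica count, which is why the pattern pays off most in multi-replica deployments.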
Theoretical Basis
# PVC model download pattern (NOT implementation code)
1. Create PVC with ReadWriteMany access mode
   - Size: model weights + buffer (e.g., 1Ti for 600GB model)
   - StorageClass: must support RWX
2. Create Job with storage-initializer image
   - Args: ["hf://org/model", "/mnt/models"]
   - Mount PVC at /mnt/models
   - Environment: HF_TOKEN, HF_XET_NUM_CONCURRENT_RANGE_GETS
3. Wait for Job completion
   - Model weights written to PVC
4. Reference PVC in LLMInferenceService
   - spec.model.uri: "pvc://pvc-name"
   - Multiple replicas mount same PVC
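The steps above can be sketched as manifests built in Python and printed as JSON, which kubectl accepts alongside YAML. This is a minimal sketch under stated assumptions, not KServe's reference configuration: the resource names, the storage class, the secret name, the image reference `kserve/storage-initializer:latest`, the `serving.kserve.io/v1alpha1` API version, and the value given to HF_XET_NUM_CONCURRENT_RANGE_GETS are all placeholders to check against your cluster and KServe release.

```python
import json

# All names below are hypothetical placeholders.
MODEL_URI = "hf://org/model"  # placeholder URI from the pattern above
MOUNT_PATH = "/mnt/models"
INITIALIZER_IMAGE = "kserve/storage-initializer:latest"  # assumed image reference


def build_pvc(name: str, size: str, storage_class: str) -> dict:
    """Step 1: a PVC that many pods can mount at once (ReadWriteMany)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,  # must support RWX
            "resources": {"requests": {"storage": size}},
        },
    }


def build_download_job(name: str, pvc_name: str, model_uri: str, token_secret: str) -> dict:
    """Step 2: a one-time Job that runs the storage initializer against the PVC."""
    container = {
        "name": "storage-initializer",
        "image": INITIALIZER_IMAGE,
        "args": [model_uri, MOUNT_PATH],
        "env": [
            # Token is read from a Secret (hypothetical name and key).
            {"name": "HF_TOKEN",
             "valueFrom": {"secretKeyRef": {"name": token_secret, "key": "HF_TOKEN"}}},
            # Illustrative value only.
            {"name": "HF_XET_NUM_CONCURRENT_RANGE_GETS", "value": "8"},
        ],
        "volumeMounts": [{"name": "model", "mountPath": MOUNT_PATH}],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 3,
            "template": {"spec": {
                "restartPolicy": "OnFailure",
                "containers": [container],
                "volumes": [{"name": "model",
                             "persistentVolumeClaim": {"claimName": pvc_name}}],
            }},
        },
    }


def build_llm_inference_service(name: str, pvc_name: str) -> dict:
    """Step 4: point the service at the PVC instead of a remote URI."""
    return {
        "apiVersion": "serving.kserve.io/v1alpha1",  # assumed API version
        "kind": "LLMInferenceService",
        "metadata": {"name": name},
        "spec": {"model": {"uri": f"pvc://{pvc_name}"}},
    }


manifests = [
    build_pvc("model-weights", "1Ti", "shared-rwx"),
    build_download_job("download-model", "model-weights", MODEL_URI, "hf-secret"),
    build_llm_inference_service("my-llm", "model-weights"),
]
for manifest in manifests:
    print(json.dumps(manifest, indent=2))
```

Step 3 is not modeled in the code. One way to block until the download finishes is `kubectl wait --for=condition=complete job/download-model`, applied before the LLMInferenceService is created.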