Principle:Kserve Kserve PVC Model Download

From Leeroopedia
Knowledge Sources
Domains: Storage, MLOps, LLM_Serving
Last Updated: 2026-02-13 00:00 GMT

Overview

A pre-download pattern in which a Kubernetes Job stages large model weights onto a PersistentVolumeClaim (PVC) before serving begins, so that pods start faster and multiple pods can share one copy of the weights.

Description

PVC Model Download addresses slow model loading for large LLMs (100GB+ of weights). Instead of every pod downloading the model at startup, a one-time Kubernetes Job downloads it once to a ReadWriteMany (RWX) PVC. Multiple serving pods then mount the same PVC and read the weights directly from the volume, with no further download from the model hub.

The KServe storage initializer image is reused as the download agent, supporting hf:// URIs for HuggingFace Hub models.

Usage

Use this for models larger than ~10GB, especially in multi-node/multi-replica deployments where each pod downloading the model independently would waste bandwidth and slow startup.
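
The bandwidth argument can be made concrete with a rough back-of-the-envelope estimate. The sketch below is illustrative only: the function names, the 600 GB model size, the replica count, and the 10 Gbit/s link speed are assumptions chosen for the example, and the calculation ignores protocol overhead, hub-side rate limits, and storage write throughput.

```python
def total_download_gb(model_gb: float, replicas: int, shared_pvc: bool) -> float:
    """Total data pulled from the model hub across all pods.

    With a shared PVC the model is downloaded once by the Job;
    otherwise every replica downloads its own copy at startup.
    """
    return model_gb if shared_pvc else model_gb * replicas


def download_seconds(model_gb: float, link_gbit_per_s: float) -> float:
    """Idealized transfer time for one copy (decimal GB, no overhead)."""
    return model_gb * 8 / link_gbit_per_s


if __name__ == "__main__":
    model_gb = 600.0   # assumed model size
    replicas = 4       # assumed replica count
    print("per-pod download total (GB):", total_download_gb(model_gb, replicas, False))
    print("shared PVC download total (GB):", total_download_gb(model_gb, replicas, True))
    print("one copy at 10 Gbit/s (s):", download_seconds(model_gb, 10.0))
```

Under these assumptions the per-pod approach transfers four times as much data as the shared-PVC approach, and the gap grows linearly with the number of replicas.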

Theoretical Basis

# PVC model download pattern (NOT implementation code)
1. Create PVC with ReadWriteMany access mode
   - Size: model weights + buffer (e.g., 1Ti for 600GB model)
   - StorageClass: must support RWX

2. Create Job with storage-initializer image
   - Args: ["hf://org/model", "/mnt/models"]
   - Mount PVC at /mnt/models
   - Environment: HF_TOKEN, HF_XET_NUM_CONCURRENT_RANGE_GETS

3. Wait for Job completion
   - Model weights written to PVC

4. Reference PVC in LLMInferenceService
   - spec.model.uri: "pvc://pvc-name"
   - Multiple replicas mount same PVC
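
The four steps above can be sketched as plain Kubernetes manifests built in Python. This is a minimal illustration under stated assumptions, not KServe's own implementation: the helper names, the resource names, the storage class, the image tag, the Secret name and key, and the value given to HF_XET_NUM_CONCURRENT_RANGE_GETS are all hypothetical placeholders. Only the RWX access mode, the positional arguments, the /mnt/models mount path, the two environment variable names, and the pvc:// URI form come from the outline.

```python
import json


def build_pvc(name: str, size: str, storage_class: str) -> dict:
    """Step 1: a PVC with ReadWriteMany so several pods can mount it."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,  # must support RWX
            "resources": {"requests": {"storage": size}},
        },
    }


def build_download_job(name: str, pvc_name: str, model_uri: str, image: str,
                       token_secret: str, mount_path: str = "/mnt/models") -> dict:
    """Step 2: a Job that runs the storage initializer against the PVC."""
    container = {
        "name": "download",
        "image": image,
        "args": [model_uri, mount_path],  # source URI, then destination path
        "env": [
            # Secret name and key are hypothetical; adapt to your cluster.
            {"name": "HF_TOKEN", "valueFrom": {
                "secretKeyRef": {"name": token_secret, "key": "token"}}},
            # The value "8" is an arbitrary example, not a recommendation.
            {"name": "HF_XET_NUM_CONCURRENT_RANGE_GETS", "value": "8"},
        ],
        "volumeMounts": [{"name": "model-store", "mountPath": mount_path}],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {"template": {"spec": {
            "restartPolicy": "Never",
            "containers": [container],
            "volumes": [{"name": "model-store",
                         "persistentVolumeClaim": {"claimName": pvc_name}}],
        }}},
    }


def job_succeeded(job: dict) -> bool:
    """Step 3: True once the Job reports a Complete condition."""
    conditions = job.get("status", {}).get("conditions") or []
    return any(c.get("type") == "Complete" and c.get("status") == "True"
               for c in conditions)


def pvc_model_uri(pvc_name: str) -> str:
    """Step 4: the value to place in spec.model.uri."""
    return f"pvc://{pvc_name}"


if __name__ == "__main__":
    pvc = build_pvc("model-pvc", "1Ti", "example-rwx-class")
    job = build_download_job("model-download", "model-pvc", "hf://org/model",
                             "kserve/storage-initializer:latest", "hf-secret")
    print(json.dumps(pvc, indent=2))
    print(json.dumps(job, indent=2))
    print(pvc_model_uri("model-pvc"))
```

The printed JSON can be saved to files and applied with kubectl apply -f, which accepts JSON as well as YAML. The serving resource should be created only after job_succeeded returns True for the Job's live status, so that replicas never mount a partially written model.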

Related Pages

Implemented By
