Implementation:Kserve Kserve LLM Prefill Template
| Knowledge Sources | |
|---|---|
| Domains | Kubernetes, LLM Serving |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete LLMInferenceServiceConfig template for LLM prefill workers provided by the KServe project.
Description
This file defines the default pod template for LLM prefill workers in the disaggregated prefill-decode serving architecture. It specifies an LLMInferenceServiceConfig whose prefill section contains a vLLM-based container (ghcr.io/llm-d/llm-d-cuda:v0.4.0) serving on port 8000, with health probes and volume mounts for shared memory, the model cache, and TLS certificates. The prefill phase handles prompt processing and KV-cache computation; running it separately from the decode phase is intended to improve throughput in large-scale LLM deployments.
Usage
This configuration is consumed by the LLMInferenceService controller as the default template for prefill worker pods. Apply it to the cluster as part of the LLM serving configuration. It is referenced when creating LLMInferenceService resources that use a disaggregated prefill-decode architecture, in which the prefill stage processes input prompts separately from token generation.
Code Reference
Source Location
- Repository: Kserve_Kserve
- File: config/llmisvcconfig/config-llm-prefill-template.yaml
- Lines: 1-88
Signature
apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceServiceConfig
metadata:
  name: kserve-config-llm-prefill-template
spec:
  prefill:
    template:
      containers:
        - image: ghcr.io/llm-d/llm-d-cuda:v0.4.0
          name: main
          ports:
            - containerPort: 8000
              protocol: TCP
          command:
            - vllm
            - serve
            - /mnt/models
          args:
            - --served-model-name
            - "{{ .Spec.Model.Name }}"
            - --port
            - "8000"
      volumes:
        - emptyDir: {}
          name: home
        - emptyDir:
            medium: Memory
            sizeLimit: 1Gi
          name: dshm
        - emptyDir: {}
          name: model-cache
        - name: tls-certs
          secret:
            secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"
Import
kubectl apply -f config/llmisvcconfig/config-llm-prefill-template.yaml
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| .Spec.Model.Name | Go template variable | Yes | Model name injected dynamically at reconciliation time |
| .ObjectMeta.Name | Go template variable | Yes | Object name used to derive the TLS secret name |
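The two inputs above are Go template expressions, so their expansion can be illustrated with the standard text/template package. The following is a minimal sketch, not the controller's actual code: the struct types are trimmed-down hypothetical stand-ins for the real LLMInferenceService API types, the names demo-llm and demo-model are made up, and childName is an assumed simplification of the ChildName helper (it only concatenates; the real helper may also shorten or hash long names).

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Hypothetical, trimmed-down stand-ins for the fields the template reads.
// The real types are defined in the KServe API package.
type Model struct{ Name string }
type Spec struct{ Model Model }
type ObjectMeta struct{ Name string }
type LLMInferenceService struct {
	ObjectMeta ObjectMeta
	Spec       Spec
}

// childName is an assumed simplification of the ChildName template function:
// it joins the parent name and the suffix. The real helper may additionally
// truncate or hash the result to respect Kubernetes name length limits.
func childName(parent, suffix string) string {
	return parent + suffix
}

// render evaluates a single template expression against the given service.
func render(expr string, svc LLMInferenceService) (string, error) {
	tmpl, err := template.New("field").
		Funcs(template.FuncMap{"ChildName": childName}). // register before Parse
		Parse(expr)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, svc); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	svc := LLMInferenceService{
		ObjectMeta: ObjectMeta{Name: "demo-llm"},
		Spec:       Spec{Model: Model{Name: "demo-model"}},
	}
	for _, expr := range []string{
		"{{ .Spec.Model.Name }}",
		"{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}",
	} {
		got, err := render(expr, svc)
		if err != nil {
			panic(err)
		}
		fmt.Println(got)
	}
}
```

Under these assumptions the program prints demo-model followed by demo-llm-kserve-self-signed-certs, which correspond to the value passed to --served-model-name and the secretName of the tls-certs volume, respectively.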
Outputs
| Name | Type | Description |
|---|---|---|
| LLMInferenceServiceConfig | Custom Resource | Prefill worker template configuration consumed by the LLMInferenceService controller |
| vLLM prefill container | Container (port 8000) | Serves the model for prompt processing (prefill phase / KV-cache computation) |
Usage Examples
Apply the prefill template
kubectl apply -f config/llmisvcconfig/config-llm-prefill-template.yaml
Verify the config is present
kubectl get llminferenceserviceconfig kserve-config-llm-prefill-template