Implementation:Kserve Kserve LLM Worker Template
| Knowledge Sources | |
|---|---|
| Domains | Kubernetes, LLM Serving |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete LLMInferenceServiceConfig template, provided by the KServe project, for standard (non-disaggregated) LLM inference workers.
Description
This file defines the default pod template for standard LLM inference workers, where prefill and decode run together in a single container. It declares an LLMInferenceServiceConfig with one vLLM container (image ghcr.io/llm-d/llm-d-cuda:v0.4.0) serving on port 8000. The served model name is injected through a Go template, and the pod template also configures health and readiness probes, a 1Gi memory-backed emptyDir for shared memory at /dev/shm, a model cache volume, and a TLS certificate secret. This template suits simpler deployments that do not require disaggregated serving.
Usage
This configuration is consumed by the LLMInferenceService controller as the default worker template for non-disaggregated LLM serving. Apply it to the cluster as part of the LLM serving configuration. It is used when creating LLMInferenceService resources that run prefill and decode together in a unified worker process.
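As a rough illustration of the kind of resource that would pick up this template, the sketch below shows a minimal LLMInferenceService. It is not taken from the source file: the resource name, the model URI, and the field names `spec.model.uri`, `spec.model.name`, and `spec.replicas` are assumptions based on the commonly documented shape of the resource, and the API version is simply copied from the template above.

```yaml
# Hypothetical LLMInferenceService; every value here is illustrative only.
apiVersion: serving.kserve.io/v1alpha2   # assumed to match the template's API version
kind: LLMInferenceService
metadata:
  name: example-llm                # hypothetical name; presumably the source of .ObjectMeta.Name
spec:
  model:
    uri: hf://facebook/opt-125m    # illustrative model source (assumed field)
    name: facebook/opt-125m        # presumably the source of .Spec.Model.Name
  replicas: 1                      # assumed field; a single unified worker
```

Because no separate prefill section is declared in this sketch, it corresponds to the non-disaggregated case that the template targets.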
Code Reference
Source Location
- Repository: Kserve_Kserve
- File: config/llmisvcconfig/config-llm-template.yaml
- Lines: 1-87
Signature
apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceServiceConfig
metadata:
  name: kserve-config-llm-template
spec:
  template:
    containers:
      - image: ghcr.io/llm-d/llm-d-cuda:v0.4.0
        name: main
        ports:
          - containerPort: 8000
            protocol: TCP
        command:
          - vllm
          - serve
          - /mnt/models
        args:
          - --served-model-name
          - "{{ .Spec.Model.Name }}"
          - --port
          - "8000"
        env:
          - name: HOME
            value: /home
          - name: VLLM_LOGGING_LEVEL
            value: INFO
          - name: HF_HUB_CACHE
            value: /models
    volumes:
      - emptyDir: {}
        name: home
      - emptyDir:
          medium: Memory
          sizeLimit: 1Gi
        name: dshm
      - emptyDir: {}
        name: model-cache
      - name: tls-certs
        secret:
          secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"
Import
kubectl apply -f config/llmisvcconfig/config-llm-template.yaml
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| .Spec.Model.Name | Go template variable | Yes | Model name, rendered at reconciliation time and passed to vLLM via --served-model-name |
| .ObjectMeta.Name | Go template variable | Yes | Object name, passed to ChildName with the suffix -kserve-self-signed-certs to derive the TLS secret name |
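To make the two inputs concrete, the fragment below sketches what the templated fields might look like after rendering. The input values `facebook/opt-125m` and `example-llm` are hypothetical, and the secret name assumes that ChildName simply appends the suffix to the object name; the helper may shorten or hash long names, which this sketch does not model.

```yaml
# Sketch of rendered fields, assuming (hypothetically):
#   .Spec.Model.Name = "facebook/opt-125m"
#   .ObjectMeta.Name = "example-llm"

# Under spec.template.containers[0]:
args:
  - --served-model-name
  - "facebook/opt-125m"    # rendered from {{ .Spec.Model.Name }}
  - --port
  - "8000"

# Under spec.template.volumes:
volumes:
  - name: tls-certs
    secret:
      # Assumes ChildName concatenates name and suffix for short names.
      secretName: "example-llm-kserve-self-signed-certs"
```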
Outputs
| Name | Type | Description |
|---|---|---|
| LLMInferenceServiceConfig | Custom Resource | Standard worker template configuration consumed by the LLMIsvc controller |
| vLLM worker container | Container (port 8000) | Serves the model with combined prefill and decode in a single process |
Usage Examples
Apply the worker template
kubectl apply -f config/llmisvcconfig/config-llm-template.yaml
Verify the config is present
kubectl get llminferenceserviceconfig kserve-config-llm-template