Implementation:Kserve Kserve LLM Prefill Template

Knowledge Sources	Kserve_Kserve KServe Docs
Domains	Kubernetes, LLM Serving
Last Updated	2026-02-13 00:00 GMT

Overview

A concrete LLMInferenceServiceConfig template, provided by the KServe project, that defines the default pod configuration for LLM prefill workers.

Description

This file defines the default pod template for LLM prefill workers in the disaggregated prefill-decode serving architecture. It specifies an LLMInferenceServiceConfig whose prefill section contains a vLLM-based container (image ghcr.io/llm-d/llm-d-cuda:v0.4.0) serving on port 8000, together with health probes and volumes for shared memory, the model cache, and TLS certificates. The prefill phase processes the prompt and computes the KV cache; running it separately from the decode phase is intended to improve throughput in large-scale LLM deployments.

Usage

The LLMInferenceService controller consumes this configuration as the default template for prefill worker pods. Apply it to the cluster as part of the LLM serving configuration. The controller references it when reconciling LLMInferenceService resources that use the disaggregated prefill-decode architecture, in which the prefill stage processes input prompts separately from token generation.
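
To illustrate the kind of resource that would draw on this template, the following is a minimal sketch of an LLMInferenceService with a prefill section. Only spec.model.name and the presence of a prefill section are taken from the template on this page. The resource name, the model URI, the spec.model.uri and replicas fields, and the apiVersion (copied from the config) are assumptions and should be checked against the installed CRD.

```yaml
# Hypothetical LLMInferenceService; verify every field against the installed CRD.
apiVersion: serving.kserve.io/v1alpha2   # assumed to match the config's API version
kind: LLMInferenceService
metadata:
  name: example-llm                      # hypothetical; presumably feeds .ObjectMeta.Name
spec:
  model:
    name: example-model                  # feeds {{ .Spec.Model.Name }}
    uri: hf://example-org/example-model  # assumed field, placeholder URI
  replicas: 1                            # assumed: number of decode workers
  prefill:
    replicas: 1                          # assumed: number of prefill workers
```

Under these assumptions, the prefill pods would take their container, arguments, and volumes from kserve-config-llm-prefill-template unless the resource overrides them.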

Code Reference

Source Location

Repository: Kserve_Kserve
File: config/llmisvcconfig/config-llm-prefill-template.yaml
Lines: 1-88

Signature

apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceServiceConfig
metadata:
  name: kserve-config-llm-prefill-template
spec:
  prefill:
    template:
      containers:
        - image: ghcr.io/llm-d/llm-d-cuda:v0.4.0
          name: main
          ports:
            - containerPort: 8000
              protocol: TCP
          command:
            - vllm
            - serve
            - /mnt/models
          args:
            - --served-model-name
            - "{{ .Spec.Model.Name }}"
            - --port
            - "8000"
      volumes:
        - emptyDir: {}
          name: home
        - emptyDir:
            medium: Memory
            sizeLimit: 1Gi
          name: dshm
        - emptyDir: {}
          name: model-cache
        - name: tls-certs
          secret:
            secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"

Import

kubectl apply -f config/llmisvcconfig/config-llm-prefill-template.yaml

I/O Contract

Inputs

Name	Type	Required	Description
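.Spec.Model.Name	Go template variable	Yes	Model name from the LLMInferenceService, substituted into the --served-model-name argument at reconciliation time
.ObjectMeta.Name	Go template variable	Yes	Object name passed to ChildName with the suffix -kserve-self-signed-certs to derive the TLS secret name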
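
As a worked illustration of the two inputs above, this fragment sketches what the templated fields might look like after rendering. The object name example-llm and model name example-model are hypothetical, and the secret name assumes that ChildName simply appends the suffix to the object name when the result is short; its handling of long names is not covered here.

```yaml
# Rendered sketch for hypothetical inputs:
#   .ObjectMeta.Name = example-llm
#   .Spec.Model.Name = example-model
args:
  - --served-model-name
  - "example-model"                      # from {{ .Spec.Model.Name }}
  - --port
  - "8000"
volumes:
  - name: tls-certs
    secret:
      # Assumes ChildName concatenates name and suffix for short names.
      secretName: "example-llm-kserve-self-signed-certs"
```

Fields without template expressions, such as the image, command, and emptyDir volumes, would pass through unchanged.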

Outputs

Name	Type	Description
LLMInferenceServiceConfig	Custom Resource	Prefill worker template configuration consumed by the LLMInferenceService controller
vLLM prefill container	Container (port 8000)	Serves the model for prompt processing (prefill phase / KV-cache computation)

Usage Examples

Apply the prefill template

kubectl apply -f config/llmisvcconfig/config-llm-prefill-template.yaml

Verify the config is present

kubectl get llminferenceserviceconfig kserve-config-llm-prefill-template
