

Implementation:Kserve Kserve LLM Worker Template

From Leeroopedia
Knowledge Sources
Domains: Kubernetes, LLM Serving
Last Updated: 2026-02-13 00:00 GMT

Overview

Concrete LLMInferenceServiceConfig template for standard (non-disaggregated) LLM inference workers provided by the KServe project.

Description

This file defines the default pod template for standard LLM inference workers, in which prefill and decode run together in a single container. It declares an LLMInferenceServiceConfig with one vLLM container (ghcr.io/llm-d/llm-d-cuda:v0.4.0) serving on port 8000. The container receives its served model name through Go-template injection and is configured with health and readiness probes, shared memory (a 1Gi tmpfs at /dev/shm), model cache volume mounts, and TLS certificates. The template suits simpler deployments that do not require disaggregated serving.
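The two Go-template placeholders in this file (`{{ .Spec.Model.Name }}` and the `ChildName` call that derives the TLS secret name) are filled in when a resource is reconciled. The following is a minimal sketch of that substitution using Go's standard `text/template` package. It assumes hypothetical stand-in types that only mirror the referenced field paths, and a simplified, hypothetical `ChildName` that just concatenates its arguments; the controller's actual data object and helper may differ.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Hypothetical stand-ins that mirror only the field paths the template
// reads (.Spec.Model.Name and .ObjectMeta.Name). The real controller
// supplies its own LLMInferenceService object.
type Model struct{ Name string }
type Spec struct{ Model Model }
type ObjectMeta struct{ Name string }
type LLMInferenceService struct {
	ObjectMeta ObjectMeta
	Spec       Spec
}

// childName is a simplified, hypothetical version of the template's
// ChildName function: it only concatenates parent and suffix.
func childName(parent, suffix string) string {
	return parent + suffix
}

// render executes one templated field value against the service object.
func render(text string, svc LLMInferenceService) (string, error) {
	tmpl, err := template.New("field").
		Funcs(template.FuncMap{"ChildName": childName}). // must be registered before Parse
		Parse(text)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, svc); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	svc := LLMInferenceService{
		ObjectMeta: ObjectMeta{Name: "my-llm"},            // hypothetical resource name
		Spec:       Spec{Model: Model{Name: "my-model"}}, // hypothetical model name
	}
	for _, field := range []string{
		"{{ .Spec.Model.Name }}",
		"{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}",
	} {
		out, err := render(field, svc)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```

Running it prints `my-model` and then `my-llm-kserve-self-signed-certs`. Note that the backquoted suffix inside the action is a raw string literal, which `text/template` accepts as a function argument.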

Usage

The LLMInferenceService controller consumes this configuration as the default worker template for non-disaggregated LLM serving. Apply it to the cluster together with the rest of the LLM serving configuration; it is then used for LLMInferenceService resources whose prefill and decode run together in a single worker process.

Code Reference

Source Location

Signature

```yaml
# Abridged listing: the probes and volumeMounts mentioned in the Description are not shown.
apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceServiceConfig
metadata:
  name: kserve-config-llm-template
spec:
  template:
    containers:
      - image: ghcr.io/llm-d/llm-d-cuda:v0.4.0
        name: main
        ports:
          - containerPort: 8000
            protocol: TCP
        # vLLM serves the model found at /mnt/models
        command:
          - vllm
          - serve
          - /mnt/models
        args:
          - --served-model-name
          - "{{ .Spec.Model.Name }}"   # rendered at reconciliation time
          - --port
          - "8000"
        env:
          - name: HOME
            value: /home
          - name: VLLM_LOGGING_LEVEL
            value: INFO
          - name: HF_HUB_CACHE
            value: /models
    volumes:
      - emptyDir: {}
        name: home
      - emptyDir:
          medium: Memory   # memory-backed (tmpfs) shared memory
          sizeLimit: 1Gi
        name: dshm
      - emptyDir: {}
        name: model-cache
      - name: tls-certs
        secret:
          # secret name derived from the resource's metadata.name
          secretName: "{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}"
```
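When a container sets both `command` and `args`, Kubernetes runs `command` with `args` appended (overriding the image's own entrypoint and default arguments), so the worker's effective invocation can be read straight off the listing. A small Go sketch of that composition follows; the model name passed in is a hypothetical placeholder for the rendered `{{ .Spec.Model.Name }}` value.

```go
package main

import (
	"fmt"
	"strings"
)

// containerArgv reproduces how Kubernetes combines the container's
// command and args from the template: args are appended to command.
// modelName stands in for the rendered "{{ .Spec.Model.Name }}".
func containerArgv(modelName string) []string {
	command := []string{"vllm", "serve", "/mnt/models"}
	args := []string{"--served-model-name", modelName, "--port", "8000"}
	return append(command, args...)
}

func main() {
	argv := containerArgv("my-model") // hypothetical model name
	fmt.Println(strings.Join(argv, " "))
	// prints: vllm serve /mnt/models --served-model-name my-model --port 8000
}
```

The model weights are always read from `/mnt/models`, while `--served-model-name` only controls the name under which vLLM exposes the model to clients.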

Import

```shell
kubectl apply -f config/llmisvcconfig/config-llm-template.yaml
```

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| .Spec.Model.Name | Go template variable | Yes | Model name injected dynamically at reconciliation time |
| .ObjectMeta.Name | Go template variable | Yes | Object name used to derive the TLS secret name |
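The input list above can be cross-checked against the template itself by parsing each templated value and collecting the fields it reads. This is a minimal sketch using Go's `text/template/parse` package; the `ChildName` entry is a hypothetical stub registered only so that parsing succeeds, and it is never executed.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
	"text/template/parse"
)

// fieldRefs parses one templated value and returns every field path it
// reads, such as ".Spec.Model.Name".
func fieldRefs(text string) ([]string, error) {
	// Hypothetical stub: parsing fails if a called function is undefined.
	stub := template.FuncMap{"ChildName": func(parent, suffix string) string { return parent + suffix }}
	t, err := template.New("field").Funcs(stub).Parse(text)
	if err != nil {
		return nil, err
	}
	var refs []string
	var walk func(n parse.Node)
	walk = func(n parse.Node) {
		switch n := n.(type) {
		case *parse.ListNode: // top-level sequence of text and actions
			for _, child := range n.Nodes {
				walk(child)
			}
		case *parse.ActionNode: // a {{ ... }} action
			walk(n.Pipe)
		case *parse.PipeNode:
			for _, cmd := range n.Cmds {
				walk(cmd)
			}
		case *parse.CommandNode: // function name and its arguments
			for _, arg := range n.Args {
				walk(arg)
			}
		case *parse.FieldNode: // a field chain like .ObjectMeta.Name
			refs = append(refs, "."+strings.Join(n.Ident, "."))
		}
	}
	walk(t.Tree.Root)
	return refs, nil
}

func main() {
	for _, field := range []string{
		"{{ .Spec.Model.Name }}",
		"{{ ChildName .ObjectMeta.Name `-kserve-self-signed-certs` }}",
	} {
		refs, err := fieldRefs(field)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %v\n", field, refs)
	}
}
```

For the two templated values in this file, the sketch reports `.Spec.Model.Name` and `.ObjectMeta.Name`, matching the Inputs table.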

Outputs

| Name | Type | Description |
|---|---|---|
| LLMInferenceServiceConfig | Custom Resource | Standard worker template configuration consumed by the LLMIsvc controller |
| vLLM worker container | Container (port 8000) | Serves the model with combined prefill and decode in a single process |

Usage Examples

Apply the worker template

```shell
kubectl apply -f config/llmisvcconfig/config-llm-template.yaml
```

Verify the config is present

```shell
kubectl get llminferenceserviceconfig kserve-config-llm-template
```
