Implementation:Kserve Kserve LLMInferenceServiceConfig CRD

Knowledge Sources	Kserve_Kserve KServe Docs
Domains	Kubernetes, CRD, LLM Inference
Last Updated	2026-02-13 00:00 GMT

Overview

Concrete CRD definition for the LLMInferenceServiceConfig custom resource in the KServe serving API.

Description

This file contains the full, auto-generated CustomResourceDefinition for the LLMInferenceServiceConfig kind, produced by controller-gen v0.19.0. The resource belongs to the serving.kserve.io API group at version v1alpha1 and is namespaced. The CRD carries an OpenAPI v3 validation schema for the spec fields, so the API server can reject malformed objects at admission time. Cluster operators use the resource to define shared configuration templates for LLM inference workloads, covering model configuration, parallelism settings, prefill/decode phases, and full pod template specifications.

Usage

Apply this CRD during KServe installation to register the LLMInferenceServiceConfig API with the Kubernetes API server. Once registered, namespace administrators can create LLMInferenceServiceConfig resources that serve as reusable configuration templates referenced by LLMInferenceService instances.
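
The reference from a service to a template can be sketched as follows. This is a minimal illustration, not a verified manifest: it assumes LLMInferenceService accepts the same spec.baseRefs field listed for the config spec on this page, and the names llama-chat and llama-config are hypothetical.

```yaml
# Sketch only: assumes LLMInferenceService exposes spec.baseRefs
# in the same shape as LLMInferenceServiceConfig.
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: llama-chat            # hypothetical service name
  namespace: default          # same namespace as the referenced config
spec:
  baseRefs:
    - name: llama-config      # hypothetical LLMInferenceServiceConfig to reuse
  model:
    uri: gs://models/llama-3-70b
```

How fields set on the service combine with those inherited from the template is decided by the controller and is not described on this page.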

Code Reference

Source Location

Repository: Kserve_Kserve
File: config/crd/full/llmisvc/serving.kserve.io_llminferenceserviceconfigs.yaml
Lines: 1-40663

Signature

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.19.0
  name: llminferenceserviceconfigs.serving.kserve.io
spec:
  group: serving.kserve.io
  names:
    kind: LLMInferenceServiceConfig
    listKind: LLMInferenceServiceConfigList
    plural: llminferenceserviceconfigs
    singular: llminferenceserviceconfig
  scope: Namespaced
  versions:
    - name: v1alpha1

Import

kubectl apply -f config/crd/full/llmisvc/serving.kserve.io_llminferenceserviceconfigs.yaml

I/O Contract

Inputs

Name	Type	Required	Description
apiVersion	string	Yes	Must be `serving.kserve.io/v1alpha1`
kind	string	Yes	Must be `LLMInferenceServiceConfig`
metadata	ObjectMeta	Yes	Standard Kubernetes object metadata
spec	LLMInferenceServiceConfigSpec	Yes	Configuration template spec containing model settings, parallelism, prefill/decode config, and pod template overrides

Key spec fields:

Field	Type	Required	Description
spec.baseRefs	[]LocalObjectReference	No	References to base configuration objects for composition
spec.model	ModelSpec	No	Model configuration including name, URI, criticality (Critical/Standard/Sheddable), and LoRA adapter settings
spec.model.uri	string	Yes (within model)	URI of the model to serve
spec.parallelism	ParallelismSpec	No	Parallelism settings: data, dataLocal, pipeline, and tensor parallelism, plus an expert-parallelism toggle
spec.prefill	PrefillSpec	No	Prefill-phase-specific configuration with its own parallelism settings
spec.workerSpec	WorkerSpec	No	Full pod template specification for worker containers and volumes
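
To make the table concrete, the fragment below sketches a template that sets both top-level and prefill-phase parallelism. The nested key spec.prefill.parallelism is an assumption inferred from the description of PrefillSpec above, and the resource name is hypothetical; check the generated schema before relying on the exact field paths.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceServiceConfig
metadata:
  name: llama-prefill-config  # hypothetical name
  namespace: default
spec:
  model:
    uri: gs://models/llama-3-70b  # required whenever spec.model is present
  parallelism:                    # top-level parallelism settings
    tensor: 4
    pipeline: 2
  prefill:
    parallelism:                  # assumed nesting: prefill carries its own settings
      tensor: 2
```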

Outputs

Name	Type	Description
(none)	--	This CRD does not define a status subresource; it is a pure configuration template resource

Usage Examples

Create an LLMInferenceServiceConfig

apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceServiceConfig
metadata:
  name: llama-config
  namespace: default
spec:
  model:
    name: llama-3
    uri: gs://models/llama-3-70b
    criticality: Standard
  parallelism:
    tensor: 4
    pipeline: 2
    data: 1
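
A second template can build on the one above through spec.baseRefs. The sketch below is illustrative only: the name llama-config-tp8 is hypothetical, and it assumes that values set in the derived object override those inherited from the base, which this page does not confirm.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceServiceConfig
metadata:
  name: llama-config-tp8      # hypothetical derived template
  namespace: default
spec:
  baseRefs:
    - name: llama-config      # LocalObjectReference: identifies the base by name only
  parallelism:
    tensor: 8                 # assumed to take precedence over the base value of 4
```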

Related Pages

New principle needed: Kserve_Kserve_LLMInferenceServiceConfig_Specification
Kserve_Kserve_LLMInferenceService_CRD_Spec -- Go type definitions for the related LLMInferenceService resource
