Implementation:Kserve Kserve LLMInferenceServiceConfig CRD
| Knowledge Sources | |
|---|---|
| Domains | Kubernetes, CRD, LLM Inference |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete CRD definition for the LLMInferenceServiceConfig custom resource in the KServe serving API.
Description
This file contains the auto-generated full CustomResourceDefinition for the LLMInferenceServiceConfig kind, produced by controller-gen v0.19.0. The kind belongs to the serving.kserve.io API group at version v1alpha1 and is namespaced. The CRD carries an OpenAPI v3 validation schema for the spec fields, so cluster operators can define shared configuration templates for LLM inference serving workloads. The schema covers model configuration, parallelism settings, prefill/decode phases, and full pod template specifications.
Usage
Apply this CRD during KServe installation to register the LLMInferenceServiceConfig API with the Kubernetes API server. Once registered, namespace administrators can create LLMInferenceServiceConfig resources that serve as reusable configuration templates referenced by LLMInferenceService instances.
Code Reference
Source Location
- Repository: Kserve_Kserve
- File: config/crd/full/llmisvc/serving.kserve.io_llminferenceserviceconfigs.yaml
- Lines: 1-40663
Signature
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.19.0
  name: llminferenceserviceconfigs.serving.kserve.io
spec:
  group: serving.kserve.io
  names:
    kind: LLMInferenceServiceConfig
    listKind: LLMInferenceServiceConfigList
    plural: llminferenceserviceconfigs
    singular: llminferenceserviceconfig
  scope: Namespaced
  versions:
  - name: v1alpha1
Import
kubectl apply -f config/crd/full/llmisvc/serving.kserve.io_llminferenceserviceconfigs.yaml
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| apiVersion | string | Yes | Must be serving.kserve.io/v1alpha1 |
| kind | string | Yes | Must be LLMInferenceServiceConfig |
| metadata | ObjectMeta | Yes | Standard Kubernetes object metadata |
| spec | LLMInferenceServiceConfigSpec | Yes | Configuration template spec containing model settings, parallelism, prefill/decode config, and pod template overrides |
Key spec fields:
| Field | Type | Required | Description |
|---|---|---|---|
| spec.baseRefs | []LocalObjectReference | No | References to base configuration objects for composition |
| spec.model | ModelSpec | No | Model configuration including name, URI, criticality (Critical/Standard/Sheddable), and LoRA adapter settings |
| spec.model.uri | string | Yes (within model) | URI of the model to serve |
| spec.parallelism | ParallelismSpec | No | Parallelism settings including data, dataLocal, pipeline, tensor parallelism, and expert parallelism toggle |
| spec.prefill | PrefillSpec | No | Prefill-phase-specific configuration with its own parallelism settings |
| spec.workerSpec | WorkerSpec | No | Full pod template specification for worker containers and volumes |
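As a rough illustration of how the fields in the table above might combine, the sketch below composes a config from a base via spec.baseRefs and gives the prefill phase its own parallelism. The resource names (llama-disaggregated, llama-base) are hypothetical, and the nesting of parallelism under prefill is inferred from the table's description rather than copied from the generated schema.

```yaml
# A minimal sketch, not a manifest taken from the KServe repository.
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceServiceConfig
metadata:
  name: llama-disaggregated      # hypothetical name
  namespace: default
spec:
  baseRefs:
    - name: llama-base           # hypothetical base config in the same namespace
  model:
    uri: gs://models/llama-3-70b # uri is required whenever model is set
  parallelism:
    tensor: 4                    # main workload parallelism
  prefill:
    parallelism:
      tensor: 2                  # assumed: prefill carries its own parallelism block
```

Because spec.baseRefs holds LocalObjectReference entries, each entry names an object in the same namespace as the config itself.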
Outputs
| Name | Type | Description |
|---|---|---|
| (none) | -- | This CRD does not define a status subresource; it is a pure configuration template resource |
Usage Examples
Create an LLMInferenceServiceConfig
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceServiceConfig
metadata:
  name: llama-config
  namespace: default
spec:
  model:
    name: llama-3
    uri: gs://models/llama-3-70b
    criticality: Standard
  parallelism:
    tensor: 4
    pipeline: 2
    data: 1
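Reference the config from an LLMInferenceService

The Usage section notes that these templates are referenced by LLMInferenceService instances. The sketch below shows one way such a reference might look. It assumes that LLMInferenceService exposes a spec.baseRefs list of the same shape as the one documented here, which this page does not confirm; the service name llama-service is hypothetical.

```yaml
# A sketch under stated assumptions, not a verified manifest.
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: llama-service            # hypothetical name
  namespace: default             # same namespace as the referenced config
spec:
  baseRefs:
    - name: llama-config         # assumed field: points at the template above
  model:
    uri: gs://models/llama-3-70b # repeated here in case the service schema requires it
```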
Related Pages
- New principle needed: Kserve_Kserve_LLMInferenceServiceConfig_Specification
- Kserve_Kserve_LLMInferenceService_CRD_Spec -- Go type definitions for the related LLMInferenceService resource