Implementation:Kserve Kserve LLMInferenceService Full CRD
| Knowledge Sources | |
|---|---|
| Domains | Kubernetes, CRD, LLM Inference |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete CRD definition for the LLMInferenceService custom resource in the KServe serving API, providing full OpenAPI v3 schema validation.
Description
This file contains the full CustomResourceDefinition for the LLMInferenceService kind (short name: llmisvc), auto-generated by controller-gen v0.19.0. It belongs to the serving.kserve.io API group at version v1alpha1 and is a namespaced resource. Unlike the Go type specification documented in Kserve_Kserve_LLMInferenceService_CRD_Spec, this file is the generated CRD YAML with complete field-level validation. It defines printer columns for URL, Ready status, Reason, and Age, plus a priority-1 URLs column that kubectl get displays only with -o wide. It also defines a status subresource for controller-managed state tracking.
Usage
Apply this CRD during KServe installation to register the LLMInferenceService API with the Kubernetes API server. Once registered, users can create LLMInferenceService resources to deploy and manage LLM inference serving workloads with full schema validation enforced by the API server.
Code Reference
Source Location
- Repository: Kserve_Kserve
- File: config/crd/full/llmisvc/serving.kserve.io_llminferenceservices.yaml
- Lines: 1-40853
Signature
```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.19.0
  name: llminferenceservices.serving.kserve.io
spec:
  group: serving.kserve.io
  names:
    kind: LLMInferenceService
    listKind: LLMInferenceServiceList
    plural: llminferenceservices
    shortNames:
    - llmisvc
    singular: llminferenceservice
  scope: Namespaced
  versions:
  - name: v1alpha1
    additionalPrinterColumns:
    - jsonPath: .status.url
      name: URL
      type: string
    - jsonPath: .status.conditions[?(@.type=='Ready')].status
      name: Ready
      type: string
    - jsonPath: .status.conditions[?(@.type=='Ready')].reason
      name: Reason
      type: string
    - jsonPath: .metadata.creationTimestamp
      name: Age
      type: date
    - jsonPath: .status.addresses[*].url
      name: URLs
      priority: 1
      type: string
    # schema (openAPIV3Schema), served, and storage are omitted from this excerpt
    subresources:
      status: {}
```
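To make the printer columns concrete, here is a minimal Python sketch of how each column's jsonPath maps onto an object. The helper names (ready_condition, printer_row) and the sample object are hypothetical. The Ready filter is hand-coded rather than evaluated by a real JSONPath engine, and the comma-joined URLs cell is an assumption about how multiple addresses would be rendered.

```python
# Minimal sketch (not kubectl's implementation): derive the LLMInferenceService
# printer-column cells from an object represented as a plain dict.


def ready_condition(obj: dict) -> dict:
    """Hand-coded stand-in for .status.conditions[?(@.type=='Ready')]."""
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond
    return {}


def printer_row(obj: dict, wide: bool = False) -> dict:
    """Return the cells `kubectl get llmisvc` would show for one object.

    Columns with priority > 0 (here only URLs) are included only when
    wide=True, mirroring `kubectl get -o wide`.
    """
    meta = obj.get("metadata", {})
    status = obj.get("status", {})
    ready = ready_condition(obj)
    row = {
        "NAME": meta.get("name", ""),
        "URL": status.get("url", ""),
        "READY": ready.get("status", ""),
        "REASON": ready.get("reason", ""),
        # kubectl renders a `date` column as a relative age such as "5m";
        # this sketch keeps the raw creationTimestamp instead.
        "AGE": meta.get("creationTimestamp", ""),
    }
    if wide:
        # Assumption: multiple .status.addresses[*].url values joined by commas.
        row["URLS"] = ",".join(a.get("url", "") for a in status.get("addresses", []))
    return row


if __name__ == "__main__":
    sample = {  # hypothetical object, for illustration only
        "metadata": {"name": "llama-service", "creationTimestamp": "2026-01-01T00:00:00Z"},
        "status": {
            "url": "http://llama-service.default.example.com",
            "addresses": [{"url": "http://llama-service.default.example.com"}],
            "conditions": [{"type": "Ready", "status": "True"}],
        },
    }
    print(printer_row(sample))
    print(printer_row(sample, wide=True))
```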
Import
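Client-side kubectl apply records the whole object in the kubectl.kubernetes.io/last-applied-configuration annotation, and annotations are capped at 262144 bytes, so a CRD of this size is safer to apply server-side:
```shell
kubectl apply --server-side -f config/crd/full/llmisvc/serving.kserve.io_llminferenceservices.yaml
```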
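Once the CRD is registered, the API server serves LLMInferenceService objects under REST paths derived from the group, version, plural name, and Namespaced scope. The Python sketch below only builds those path strings; the function names are hypothetical and the namespace and object name are examples. Because the CRD enables the status subresource, there is a separate /status endpoint: writes to the main endpoint ignore changes to status, and writes to /status ignore changes to everything except status.

```python
# Minimal sketch: REST paths for a namespaced custom resource, built from the
# CRD's group, version, and plural name. Function names are hypothetical.

GROUP = "serving.kserve.io"
VERSION = "v1alpha1"
PLURAL = "llminferenceservices"


def collection_path(namespace: str) -> str:
    """List (GET) and create (POST) endpoint for one namespace."""
    return f"/apis/{GROUP}/{VERSION}/namespaces/{namespace}/{PLURAL}"


def object_path(namespace: str, name: str) -> str:
    """Get, update, patch, and delete endpoint for one LLMInferenceService."""
    return f"{collection_path(namespace)}/{name}"


def status_path(namespace: str, name: str) -> str:
    """Status subresource endpoint enabled by `subresources: status: {}`."""
    return f"{object_path(namespace, name)}/status"


if __name__ == "__main__":
    print(object_path("default", "llama-service"))
    print(status_path("default", "llama-service"))
```

A generated path can be fetched directly, for example with kubectl get --raw followed by the output of status_path("default", "llama-service").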
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| apiVersion | string | Yes | Must be serving.kserve.io/v1alpha1 |
| kind | string | Yes | Must be LLMInferenceService |
| metadata | ObjectMeta | Yes | Standard Kubernetes object metadata |
| spec | LLMInferenceServiceSpec | Yes | LLM inference service specification including model, parallelism, prefill/decode configuration, and worker pod templates |
Key spec fields:
| Field | Type | Required | Description |
|---|---|---|---|
| spec.baseRefs | []LocalObjectReference | No | References to base LLMInferenceServiceConfig objects for configuration inheritance |
| spec.model | ModelSpec | No | Model configuration with name, URI, criticality level (Critical/Standard/Sheddable), and LoRA adapter definitions |
| spec.model.uri | string | Yes (within model) | URI pointing to the model artifacts |
| spec.parallelism | ParallelismSpec | No | Parallelism strategy: tensor, pipeline, data, and dataLocal parallelism degrees, an expert-parallelism toggle, and an RPC port |
| spec.prefill | PrefillSpec | No | Prefill-phase-specific settings with independent parallelism configuration |
| spec.workerSpec | WorkerSpec | No | Full pod template spec for the inference worker including containers, volumes, and scheduling |
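The two tables above can be summarized as a hypothetical client-side pre-check, sketched here in Python. It covers only the constraints listed in these tables (fixed apiVersion and kind, required metadata and spec, model.uri required when model is set, and the three criticality values). It is not the CRD's OpenAPI schema; the API server remains the authoritative validator. The function and constant names are illustrative.

```python
# Hypothetical pre-check mirroring the I/O contract tables only.
# The API server enforces the real OpenAPI v3 schema on create/update.

ALLOWED_CRITICALITY = {"Critical", "Standard", "Sheddable"}


def contract_errors(obj: dict) -> list:
    """Return a list of human-readable contract violations (empty if none)."""
    errors = []
    if obj.get("apiVersion") != "serving.kserve.io/v1alpha1":
        errors.append("apiVersion must be serving.kserve.io/v1alpha1")
    if obj.get("kind") != "LLMInferenceService":
        errors.append("kind must be LLMInferenceService")
    if not isinstance(obj.get("metadata"), dict):
        errors.append("metadata is required")
    spec = obj.get("spec")
    if not isinstance(spec, dict):
        errors.append("spec is required")
        return errors  # nothing further to check without a spec
    model = spec.get("model")
    if model is not None:
        if not model.get("uri"):
            errors.append("spec.model.uri is required when spec.model is set")
        crit = model.get("criticality")
        if crit is not None and crit not in ALLOWED_CRITICALITY:
            errors.append("spec.model.criticality must be Critical, Standard, or Sheddable")
    return errors


if __name__ == "__main__":
    candidate = {
        "apiVersion": "serving.kserve.io/v1alpha1",
        "kind": "LLMInferenceService",
        "metadata": {"name": "llama-service"},
        "spec": {"model": {"name": "llama-3-70b", "criticality": "Critical"}},
    }
    print(contract_errors(candidate))
```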
Outputs
| Name | Type | Description |
|---|---|---|
| status.url | string | Primary URL endpoint for the inference service |
| status.addresses | []AddressSpec | List of addressable endpoints with URL, name, audience, and CA certificates |
| status.conditions | []Condition | Knative-style conditions including Ready status with reason and message |
| status.observedGeneration | int64 | The generation most recently observed by the controller |
| status.annotations | map[string]string | Controller-managed annotations propagated to status |
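A common Kubernetes pattern combines these status fields to decide whether a service reflects its latest spec and is ready: compare status.observedGeneration with metadata.generation to confirm the controller has processed the current spec, then check that the Ready condition's status is the string "True". The Python sketch below assumes the object is the dict form of a fetched resource (for example, parsed from kubectl get llmisvc NAME -o json); the function name and sample object are hypothetical.

```python
def is_ready_and_current(obj: dict) -> bool:
    """Return True only if the controller has observed the latest spec
    (status.observedGeneration == metadata.generation) and the Ready
    condition's status is the string "True"."""
    generation = obj.get("metadata", {}).get("generation")
    status = obj.get("status", {})
    if generation is None or status.get("observedGeneration") != generation:
        # Status may still describe an older spec; treat as not ready yet.
        return False
    return any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in status.get("conditions", [])
    )


if __name__ == "__main__":
    fetched = {  # hypothetical object, e.g. parsed from `kubectl get ... -o json`
        "metadata": {"name": "llama-service", "generation": 2},
        "status": {
            "observedGeneration": 2,
            "conditions": [{"type": "Ready", "status": "True"}],
        },
    }
    print(is_ready_and_current(fetched))
```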
Usage Examples
Create an LLMInferenceService
```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: llama-service
  namespace: default
spec:
  model:
    name: llama-3-70b
    uri: gs://models/llama-3-70b
    criticality: Critical
  parallelism:
    tensor: 8
    pipeline: 1
```
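The same manifest can also be generated programmatically. The Python sketch below builds it as a dict and prints it as JSON, which kubectl apply -f - accepts on standard input. The function name, its parameters, and its tensor/pipeline defaults of 1 are this sketch's own assumptions, not CRD defaults; fields are limited to those used in the YAML example, and criticality is omitted unless given.

```python
import json
from typing import Optional


def llm_service_manifest(
    name: str,
    namespace: str,
    model_name: str,
    model_uri: str,
    tensor: int = 1,  # sketch default, not a documented CRD default
    pipeline: int = 1,  # sketch default, not a documented CRD default
    criticality: Optional[str] = None,
) -> dict:
    """Build an LLMInferenceService manifest using only the example's fields."""
    model = {"name": model_name, "uri": model_uri}
    if criticality is not None:
        model["criticality"] = criticality  # Critical, Standard, or Sheddable
    return {
        "apiVersion": "serving.kserve.io/v1alpha1",
        "kind": "LLMInferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "model": model,
            "parallelism": {"tensor": tensor, "pipeline": pipeline},
        },
    }


if __name__ == "__main__":
    manifest = llm_service_manifest(
        "llama-service", "default", "llama-3-70b", "gs://models/llama-3-70b",
        tensor=8, pipeline=1, criticality="Critical",
    )
    # Apply with, e.g.: python make_manifest.py | kubectl apply -f -
    print(json.dumps(manifest, indent=2))
```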
Check Status
```shell
kubectl get llmisvc llama-service
# NAME            URL                                        READY   REASON   AGE
# llama-service   http://llama-service.default.example.com   True             5m
```
Related Pages
- Principle:Kserve_Kserve_LLMInferenceService_Specification
- Kserve_Kserve_LLMInferenceService_CRD_Spec -- Go type definitions for the LLMInferenceService resource
- Kserve_Kserve_LLMIsvc_Controller -- Controller that reconciles LLMInferenceService resources