Implementation:Kserve Kserve LLMInferenceService Full CRD

Knowledge Sources	Kserve_Kserve KServe Docs
Domains	Kubernetes, CRD, LLM Inference
Last Updated	2026-02-13 00:00 GMT

Overview

Concrete CRD definition for the LLMInferenceService custom resource in the KServe serving API, providing full OpenAPI v3 schema validation.

Description

This file is the auto-generated, full CustomResourceDefinition for the LLMInferenceService kind (short name: llmisvc), produced by controller-gen v0.19.0. The resource is namespaced and belongs to the serving.kserve.io API group at version v1alpha1. Unlike the Go type specification documented in Kserve_Kserve_LLMInferenceService_CRD_Spec, this file is the generated CRD YAML with complete field-level validation. It declares printer columns for URL, Ready, Reason, and Age, plus a URLs column at priority 1, which kubectl shows only in wide output (-o wide). It also enables the status subresource, so the controller updates status through the /status endpoint, separately from the spec.

Usage

Apply this CRD during KServe installation to register the LLMInferenceService API with the Kubernetes API server. Once registered, users can create LLMInferenceService resources to deploy and manage LLM inference serving workloads with full schema validation enforced by the API server.

Code Reference

Source Location

Repository: Kserve_Kserve
File: config/crd/full/llmisvc/serving.kserve.io_llminferenceservices.yaml
Lines: 1-40853

Signature

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.19.0
  name: llminferenceservices.serving.kserve.io
spec:
  group: serving.kserve.io
  names:
    kind: LLMInferenceService
    listKind: LLMInferenceServiceList
    plural: llminferenceservices
    shortNames:
      - llmisvc
    singular: llminferenceservice
  scope: Namespaced
  versions:
    - name: v1alpha1
      additionalPrinterColumns:
        - jsonPath: .status.url
          name: URL
          type: string
        - jsonPath: .status.conditions[?(@.type=='Ready')].status
          name: Ready
          type: string
        - jsonPath: .status.conditions[?(@.type=='Ready')].reason
          name: Reason
          type: string
        - jsonPath: .metadata.creationTimestamp
          name: Age
          type: date
        - jsonPath: .status.addresses[*].url
          name: URLs
          priority: 1
          type: string
      subresources:
        status: {}
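
To make the printer columns above concrete, the following Python sketch mimics how the Ready and Reason columns are derived from status.conditions. It is a hand-rolled stand-in for the JSONPath filter, not how the API server or kubectl evaluates it; the helpers condition_field and printer_row and the sample object are hypothetical and exist only for illustration.

```python
# Illustrative only: approximates the JSONPath filter
# .status.conditions[?(@.type=='Ready')].<field> used by the printer columns.


def condition_field(obj, cond_type, field):
    """Return `field` of the first condition whose type matches, else ''."""
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get(field, "")
    return ""


def printer_row(obj):
    """Build the default (priority 0) columns except Age."""
    return {
        "NAME": obj["metadata"]["name"],
        "URL": obj.get("status", {}).get("url", ""),
        "READY": condition_field(obj, "Ready", "status"),
        "REASON": condition_field(obj, "Ready", "reason"),
    }


# Hypothetical object, shaped like the status fields this page documents.
sample = {
    "metadata": {"name": "llama-service"},
    "status": {
        "url": "http://llama-service.default.example.com",
        "conditions": [{"type": "Ready", "status": "True"}],
    },
}

row = printer_row(sample)
print(row)
```

A Ready condition that carries no reason yields an empty REASON cell in this sketch, which matches the blank column in the sample kubectl output on this page.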

Import

kubectl apply -f config/crd/full/llmisvc/serving.kserve.io_llminferenceservices.yaml

I/O Contract

Inputs

Name	Type	Required	Description
apiVersion	string	Yes	Must be `serving.kserve.io/v1alpha1`
kind	string	Yes	Must be `LLMInferenceService`
metadata	ObjectMeta	Yes	Standard Kubernetes object metadata
spec	LLMInferenceServiceSpec	Yes	LLM inference service specification including model, parallelism, prefill/decode configuration, and worker pod templates

Key spec fields:

Field	Type	Required	Description
spec.baseRefs	[]LocalObjectReference	No	References to base LLMInferenceServiceConfig objects for configuration inheritance
spec.model	ModelSpec	No	Model configuration with name, URI, criticality level (Critical/Standard/Sheddable), and LoRA adapter definitions
spec.model.uri	string	Yes (within model)	URI pointing to the model artifacts
spec.parallelism	ParallelismSpec	No	Parallelism strategy: degrees of tensor, pipeline, data, and dataLocal parallelism, an expert-parallelism toggle, and the RPC port
spec.prefill	PrefillSpec	No	Prefill-phase-specific settings with independent parallelism configuration
spec.workerSpec	WorkerSpec	No	Full pod template spec for the inference worker including containers, volumes, and scheduling
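
The field tables above can be exercised with a small manifest builder. This is a minimal sketch that uses only fields named on this page (spec.model.name, spec.model.uri, spec.parallelism.tensor, spec.parallelism.pipeline); the function build_manifest is hypothetical, and its single check is a client-side convenience, not a substitute for the schema validation the API server performs.

```python
import json

API_VERSION = "serving.kserve.io/v1alpha1"
KIND = "LLMInferenceService"


def build_manifest(name, namespace, model_uri, model_name=None,
                   tensor=None, pipeline=None):
    """Assemble a minimal LLMInferenceService manifest as a plain dict."""
    if not model_uri:
        # spec.model.uri is required whenever spec.model is present.
        raise ValueError("model_uri must be a non-empty string")

    model = {"uri": model_uri}
    if model_name:
        model["name"] = model_name
    spec = {"model": model}

    # Only emit spec.parallelism when at least one degree was given.
    parallelism = {}
    if tensor is not None:
        parallelism["tensor"] = tensor
    if pipeline is not None:
        parallelism["pipeline"] = pipeline
    if parallelism:
        spec["parallelism"] = parallelism

    return {
        "apiVersion": API_VERSION,
        "kind": KIND,
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


manifest = build_manifest(
    "llama-service", "default", "gs://models/llama-3-70b",
    model_name="llama-3-70b", tensor=8, pipeline=1,
)
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed document can be piped to kubectl apply -f - once the CRD is installed.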

Outputs

Name	Type	Description
status.url	string	Primary URL endpoint for the inference service
status.addresses	[]AddressSpec	List of addressable endpoints with URL, name, audience, and CA certificates
status.conditions	[]Condition	Knative-style conditions including Ready status with reason and message
status.observedGeneration	int64	The generation most recently observed by the controller
status.annotations	map[string]string	Controller-managed annotations propagated to status
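
A client that consumes these outputs usually needs to decide whether the reported status still describes the current spec. The sketch below treats a service as ready only when the Ready condition is "True" and status.observedGeneration matches metadata.generation (the standard Kubernetes generation counter). The function is_ready is hypothetical, and the staleness rule is a common client-side convention assumed here, not something this CRD mandates.

```python
def is_ready(obj):
    """True when the Ready condition is "True" and the status is not stale."""
    status = obj.get("status", {})
    generation = obj.get("metadata", {}).get("generation")
    observed = status.get("observedGeneration")

    # A status written for an older generation does not describe the
    # current spec, so it is treated as not ready (assumed convention).
    if generation is not None and observed != generation:
        return False

    for cond in status.get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False


# Hypothetical objects for illustration.
fresh = {
    "metadata": {"generation": 2},
    "status": {
        "observedGeneration": 2,
        "conditions": [{"type": "Ready", "status": "True"}],
    },
}
stale = {
    "metadata": {"generation": 3},
    "status": {
        "observedGeneration": 2,
        "conditions": [{"type": "Ready", "status": "True"}],
    },
}

print(is_ready(fresh), is_ready(stale))
```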

Usage Examples

Create an LLMInferenceService

apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: llama-service
  namespace: default
spec:
  model:
    name: llama-3-70b
    uri: gs://models/llama-3-70b
    criticality: Critical
  parallelism:
    tensor: 8
    pipeline: 1

Check Status

kubectl get llmisvc llama-service
# NAME            URL                                        READY   REASON   AGE
# llama-service   http://llama-service.default.example.com   True                5m

Related Pages

Principle:Kserve_Kserve_LLMInferenceService_Specification
Kserve_Kserve_LLMInferenceService_CRD_Spec -- Go type definitions for the LLMInferenceService resource
Kserve_Kserve_LLMIsvc_Controller -- Controller that reconciles LLMInferenceService resources
