Implementation:Kserve Kserve LLMInferenceService Full CRD

From Leeroopedia
Domains: Kubernetes, CRD, LLM Inference
Last Updated: 2026-02-13 00:00 GMT

Overview

Concrete CRD definition for the LLMInferenceService custom resource in the KServe serving API, providing full OpenAPI v3 schema validation.

Description

This file contains the auto-generated full CustomResourceDefinition for the LLMInferenceService kind (short name: llmisvc), produced by controller-gen v0.19.0. The resource is namespaced and belongs to the serving.kserve.io API group at version v1alpha1. Unlike the Go type specification documented in Kserve_Kserve_LLMInferenceService_CRD_Spec, this file is the generated CRD YAML with complete field-level validation. It declares printer columns for URL, Ready, Reason, and Age, plus a URLs column (priority 1) that kubectl shows only in wide output (-o wide). It also enables the status subresource, so the controller updates status through a separate endpoint from spec.

Usage

Apply this CRD during KServe installation to register the LLMInferenceService API with the Kubernetes API server. Once registered, users can create LLMInferenceService resources to deploy and manage LLM inference serving workloads with full schema validation enforced by the API server.
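To confirm that the registration step succeeded, one option is to inspect the CRD's own status: the API server sets the standard `Established` and `NamesAccepted` conditions once the new API is accepted and served. The minimal Python sketch below checks those two conditions on a CRD object parsed from `kubectl get crd llminferenceservices.serving.kserve.io -o json`. The helper name `crd_is_ready` is hypothetical and not part of KServe.

```python
import json
from typing import Any, Mapping


def crd_is_ready(crd: Mapping[str, Any]) -> bool:
    """Return True when the CRD reports both Established and NamesAccepted as "True".

    `crd` is the parsed JSON of a CustomResourceDefinition object.
    """
    conditions = crd.get("status", {}).get("conditions", [])
    status_by_type = {c.get("type"): c.get("status") for c in conditions}
    return (
        status_by_type.get("Established") == "True"
        and status_by_type.get("NamesAccepted") == "True"
    )


if __name__ == "__main__":
    # Hypothetical, trimmed CRD object; a real one also carries the full spec.
    sample = json.loads("""
    {
      "metadata": {"name": "llminferenceservices.serving.kserve.io"},
      "status": {"conditions": [
        {"type": "NamesAccepted", "status": "True"},
        {"type": "Established", "status": "True"}
      ]}
    }
    """)
    print(crd_is_ready(sample))
```

From a shell, `kubectl wait --for=condition=Established crd/llminferenceservices.serving.kserve.io` blocks on the same condition.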

Code Reference

Source Location

config/crd/full/llmisvc/serving.kserve.io_llminferenceservices.yaml

Signature

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.19.0
  name: llminferenceservices.serving.kserve.io
spec:
  group: serving.kserve.io
  names:
    kind: LLMInferenceService
    listKind: LLMInferenceServiceList
    plural: llminferenceservices
    shortNames:
      - llmisvc
    singular: llminferenceservice
  scope: Namespaced
  versions:
    - name: v1alpha1
      additionalPrinterColumns:
        - jsonPath: .status.url
          name: URL
          type: string
        - jsonPath: .status.conditions[?(@.type=='Ready')].status
          name: Ready
          type: string
        - jsonPath: .status.conditions[?(@.type=='Ready')].reason
          name: Reason
          type: string
        - jsonPath: .metadata.creationTimestamp
          name: Age
          type: date
        - jsonPath: .status.addresses[*].url
          name: URLs
          priority: 1
          type: string
      subresources:
        status: {}
      # served, storage, and schema.openAPIV3Schema are omitted from this excerpt
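The printer columns above are evaluated by the API server against each object when it is listed. As an illustration only, the hypothetical Python helper below reproduces what those JSONPath expressions select from an LLMInferenceService object, including the `[?(@.type=='Ready')]` filter and the `priority: 1` URLs column that kubectl shows only with `-o wide`. It is not a general JSONPath engine, and joining multiple URLs with commas is an illustrative choice rather than a statement about how the API server renders multi-valued results.

```python
from typing import Any, Dict, Mapping, Optional


def ready_condition(status: Mapping[str, Any]) -> Optional[Mapping[str, Any]]:
    """Emulate .status.conditions[?(@.type=='Ready')]: first condition of type Ready."""
    for cond in status.get("conditions", []):
        if cond.get("type") == "Ready":
            return cond
    return None


def printer_row(obj: Mapping[str, Any], wide: bool = False) -> Dict[str, str]:
    """Return the cells a `kubectl get llmisvc` row would be built from."""
    metadata = obj.get("metadata", {})
    status = obj.get("status", {})
    ready = ready_condition(status) or {}
    row = {
        # kubectl adds NAME itself; it is not declared in the CRD.
        "NAME": metadata.get("name", ""),
        "URL": status.get("url", ""),
        "READY": ready.get("status", ""),
        "REASON": ready.get("reason", ""),
        # kubectl renders `type: date` columns as a relative age (e.g. "5m");
        # this sketch returns the raw timestamp instead.
        "AGE": metadata.get("creationTimestamp", ""),
    }
    if wide:
        # priority: 1 columns are hidden unless `-o wide` is requested.
        row["URLs"] = ",".join(a.get("url", "") for a in status.get("addresses", []))
    return row
```

For example, `printer_row(obj)` yields the default columns, while `printer_row(obj, wide=True)` adds the URLs cell.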

Import

kubectl apply -f config/crd/full/llmisvc/serving.kserve.io_llminferenceservices.yaml

I/O Contract

Inputs

Name        Type                     Required  Description
apiVersion  string                   Yes       Must be serving.kserve.io/v1alpha1
kind        string                   Yes       Must be LLMInferenceService
metadata    ObjectMeta               Yes       Standard Kubernetes object metadata
spec        LLMInferenceServiceSpec  Yes       LLM inference service specification including model, parallelism, prefill/decode configuration, and worker pod templates
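A client-side pre-flight check that mirrors the Inputs table can catch obvious mistakes before a manifest reaches the API server. The sketch below is a hypothetical helper (`check_top_level`), assuming the manifest has already been parsed into a Python dict. It covers only the required top-level fields and does not replace the full OpenAPI v3 validation that the API server performs.

```python
from typing import Any, List, Mapping

EXPECTED_API_VERSION = "serving.kserve.io/v1alpha1"
EXPECTED_KIND = "LLMInferenceService"


def check_top_level(manifest: Mapping[str, Any]) -> List[str]:
    """Return a list of problems with the required top-level fields (empty if none)."""
    errors: List[str] = []
    if manifest.get("apiVersion") != EXPECTED_API_VERSION:
        errors.append(f"apiVersion must be {EXPECTED_API_VERSION}")
    if manifest.get("kind") != EXPECTED_KIND:
        errors.append(f"kind must be {EXPECTED_KIND}")
    metadata = manifest.get("metadata")
    # Kubernetes needs either an explicit name or a generateName prefix.
    if not isinstance(metadata, Mapping) or not (
        metadata.get("name") or metadata.get("generateName")
    ):
        errors.append("metadata.name or metadata.generateName is required")
    if not isinstance(manifest.get("spec"), Mapping):
        errors.append("spec is required and must be an object")
    return errors
```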

Key spec fields:

Field             Type                    Required            Description
spec.baseRefs     []LocalObjectReference  No                  References to base LLMInferenceServiceConfig objects for configuration inheritance
spec.model        ModelSpec               No                  Model configuration with name, URI, criticality level (Critical/Standard/Sheddable), and LoRA adapter definitions
spec.model.uri    string                  Yes (within model)  URI pointing to the model artifacts
spec.parallelism  ParallelismSpec         No                  Parallelism strategy: tensor, pipeline, data, and dataLocal parallelism degrees, expert parallelism toggle, and RPC port
spec.prefill      PrefillSpec             No                  Prefill-phase-specific settings with independent parallelism configuration
spec.workerSpec   WorkerSpec              No                  Full pod template spec for the inference worker, including containers, volumes, and scheduling
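The same pre-flight idea extends to the key spec fields. The sketch below checks the constraints stated in the table: a `name` on each baseRef, `uri` inside `model`, and one of the three criticality levels. The rule that parallelism degrees must be positive integers is an assumption made for illustration; the generated schema is the authority on the exact bounds. The helper name `check_spec` is hypothetical.

```python
from typing import Any, List, Mapping

CRITICALITY_LEVELS = {"Critical", "Standard", "Sheddable"}
# Assumption for illustration: each degree, when set, is a positive integer.
PARALLELISM_DEGREES = ("tensor", "pipeline", "data", "dataLocal")


def check_spec(spec: Mapping[str, Any]) -> List[str]:
    """Return problems found in the key spec fields (empty list if none)."""
    errors: List[str] = []

    # Each baseRefs entry is a LocalObjectReference, identified by name.
    for i, ref in enumerate(spec.get("baseRefs") or []):
        if not isinstance(ref, Mapping) or not ref.get("name"):
            errors.append(f"spec.baseRefs[{i}].name is required")

    model = spec.get("model")
    if isinstance(model, Mapping):
        if not model.get("uri"):
            errors.append("spec.model.uri is required when spec.model is set")
        criticality = model.get("criticality")
        if criticality is not None and criticality not in CRITICALITY_LEVELS:
            errors.append(
                f"spec.model.criticality must be one of {sorted(CRITICALITY_LEVELS)}"
            )

    parallelism = spec.get("parallelism") or {}
    for key in PARALLELISM_DEGREES:
        value = parallelism.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if value is not None and (
            isinstance(value, bool) or not isinstance(value, int) or value < 1
        ):
            errors.append(f"spec.parallelism.{key} must be a positive integer")
    return errors
```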

Outputs

Name                       Type               Description
status.url                 string             Primary URL endpoint for the inference service
status.addresses           []AddressSpec      List of addressable endpoints with URL, name, audience, and CA certificates
status.conditions          []Condition        Knative-style conditions, including Ready status with reason and message
status.observedGeneration  int64              The generation most recently observed by the controller
status.annotations         map[string]string  Controller-managed annotations propagated to status
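Because these fields are written by the controller, a client usually wants to know two things: whether the status reflects the latest spec, and whether the service is Ready. A common Kubernetes convention is to compare `status.observedGeneration` with `metadata.generation`. The hypothetical helper below applies that convention and gathers the endpoint URLs listed in the Outputs table; it is a sketch, not part of KServe.

```python
from typing import Any, Dict, Mapping


def summarize_status(obj: Mapping[str, Any]) -> Dict[str, Any]:
    """Condense the controller-managed status into a few client-facing facts."""
    metadata = obj.get("metadata", {})
    status = obj.get("status", {})
    ready = next(
        (c for c in status.get("conditions", []) if c.get("type") == "Ready"), {}
    )
    generation = metadata.get("generation")
    return {
        # True only when the controller has observed the latest spec generation,
        # i.e. the conditions describe the current spec rather than an older one.
        "current": generation is not None
        and status.get("observedGeneration") == generation,
        "ready": ready.get("status") == "True",
        "reason": ready.get("reason", ""),
        "message": ready.get("message", ""),
        "url": status.get("url"),
        "addresses": [a["url"] for a in status.get("addresses", []) if a.get("url")],
    }
```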

Usage Examples

Create an LLMInferenceService

apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: llama-service
  namespace: default
spec:
  model:
    name: llama-3-70b
    uri: gs://models/llama-3-70b
    criticality: Critical
  parallelism:
    tensor: 8
    pipeline: 1
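The same resource can be created programmatically. The sketch below builds the manifest from the YAML example as a Python dict and submits it with `CustomObjectsApi.create_namespaced_custom_object` from the official `kubernetes` Python client. It assumes that client is installed, a kubeconfig is available, and the CRD is registered; the helper names `llama_service_manifest` and `create` are hypothetical. The client import sits inside `create` so the manifest builder works without the package.

```python
from typing import Any, Dict

GROUP = "serving.kserve.io"
VERSION = "v1alpha1"
PLURAL = "llminferenceservices"  # matches spec.names.plural in the CRD


def llama_service_manifest(
    name: str = "llama-service", namespace: str = "default"
) -> Dict[str, Any]:
    """Build the same object as the YAML example above."""
    return {
        "apiVersion": f"{GROUP}/{VERSION}",
        "kind": "LLMInferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "model": {
                "name": "llama-3-70b",
                "uri": "gs://models/llama-3-70b",
                "criticality": "Critical",
            },
            "parallelism": {"tensor": 8, "pipeline": 1},
        },
    }


def create(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Submit the manifest to the cluster (needs the `kubernetes` package and a kubeconfig)."""
    from kubernetes import client, config  # imported lazily on purpose

    config.load_kube_config()
    api = client.CustomObjectsApi()
    return api.create_namespaced_custom_object(
        group=GROUP,
        version=VERSION,
        namespace=manifest["metadata"]["namespace"],
        plural=PLURAL,
        body=manifest,
    )
```

Against a live cluster, `create(llama_service_manifest())` returns the created object as a dict, with server-populated fields such as `metadata.uid`.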

Check Status

kubectl get llmisvc llama-service
# NAME            URL                                        READY   REASON   AGE
# llama-service   http://llama-service.default.example.com   True             5m
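Model loading can take a while, so scripts often wait for the Ready condition before sending traffic. From the command line, `kubectl wait --for=condition=Ready llmisvc/llama-service --timeout=10m` does this (the timeout is chosen arbitrarily). The Python sketch below shows an equivalent polling loop, where `fetch` is any callable that returns the current object. The function name and the timeout and interval defaults are illustrative assumptions.

```python
import time
from typing import Any, Callable, Mapping, Optional


def wait_until_ready(
    fetch: Callable[[], Mapping[str, Any]],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until the Ready condition is "True".

    Returns status.url (possibly empty) once Ready, or None if the timeout
    expires first. `clock` and `sleep` are injectable so the loop can be
    exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch().get("status", {})
        for cond in status.get("conditions", []):
            if cond.get("type") == "Ready" and cond.get("status") == "True":
                return status.get("url", "")
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

With the `kubernetes` client, `fetch` could be `lambda: api.get_namespaced_custom_object("serving.kserve.io", "v1alpha1", "default", "llminferenceservices", "llama-service")`, where `api` is a `client.CustomObjectsApi()` instance.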
