Implementation:Kserve Kserve LLMInferenceServiceConfig CRD

From Leeroopedia
Knowledge Sources
Domains Kubernetes, CRD, LLM Inference
Last Updated 2026-02-13 00:00 GMT

Overview

Concrete CRD definition for the LLMInferenceServiceConfig custom resource in the KServe serving API.

Description

This file contains the full CustomResourceDefinition for the LLMInferenceServiceConfig kind, auto-generated by controller-gen v0.19.0. The kind belongs to the serving.kserve.io API group at version v1alpha1 and is a namespaced resource. The CRD carries an OpenAPI v3 schema that validates the spec fields, so cluster operators can define shared configuration templates for LLM inference serving workloads. These templates cover model configuration, parallelism settings, prefill/decode phases, and full pod template specifications.

Usage

Apply this CRD during KServe installation to register the LLMInferenceServiceConfig API with the Kubernetes API server. Once registered, namespace administrators can create LLMInferenceServiceConfig resources that serve as reusable configuration templates referenced by LLMInferenceService instances.

Code Reference

Source Location

Signature

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.19.0
  name: llminferenceserviceconfigs.serving.kserve.io
spec:
  group: serving.kserve.io
  names:
    kind: LLMInferenceServiceConfig
    listKind: LLMInferenceServiceConfigList
    plural: llminferenceserviceconfigs
    singular: llminferenceserviceconfig
  scope: Namespaced
  versions:
    - name: v1alpha1

Import

kubectl apply -f config/crd/full/llmisvc/serving.kserve.io_llminferenceserviceconfigs.yaml

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| apiVersion | string | Yes | Must be serving.kserve.io/v1alpha1 |
| kind | string | Yes | Must be LLMInferenceServiceConfig |
| metadata | ObjectMeta | Yes | Standard Kubernetes object metadata |
| spec | LLMInferenceServiceConfigSpec | Yes | Configuration template spec containing model settings, parallelism, prefill/decode config, and pod template overrides |

Key spec fields:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| spec.baseRefs | []LocalObjectReference | No | References to base configuration objects for composition |
| spec.model | ModelSpec | No | Model configuration including name, URI, criticality (Critical/Standard/Sheddable), and LoRA adapter settings |
| spec.model.uri | string | Yes (within model) | URI of the model to serve |
| spec.parallelism | ParallelismSpec | No | Parallelism settings including data, dataLocal, pipeline, tensor parallelism, and expert parallelism toggle |
| spec.prefill | PrefillSpec | No | Prefill-phase-specific configuration with its own parallelism settings |
| spec.workerSpec | WorkerSpec | No | Full pod template specification for worker containers and volumes |
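
The composition and prefill fields in the table can be illustrated with a small manifest. This is a minimal sketch, not a definitive configuration: the resource names `llama-config` and `llama-config-prefill` are hypothetical, and it assumes that each `baseRefs` entry is a LocalObjectReference carrying only a `name`, and that `spec.prefill.parallelism` accepts the same keys as `spec.parallelism`. How the controller merges the referenced base with the fields set here is not described on this page.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceServiceConfig
metadata:
  name: llama-config-prefill   # hypothetical name
  namespace: default
spec:
  # Reference an existing config in the same namespace as a base.
  # "llama-config" is a hypothetical base template.
  baseRefs:
    - name: llama-config
  # Prefill-phase settings with their own parallelism
  # (assumed to use the same keys as spec.parallelism).
  prefill:
    parallelism:
      tensor: 2
```

Because LocalObjectReference holds no namespace, the base is assumed to live in the same namespace as the config that references it.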

Outputs

| Name | Type | Description |
|------|------|-------------|
| (none) | -- | This CRD does not define a status subresource; it is a pure configuration template resource |

Usage Examples

Create an LLMInferenceServiceConfig

apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceServiceConfig
metadata:
  name: llama-config
  namespace: default
spec:
  model:
    name: llama-3
    uri: gs://models/llama-3-70b
    criticality: Standard
  parallelism:
    tensor: 4
    pipeline: 2
    data: 1
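
Since the Usage section says these templates are referenced by LLMInferenceService instances, a consuming resource might look like the sketch below. It rests on an assumption this page does not confirm: that LLMInferenceService exposes a `baseRefs` field of the same shape as the one listed under the key spec fields. The service name `llama-service` is hypothetical.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: llama-service          # hypothetical name
  namespace: default           # same namespace as the referenced config
spec:
  # Assumption: LLMInferenceService accepts baseRefs like the config kind does.
  baseRefs:
    - name: llama-config       # the LLMInferenceServiceConfig defined above
```

Keeping model and parallelism settings in the shared template lets several services in a namespace reuse them, with each service supplying only what differs.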
