Principle: KServe InferenceService Specification
| Knowledge Sources | |
|---|---|
| Domains | MLOps, Kubernetes, Model_Serving |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
A declarative specification pattern that defines an ML model serving endpoint as a Kubernetes custom resource with predictor, transformer, and explainer components.
Description
The InferenceService Specification is the core abstraction in KServe. It allows users to declare the desired state of a model serving endpoint using a Kubernetes-native custom resource definition (CRD). The specification encapsulates:
- Predictor: The model server (TensorFlow, PyTorch, sklearn, XGBoost, HuggingFace, etc.) with its storage URI and resource requirements.
- Transformer: An optional pre/post-processing component.
- Explainer: An optional model explainability component.
This pattern solves the complexity of deploying ML models in production by providing a high-level declarative interface. Users specify what they want (model format, storage location, resources) and the KServe controller determines how to achieve it (selecting runtimes, configuring networking, managing scaling).
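As a sketch of this declarative interface, a minimal InferenceService for a scikit-learn model might look like the following (the name, storage URI, and resource values are illustrative, not canonical):

```yaml
# Minimal InferenceService: the user declares WHAT (format, location, resources);
# the KServe controller decides HOW (runtime selection, networking, scaling).
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris                 # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn                # controller matches this to a ServingRuntime
      storageUri: gs://example-bucket/models/sklearn/iris   # assumed bucket path
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
```

Applying this one manifest is the entire user-facing workflow; no Deployment, Service, or ingress objects need to be written by hand.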
Usage
Use this principle when deploying any ML model as an inference endpoint on Kubernetes. It is the entry point for all KServe serving workflows, applicable to:
- Traditional ML models (sklearn, XGBoost, LightGBM)
- Deep learning models (TensorFlow, PyTorch, ONNX)
- LLMs via the HuggingFace runtime
- Custom model servers via container spec
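For the last case, when no built-in runtime fits, the predictor can carry a full container spec instead of a `model` block. A hedged sketch, assuming a hypothetical image `registry.example.com/my-model-server` that speaks the inference protocol on port 8080:

```yaml
# Custom predictor: the user supplies the serving container directly,
# while KServe still handles routing, scaling, and status reporting.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: custom-model                 # illustrative name
spec:
  predictor:
    containers:
      - name: kserve-container       # conventional container name in KServe
        image: registry.example.com/my-model-server:latest   # hypothetical image
        ports:
          - containerPort: 8080
```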
Theoretical Basis
The specification follows the Kubernetes declarative model:
```yaml
# Abstract spec structure (NOT implementation code)
InferenceService:
  spec:
    predictor:        # REQUIRED: model server
      model:
        modelFormat   # What framework (tensorflow, sklearn, etc.)
        storageUri    # Where the model lives (s3://, gs://, hf://)
        resources     # CPU/memory/GPU requirements
    transformer:      # OPTIONAL: pre/post processing
    explainer:        # OPTIONAL: model explainability
```
The controller reconciles the declared spec into runtime resources:
1. User writes InferenceService YAML
2. Webhook defaults missing fields (runtime selection, resource limits)
3. Webhook validates the spec (naming, component exclusivity)
4. Controller creates Knative Services or raw Deployments
5. Ingress reconciler creates VirtualService/HTTPRoute
6. Status is updated with endpoint URL
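Once reconciliation succeeds, the final step is visible on the resource's status subresource. A representative (illustrative) status fragment, with a hostname that depends on the cluster's ingress configuration:

```yaml
# Written by the KServe controller after reconciliation, not by the user.
status:
  url: http://sklearn-iris.default.example.com   # assumed domain; set by ingress config
  conditions:
    - type: Ready
      status: "True"
```

Clients can poll this `Ready` condition to know when the endpoint is safe to call.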