
Principle:KServe InferenceService Specification

From Leeroopedia
Knowledge Sources
Domains: MLOps, Kubernetes, Model_Serving
Last Updated: 2026-02-13 00:00 GMT

Overview

A declarative specification pattern that defines an ML model serving endpoint as a Kubernetes custom resource with predictor, transformer, and explainer components.

Description

The InferenceService Specification is the core abstraction in KServe. It allows users to declare the desired state of a model serving endpoint using a Kubernetes-native custom resource definition (CRD). The specification encapsulates:

  • Predictor: The model server (TensorFlow, PyTorch, sklearn, XGBoost, HuggingFace, etc.) with its storage URI and resource requirements.
  • Transformer: An optional pre/post-processing component.
  • Explainer: An optional model explainability component.

This pattern solves the complexity of deploying ML models in production by providing a high-level declarative interface. Users specify what they want (model format, storage location, resources) and the KServe controller determines how to achieve it (selecting runtimes, configuring networking, managing scaling).
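As a concrete sketch, a minimal InferenceService manifest following this pattern could look like the example below (the resource name, bucket path, and resource figures are illustrative placeholders, not values from this page):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris                 # placeholder name
spec:
  predictor:                         # REQUIRED component
    model:
      modelFormat:
        name: sklearn                # what: the model framework
      storageUri: gs://example-bucket/models/iris   # where: hypothetical bucket
      resources:                     # how much: CPU/memory requests
        requests:
          cpu: "1"
          memory: 2Gi
```

Note that the manifest says nothing about which serving runtime container to use or how traffic reaches the pod; those decisions are left to the controller, as described above.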

Usage

Use this principle when deploying any ML model as an inference endpoint on Kubernetes. It is the entry point for all KServe serving workflows, applicable to:

  • Traditional ML models (sklearn, XGBoost, LightGBM)
  • Deep learning models (TensorFlow, PyTorch, ONNX)
  • Large language models (LLMs) via the HuggingFace runtime
  • Custom model servers via container spec
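For the last case, a custom model server, the predictor replaces the model block with a raw container spec. A hedged sketch, with a hypothetical image and port:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: custom-model                 # placeholder name
spec:
  predictor:
    containers:                      # raw container spec instead of model block
      - name: kserve-container
        image: registry.example.com/my-model-server:latest  # hypothetical image
        ports:
          - containerPort: 8080
            protocol: TCP
```

This trades the controller's automatic runtime selection for full control over the serving container.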

Theoretical Basis

The specification follows the Kubernetes declarative model:

# Abstract spec structure (NOT implementation code)
InferenceService:
  spec:
    predictor:          # REQUIRED: model server
      model:
        modelFormat     # What framework (tensorflow, sklearn, etc.)
        storageUri      # Where the model lives (s3://, gs://, hf://)
        resources       # CPU/memory/GPU requirements
    transformer:        # OPTIONAL: pre/post processing
    explainer:          # OPTIONAL: model explainability
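Filled in, the optional components sit beside the predictor at the same level of the spec. A sketch with hypothetical storage and image references:

```yaml
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch
      storageUri: s3://example-bucket/models/model   # hypothetical bucket
  transformer:                       # OPTIONAL: pre/post-processing
    containers:
      - name: kserve-container
        image: registry.example.com/my-transformer:latest  # hypothetical image
```

Requests then flow through the transformer before and after the predictor, without the predictor itself needing any preprocessing logic.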

The controller reconciles the declared spec into runtime resources:

1. User writes InferenceService YAML
2. Webhook defaults missing fields (runtime selection, resource limits)
3. Webhook validates the spec (naming, component exclusivity)
4. Controller creates Knative Services or raw Deployments
5. Ingress reconciler creates VirtualService/HTTPRoute
6. Status is updated with endpoint URL
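After step 6, the resource's status subresource carries the serving endpoint and readiness conditions; a sketch of what that might contain (the hostname is hypothetical):

```yaml
status:
  url: http://sklearn-iris.default.example.com
  conditions:
    - type: Ready
      status: "True"
```

`kubectl get inferenceservice` surfaces the same URL and readiness state in its default output columns.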

Related Pages

Implemented By
