Implementation:Kserve Kserve Initial InferenceService Deployment

From Leeroopedia
Revision as of 13:09, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Kserve_Kserve_Initial_InferenceService_Deployment.md)
Knowledge Sources
Domains MLOps, Deployment_Strategy, Model_Serving
Last Updated 2026-02-13 00:00 GMT

Overview

Concrete YAML pattern for deploying the initial baseline InferenceService that serves as the stable reference for canary rollouts.

Description

This pattern creates an InferenceService with a single predictor and no canaryTrafficPercent field. The absence of canary configuration means Knative routes 100% of traffic to the single active revision. This establishes the baseline that subsequent canary updates will be compared against.
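Because the baseline is defined by the absence of canaryTrafficPercent, a quick pre-flight check on the manifest can catch a stray canary field before it is applied. The following is a minimal sketch, assuming the manifest is a local file. The function name is_baseline_manifest is hypothetical, and the check is a plain text search rather than a YAML parse.

```shell
# Hypothetical helper (not part of KServe): succeed only if the manifest
# file does NOT set canaryTrafficPercent. Text search only, not a YAML parse.
# Exit codes: 0 = baseline, 1 = canary field present, 2 = file unreadable.
is_baseline_manifest() {
  manifest="$1"
  [ -r "$manifest" ] || return 2
  # Drop comment-only lines, then look for the field name used as a YAML key.
  if grep -v '^[[:space:]]*#' "$manifest" | grep -q 'canaryTrafficPercent[[:space:]]*:'; then
    return 1
  fi
  return 0
}
```

A possible use is to gate the apply step: is_baseline_manifest default.yaml && kubectl apply -f default.yaml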

Usage

Use this pattern as the first step in a canary rollout workflow. Apply this YAML, verify the service is ready and serving correct predictions, then proceed with canary updates.

Code Reference

Source Location

  • Repository: kserve
  • File: docs/samples/v1beta1/rollout/default.yaml, Lines 1-8

Signature

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model
spec:
  predictor:
    tensorflow:
      storageUri: "gs://kfserving-examples/models/tensorflow/flowers"

Import

kubectl apply -f default.yaml

I/O Contract

Inputs

Name | Type | Required | Description
metadata.name | string | Yes | InferenceService name
spec.predictor.tensorflow.storageUri | string | Yes | Model artifact URI

Outputs

Name | Type | Description
Revision | Knative Revision | Single revision <name>-predictor-default-00001
Traffic | 100% | All traffic to the single revision
status.url | URL | Prediction endpoint ready for requests
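The status.url output can be read back from the cluster once the service is ready. The following is a minimal sketch, assuming kubectl is already configured for the target cluster and namespace. The helper name isvc_url is hypothetical; status.url is the field listed in the outputs.

```shell
# Hypothetical helper: print the prediction endpoint of an InferenceService.
# Fails with a message on stderr if status.url has not been populated yet.
isvc_url() {
  name="$1"
  url=$(kubectl get inferenceservice "$name" -o jsonpath='{.status.url}') || return 1
  if [ -z "$url" ]; then
    echo "status.url is not set yet for ${name}" >&2
    return 1
  fi
  printf '%s\n' "$url"
}
```

An empty result usually just means the service has not finished reconciling, so the helper treats it as a failure instead of printing a blank line.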

Usage Examples

Deploy Baseline Model

# 1. Deploy initial model
kubectl apply -f - <<EOF
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model
spec:
  predictor:
    tensorflow:
      storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
EOF

# 2. Wait for readiness
kubectl wait inferenceservice my-model --for=condition=Ready --timeout=120s

# 3. Verify single revision
kubectl get revisions -l serving.knative.dev/service=my-model-predictor-default
# NAME                                   READY
# my-model-predictor-default-00001       True
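
The Usage section also calls for verifying that the baseline returns correct predictions before any canary update. The following sketch shows one way to send a test request, under several assumptions that do not come from the original page: the TensorFlow predictor is reached over the v1 REST protocol at /v1/models/<name>:predict, INGRESS_HOST and INGRESS_PORT point at the cluster ingress gateway, and input.json is a hypothetical payload file you supply. The helper names predict_url and send_prediction are also hypothetical.

```shell
# Hypothetical helper: build the v1 predict URL for a model behind the ingress.
# Assumes INGRESS_HOST and INGRESS_PORT are set in the environment.
predict_url() {
  printf 'http://%s:%s/v1/models/%s:predict\n' "$INGRESS_HOST" "$INGRESS_PORT" "$1"
}

# Hypothetical helper: POST a JSON payload file to the model and print the reply.
# The Host header carries the hostname portion of status.url.
send_prediction() {
  name="$1"
  payload="$2"
  host=$(kubectl get inferenceservice "$name" -o jsonpath='{.status.url}' | cut -d/ -f3)
  curl -sS -H "Host: ${host}" -H "Content-Type: application/json" \
    -d @"$payload" "$(predict_url "$name")"
}
```

For example: INGRESS_HOST=<gateway-ip> INGRESS_PORT=80 send_prediction my-model input.json. Compare the response against known-good predictions for the model before proceeding to a canary update.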

Related Pages

Implements Principle
