Workflow:Kserve Kserve Canary Rollout Deployment

From Leeroopedia
Knowledge Sources
Domains ML_Serving, Kubernetes, MLOps, Traffic_Management
Last Updated 2026-02-13 14:00 GMT

Overview

End-to-end process for safely rolling out a new model version using canary traffic splitting, validation, and promotion on KServe.

Description

This workflow covers the progressive rollout of a new model version alongside an existing stable version. KServe automatically tracks the last known good revision and splits traffic between it and the new canary revision based on a configurable percentage. The process includes deploying the initial model, updating with a canary percentage, validating the new model under partial traffic, and promoting or rolling back based on observed behavior. Tag-based routing enables explicit testing of specific revisions.

Usage

Use this workflow to update a production InferenceService with a new model version while minimizing risk. A canary rollout shifts traffic gradually from the current model to the new version, lets you validate performance under real traffic, and keeps an immediate rollback available if issues are detected.

Execution Steps

Step 1: Deploy the initial model version

Create and apply the InferenceService with the initial model version. This establishes the baseline revision that will serve 100% of traffic. KServe assigns this as the stable revision with full traffic allocation.

Key considerations:

  • The initial deployment must reach Ready state before proceeding
  • KServe automatically records this as the "last rolled out" revision (latestRolledoutRevision in the predictor status)
  • Verify predictions are correct before proceeding to canary updates
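
As a minimal sketch of this step, the manifest can be built as a plain Python dict and printed as JSON, which kubectl accepts alongside YAML. The service name my-model, the namespace, the sklearn model format, and the gs://example-bucket/... URI are hypothetical placeholders, and is_ready assumes the object's status exposes a Ready condition in the usual Kubernetes conditions list.

```python
import json


def build_inference_service(name, namespace, model_format, storage_uri):
    """Return a KServe v1beta1 InferenceService manifest as a plain dict."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "model": {
                    "modelFormat": {"name": model_format},
                    "storageUri": storage_uri,
                }
            }
        },
    }


def is_ready(isvc):
    """True if the object's status has a Ready condition whose status is "True"."""
    conditions = isvc.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
    )


# Hypothetical names and storage location, for illustration only.
initial = build_inference_service(
    "my-model", "default", "sklearn", "gs://example-bucket/models/v1"
)
print(json.dumps(initial, indent=2))
```

The printed JSON could be saved to a file and applied with kubectl apply -f. is_ready is meant to be called on the object as read back from the cluster, not on the freshly built manifest, which has no status yet.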

Step 2: Update InferenceService with canary traffic split

Modify the InferenceService spec to point to the new model version and set the predictor's canaryTrafficPercent field to the desired initial percentage (e.g., 10). Apply the updated manifest. KServe creates a new revision and splits traffic between the previous stable revision and the new canary revision.

Key considerations:

  • Set canaryTrafficPercent to a small value initially (5-20%)
  • The storageUri changes to point to the new model artifacts
  • KServe automatically manages two revisions and their traffic allocation
  • Both old and new model pods run simultaneously during canary
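
The update described in this step could be sketched as a pure function over the manifest dict. The helper name with_canary and the example URIs are hypothetical. The field locations (spec.predictor.canaryTrafficPercent and spec.predictor.model.storageUri) assume the v1beta1 layout with the generic model predictor.

```python
import copy


def with_canary(isvc, new_storage_uri, canary_percent):
    """Return a copy of the manifest pointing at the new model with a canary split."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    updated = copy.deepcopy(isvc)  # leave the caller's manifest untouched
    predictor = updated["spec"]["predictor"]
    predictor["canaryTrafficPercent"] = canary_percent
    predictor["model"]["storageUri"] = new_storage_uri
    return updated


# Hypothetical stable manifest, for illustration only.
stable = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "my-model", "namespace": "default"},
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "sklearn"},
                "storageUri": "gs://example-bucket/models/v1",
            }
        }
    },
}

canary = with_canary(stable, "gs://example-bucket/models/v2", 10)
```

Returning a copy keeps the stable manifest available unchanged, which is convenient if both versions are kept under version control.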

Step 3: Validate canary model under partial traffic

Monitor the canary revision for correctness, latency, and error rate. Send test requests and observe that traffic is split according to the configured percentage. Use tag-based routing to send requests explicitly to the canary or previous revision for focused testing.

What happens:

  • Repeated requests are distributed between revisions based on weight
  • Enable tag-based routing by setting the annotation serving.kserve.io/enable-tag-routing to "true" on the InferenceService
  • Latest revision accessible at latest-{service}-predictor-default.{namespace}
  • Previous revision accessible at prev-{service}-predictor-default.{namespace}
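
A small sketch of the helpers this step suggests: one adds the tag-routing annotation, one builds the latest and prev hostnames from the pattern listed above, and one computes the observed canary share from a list of per-response revision labels. How each response is attributed to a revision (for example from pod logs) is left out and would depend on the environment. All function names and the example service are hypothetical.

```python
import copy

TAG_ROUTING_ANNOTATION = "serving.kserve.io/enable-tag-routing"


def enable_tag_routing(isvc):
    """Return a copy of the manifest with the tag-routing annotation set to "true"."""
    updated = copy.deepcopy(isvc)
    annotations = updated["metadata"].setdefault("annotations", {})
    annotations[TAG_ROUTING_ANNOTATION] = "true"
    return updated


def tagged_hosts(service, namespace):
    """Build the tag-specific hostnames following the pattern given in this step."""
    return {
        tag: f"{tag}-{service}-predictor-default.{namespace}"
        for tag in ("latest", "prev")
    }


def canary_share(served_by, canary_revision):
    """Fraction of responses that were served by the canary revision."""
    if not served_by:
        raise ValueError("served_by must not be empty")
    hits = sum(1 for revision in served_by if revision == canary_revision)
    return hits / len(served_by)


# Hypothetical service and namespace, for illustration only.
isvc = {"metadata": {"name": "my-model", "namespace": "default"}, "spec": {}}
tagged = enable_tag_routing(isvc)
hosts = tagged_hosts("my-model", "default")
```

With a small sample, the observed share will only roughly match canaryTrafficPercent, so any comparison should allow a tolerance.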

Step 4: Gradually increase canary traffic

If the canary model performs well, incrementally increase the canaryTrafficPercent value (e.g., 10% to 50% to 100%). Apply the updated manifest at each increment and monitor for regressions. This provides controlled exposure of the new model to production traffic.

Key considerations:

  • Increase traffic gradually to catch issues at larger scale
  • Monitor error rates and latency at each increment
  • Keep the rollback option available at every stage
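
One way to make the ramp explicit is a gate function that returns the next percentage only when the observed metrics stay within limits, and 0 otherwise. This is a sketch: the schedule mirrors the 10, 50, 100 example above, while the thresholds (1% error rate, 500 ms p95 latency) and the function name are hypothetical and would need tuning per service.

```python
def next_canary_percent(
    current,
    schedule,
    error_rate,
    p95_latency_ms,
    max_error_rate=0.01,
    max_p95_latency_ms=500.0,
):
    """Return the next canaryTrafficPercent, or 0 to signal a rollback.

    If either metric breaches its threshold the result is 0. Otherwise the
    result is the smallest scheduled value above `current`, or `current`
    itself once the schedule is exhausted.
    """
    if error_rate > max_error_rate or p95_latency_ms > max_p95_latency_ms:
        return 0
    higher = [step for step in sorted(schedule) if step > current]
    return higher[0] if higher else current


# Example ramp using the increments mentioned in this step.
schedule = (10, 50, 100)
step_after_10 = next_canary_percent(10, schedule, error_rate=0.001, p95_latency_ms=120.0)
step_on_errors = next_canary_percent(50, schedule, error_rate=0.05, p95_latency_ms=120.0)
```

Each returned value would then be written to canaryTrafficPercent and the manifest re-applied, with a fresh metrics check before the next call.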

Step 5: Promote or roll back

If the canary model is validated, promote it by removing the canaryTrafficPercent field entirely. This directs 100% of traffic to the new revision, which becomes the new stable revision, and the old revision's pods automatically scale to zero. If issues are detected, set canaryTrafficPercent to 0 instead. This pins all traffic to the previous stable revision, effectively rolling back.

What happens on promotion:

  • Remove canaryTrafficPercent from the spec
  • All traffic shifts to the new revision
  • Old revision pods scale down to zero automatically

What happens on rollback:

  • Set canaryTrafficPercent to 0
  • All traffic reverts to the previous stable revision
  • The new revision pods scale down
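
Both outcomes reduce to a one-field change on the manifest, which can be sketched as two pure functions. The helper names promote and roll_back and the example manifest are hypothetical, and the field location assumes spec.predictor.canaryTrafficPercent.

```python
import copy


def promote(isvc):
    """Return a copy with canaryTrafficPercent removed, sending all traffic to the new revision."""
    updated = copy.deepcopy(isvc)
    updated["spec"]["predictor"].pop("canaryTrafficPercent", None)
    return updated


def roll_back(isvc):
    """Return a copy with canaryTrafficPercent set to 0, pinning traffic to the previous revision."""
    updated = copy.deepcopy(isvc)
    updated["spec"]["predictor"]["canaryTrafficPercent"] = 0
    return updated


# Hypothetical manifest mid-rollout, for illustration only.
in_canary = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "my-model", "namespace": "default"},
    "spec": {
        "predictor": {
            "canaryTrafficPercent": 50,
            "model": {
                "modelFormat": {"name": "sklearn"},
                "storageUri": "gs://example-bucket/models/v2",
            },
        }
    },
}

promoted = promote(in_canary)
rolled_back = roll_back(in_canary)
```

Note that rollback keeps the new storageUri in the spec and relies on the 0 percent split, whereas promotion removes the field rather than setting it to 100, as described above.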
