Principle:Kserve Kserve Canary Traffic Splitting
| Knowledge Sources | |
|---|---|
| Domains | MLOps, Deployment_Strategy, Traffic_Management |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
A traffic management technique that routes a configurable percentage of inference requests to a new model version while keeping the majority on the stable version.
Description
Canary Traffic Splitting lets operators shift traffic gradually from an established model version to a new one. When canaryTrafficPercent is set on a component of the InferenceService spec (for example, spec.predictor.canaryTrafficPercent), KServe configures the underlying Knative Service with two traffic targets:
- Stable revision: Receives (100 - canaryTrafficPercent)% of traffic
- Canary revision: Receives canaryTrafficPercent% of traffic
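The split described above can be modelled in a few lines of Python. This is an illustrative sketch, not KServe's reconciler code: the helper name `build_traffic_targets` and the revision name `rev-00001` are placeholders, and the dictionary keys only mirror the fields of a Knative traffic target.

```python
from typing import Dict, List, Union

TrafficTarget = Dict[str, Union[str, int, bool]]


def build_traffic_targets(canary_percent: int, stable_revision: str) -> List[TrafficTarget]:
    """Model the two traffic targets of a canary rollout.

    Hypothetical helper for illustration; it is not part of KServe or Knative.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")

    return [
        # The stable revision keeps whatever the canary does not receive.
        {"tag": "prev", "revisionName": stable_revision, "percent": 100 - canary_percent},
        # The canary is whichever revision is currently the latest.
        {"tag": "latest", "latestRevision": True, "percent": canary_percent},
    ]


targets = build_traffic_targets(10, "rev-00001")
for target in targets:
    print(target)
```

The two percentages always sum to 100, so every request is routed to exactly one of the two revisions.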
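This pattern limits risk during model updates. If the canary shows degraded performance, only the requests routed to it (canaryTrafficPercent% of the total) are affected, and rolling back only requires setting canaryTrafficPercent to 0, which returns all traffic to the stable revision.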
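As a rough sketch of what changing the split looks like in practice, the snippet below builds a JSON merge-patch body that sets the field. It assumes the field lives under `spec.predictor`, as in the v1beta1 InferenceService API; the helper name `canary_patch` is hypothetical, and the exact path should be checked against the KServe version in use.

```python
import json
from typing import Any, Dict


def canary_patch(canary_percent: int, component: str = "predictor") -> Dict[str, Any]:
    """Return a merge-patch body that sets canaryTrafficPercent on one component.

    Hypothetical helper; the field path is an assumption based on the
    v1beta1 InferenceService API.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {"spec": {component: {"canaryTrafficPercent": canary_percent}}}


# Send 10% of traffic to the canary.
print(json.dumps(canary_patch(10)))
# Roll back: all traffic returns to the stable revision.
print(json.dumps(canary_patch(0)))
```

The resulting JSON could be submitted with any Kubernetes client that supports merge patches on custom resources.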
Usage
Use this technique when deploying a new model version that needs validation in production before full rollout. Applicable to:
- Model accuracy improvements
- Framework version upgrades
- Resource configuration changes
- Any change that could affect prediction quality or latency
Theoretical Basis
    # Traffic splitting model (NOT implementation code)
    Given: canaryTrafficPercent = P

    Traffic distribution:
        Stable (previous revision): (100 - P)%
        Canary (latest revision):   P%

    Knative Traffic Targets:
        [{tag: "prev",   revisionName: "rev-00001", percent: (100-P)},
         {tag: "latest", latestRevision: true,      percent: P}]

    Progressive rollout: P = 10 → 20 → 50 → 100 (promote)
    Rollback:            set P = 0 (all traffic to stable)
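The progressive rollout and rollback rules above can be sketched as a small control loop. This is a minimal illustration under stated assumptions: `set_canary_percent` and `canary_is_healthy` are hypothetical callables standing in for, respectively, whatever updates canaryTrafficPercent on the cluster and whatever metrics check decides if the canary is behaving. Neither is a KServe API.

```python
from typing import Callable, Iterable, List


def progressive_rollout(
    set_canary_percent: Callable[[int], None],
    canary_is_healthy: Callable[[int], bool],
    steps: Iterable[int] = (10, 20, 50, 100),
) -> bool:
    """Walk the canary through `steps`, rolling back to 0 on the first failed check.

    Returns True if the canary reached the final step, False if it was rolled back.
    """
    for percent in steps:
        set_canary_percent(percent)
        if not canary_is_healthy(percent):
            set_canary_percent(0)  # rollback: all traffic to stable
            return False
    return True


# In-memory stand-ins: record each applied percentage, and pretend the
# canary starts failing once it receives 50% of the traffic.
history: List[int] = []
promoted = progressive_rollout(history.append, lambda percent: percent < 50)
print(promoted, history)
```

In a real rollout each step would also wait long enough to collect meaningful metrics before the health check runs; that pause is omitted here to keep the sketch short.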