Principle:Kserve Kserve Canary Traffic Splitting

Knowledge Sources	Canary Deployment, Knative Traffic Management
Domains	MLOps, Deployment_Strategy, Traffic_Management
Last Updated	2026-02-13 00:00 GMT

Overview

Canary traffic splitting is a traffic management technique that routes a configurable percentage of inference requests to a new model version while the remaining requests stay on the stable version.

Description

Canary traffic splitting lets operators shift traffic gradually from an established model version to a new one. Setting canaryTrafficPercent on a component of the InferenceService spec (typically spec.predictor) causes KServe to configure two Knative traffic targets:

Stable revision: Receives (100 - canaryTrafficPercent)% of traffic
Canary revision: Receives canaryTrafficPercent% of traffic

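This pattern limits risk during model updates. If the canary shows degraded performance, only the canaryTrafficPercent share of requests is affected, and rolling back only requires setting canaryTrafficPercent to 0, which returns all traffic to the stable revision.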
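
As an illustration, the following minimal sketch builds an InferenceService manifest as a plain Python dict with canaryTrafficPercent set on the predictor. The helper name build_inference_service, the service name, the sklearn model format, and the storage URI are hypothetical placeholders, not values taken from this page; only the apiVersion, kind, and field names follow the KServe v1beta1 schema.

```python
import json


def build_inference_service(name, storage_uri, canary_percent):
    """Return an InferenceService manifest (as a dict) with a canary split.

    `name` and `storage_uri` are illustrative placeholders supplied by the caller.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                # Share of requests routed to the latest (canary) revision.
                "canaryTrafficPercent": canary_percent,
                "model": {
                    "modelFormat": {"name": "sklearn"},  # assumed format
                    "storageUri": storage_uri,
                },
            }
        },
    }


# Hypothetical service name and model location, for illustration only.
manifest = build_inference_service(
    "sklearn-iris", "gs://example-bucket/models/iris/v2", 10
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with the usual cluster tooling; the remaining 90% of traffic stays on the previously rolled-out revision.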

Usage

Use this technique when deploying a new model version that needs validation in production before full rollout. Applicable to:

Model accuracy improvements
Framework version upgrades
Resource configuration changes
Any change that could affect prediction quality or latency

Theoretical Basis

# Traffic splitting model (NOT implementation code)
Given: canaryTrafficPercent = P

Traffic distribution:
  Stable (previous revision):  (100 - P)%
  Canary (latest revision):    P%

Knative Traffic Targets:
  [{tag: "prev", revisionName: "rev-00001", percent: (100-P)},
   {tag: "latest", latestRevision: true, percent: P}]

Progressive rollout: P = 10 → 20 → 50 → 100 (promote)
Rollback: set P = 0 (all traffic to stable)
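
The model above can be sketched in Python as follows. This mirrors the two-target split and the rollout sequence as written in the model; it is not KServe's reconciler code. The function names traffic_targets and progressive_rollout and the is_healthy callback are hypothetical, and the health check is assumed to be supplied by the operator (for example, a comparison of canary error rate or latency against the stable revision).

```python
def traffic_targets(canary_percent, stable_revision):
    """Map P to the two traffic targets described in the model above."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return [
        {"tag": "prev", "revisionName": stable_revision,
         "percent": 100 - canary_percent},
        {"tag": "latest", "latestRevision": True,
         "percent": canary_percent},
    ]


def progressive_rollout(steps, is_healthy):
    """Walk through the rollout steps; append 0 (rollback) on the first failure.

    `is_healthy` is a hypothetical callback taking the current percentage
    and returning True if the canary looks acceptable at that level.
    Returns the list of percentages that were applied, in order.
    """
    applied = []
    for percent in steps:
        applied.append(percent)
        if not is_healthy(percent):
            applied.append(0)  # rollback: all traffic returns to stable
            break
    return applied


# Example: a canary that starts failing once it receives 50% of traffic.
targets = traffic_targets(10, "rev-00001")
history = progressive_rollout([10, 20, 50, 100], lambda p: p < 50)
print(targets)
print(history)
```

Keeping the split computation separate from the rollout loop makes the invariant easy to check: the two percentages always sum to 100, whatever step the rollout is on.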

Related Pages

Implemented By

Implementation:Kserve_Kserve_CanaryTrafficPercent_Spec
