Principle:Kserve Kserve Canary Traffic Splitting

From Leeroopedia
Knowledge Sources
Domains MLOps, Deployment_Strategy, Traffic_Management
Last Updated 2026-02-13 00:00 GMT

Overview

A traffic management technique that routes a configurable percentage of inference requests to a new model version while keeping the majority on the stable version.

Description

Canary Traffic Splitting allows operators to gradually shift traffic from an established model version to a new one. When canaryTrafficPercent is set on a component of the InferenceService spec (for example, spec.predictor.canaryTrafficPercent), KServe configures the underlying Knative Service with two traffic targets:

  • Stable revision: Receives (100 - canaryTrafficPercent)% of traffic
  • Canary revision: Receives canaryTrafficPercent% of traffic

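This pattern limits risk during model updates. If the canary shows degraded performance, only the canaryTrafficPercent share of requests is affected, and rolling back requires only setting canaryTrafficPercent to 0, which returns all traffic to the stable revision.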
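As a rough illustration of where the field sits, the following Python sketch builds an InferenceService manifest as a plain dictionary and prints it as JSON. The service name, storage URI, and model format are hypothetical placeholders chosen for the example, and the helper build_inference_service is not part of any KServe library; only the canaryTrafficPercent field under spec.predictor reflects the setting described above.

```python
import json


def build_inference_service(name, storage_uri, canary_percent):
    """Return an InferenceService manifest (as a dict) with a canary split.

    Hypothetical helper for illustration only; it does not talk to a cluster.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                # Share of requests routed to the latest (canary) revision.
                "canaryTrafficPercent": canary_percent,
                "model": {
                    # Assumed model format and location, placeholders only.
                    "modelFormat": {"name": "sklearn"},
                    "storageUri": storage_uri,
                },
            }
        },
    }


manifest = build_inference_service(
    name="example-model",                       # hypothetical name
    storage_uri="gs://example-bucket/model/v2",  # hypothetical URI
    canary_percent=10,
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be applied to a cluster with the usual tooling; changing only canary_percent and re-applying is what shifts the split.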

Usage

Use this technique when deploying a new model version that needs validation in production before full rollout. Applicable to:

  • Model accuracy improvements
  • Framework version upgrades
  • Resource configuration changes
  • Any change that could affect prediction quality or latency

Theoretical Basis

# Traffic splitting model (NOT implementation code)
Given: canaryTrafficPercent = P

Traffic distribution:
  Stable (previous revision):  (100 - P)%
  Canary (latest revision):    P%

Knative Traffic Targets:
  [{tag: "prev", revisionName: "rev-00001", percent: (100-P)},
   {tag: "latest", latestRevision: true, percent: P}]

Progressive rollout: P = 10 → 20 → 50 → 100 (promote)
Rollback: set P = 0 (all traffic to stable)
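The model above can be sketched in Python as follows. This is a simulation of the arithmetic and the rollout schedule, not KServe's reconciler code: traffic_targets mirrors the two-target list shown in the model, and progressive_rollout steps through a schedule and falls back to P = 0 the first time a caller-supplied health check fails. Both function names and the is_healthy callback are assumptions introduced for the example.

```python
def traffic_targets(canary_percent, stable_revision):
    """Return the two traffic targets for a given canary percentage P."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return [
        # Stable (previous) revision keeps the remaining share.
        {"tag": "prev", "revisionName": stable_revision,
         "percent": 100 - canary_percent},
        # Canary (latest) revision receives P percent.
        {"tag": "latest", "latestRevision": True,
         "percent": canary_percent},
    ]


def progressive_rollout(schedule, is_healthy):
    """Apply each P in schedule; roll back to 0 on the first unhealthy step.

    Returns the sequence of P values that were applied, in order.
    """
    applied = []
    for percent in schedule:
        applied.append(percent)
        if not is_healthy(percent):
            applied.append(0)  # rollback: all traffic to the stable revision
            break
    return applied


targets = traffic_targets(10, "rev-00001")
print([t["percent"] for t in targets])  # [90, 10]

# Healthy at every step: the canary is promoted at P = 100.
print(progressive_rollout([10, 20, 50, 100], lambda p: True))
# Canary degrades once it receives 50% of traffic: roll back to P = 0.
print(progressive_rollout([10, 20, 50, 100], lambda p: p < 50))
```

In practice the health check would compare canary metrics such as error rate or latency against the stable revision before each step; here it is reduced to a callback so the control flow stays visible.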
