Principle: KServe RDMA Network Provisioning
| Knowledge Sources | |
|---|---|
| Domains | Networking, HPC, GPU_Computing |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
A high-performance networking pattern that provisions RDMA-capable network interfaces for zero-copy KV cache transfer between disaggregated prefill and decode GPU pods.
Description
RDMA Network Provisioning configures SR-IOV (Single Root I/O Virtualization) virtual functions on Mellanox/NVIDIA ConnectX network adapters to enable RDMA (Remote Direct Memory Access) communication between pods. This is critical for disaggregated LLM serving where:
- Prefill pods compute KV caches
- KV caches must be transferred to decode pods with minimal latency
- Standard TCP networking adds unacceptable overhead for large KV cache transfers
The SR-IOV Network Operator creates virtual functions (VFs) from physical NICs, and Multus CNI attaches them to pods as secondary network interfaces.
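As a concrete sketch of the Kubernetes-layer pieces, the two manifests below show a `SriovNetworkNodePolicy` provisioning RDMA-capable VFs and a `SriovNetwork` assigning addresses via Whereabouts. Resource names, the node selector label, the PF name `ens1f0`, and the IP range are illustrative assumptions, not values from this deployment.

```yaml
# Hypothetical example — names, selectors, and addresses are assumptions.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: rdma-vf-policy
  namespace: sriov-network-operator
spec:
  resourceName: rdma_vfs          # surfaced to pods as an extended resource
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 8                       # 8 VFs per physical port, per the stack below
  nicSelector:
    vendor: "15b3"                # Mellanox/NVIDIA PCI vendor ID
    pfNames: ["ens1f0"]           # assumed physical function name
  deviceType: netdevice
  isRdma: true                    # enable RDMA (RoCE) on the VFs
  mtu: 9000                       # jumbo frames for large KV transfers
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: rdma-net
  namespace: sriov-network-operator
spec:
  resourceName: rdma_vfs
  networkNamespace: default
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.100.0/24"
    }
```

The operator renders the `SriovNetwork` into a `NetworkAttachmentDefinition` that pods can reference by name.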
Usage
Required for disaggregated prefill-decode serving with NixlConnector KV transfer. Not needed for single-node or standard LLM serving.
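To show how a serving pod would consume the VF, the fragment below attaches the secondary interface via a Multus annotation and requests one VF as an extended resource. The network name `rdma-net`, resource name `openshift.io/rdma_vfs`, image, and pod name are assumptions carried over from the sketch above.

```yaml
# Hypothetical pod fragment — network and resource names are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: decode-worker
  annotations:
    k8s.v1.cni.cncf.io/networks: rdma-net   # Multus attaches the VF as a secondary interface
spec:
  containers:
  - name: vllm-decode
    image: vllm/vllm-openai:latest
    resources:
      limits:
        nvidia.com/gpu: "1"
        openshift.io/rdma_vfs: "1"          # request one SR-IOV VF
```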
Theoretical Basis
```
# RDMA networking stack (NOT implementation code)
Physical Layer:
  Mellanox ConnectX NIC → SR-IOV VFs (8 per physical port)
  Link type: Ethernet (RoCE v2)
  MTU: 9000 (jumbo frames for large KV transfers)

Kubernetes Layer:
  SriovNetworkNodePolicy → provisions VFs on matching nodes
  SriovNetwork → assigns IP range via Whereabouts IPAM
  Multus annotation → attaches VF to pod as secondary interface

Application Layer:
  UCX transport: rc,sm,self,cuda_copy,cuda_ipc
  NVSHMEM: ibgda transport for GPU-direct RDMA
  NixlConnector: KV cache transfer protocol
```
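The application-layer settings above can be expressed as container environment variables. This is a hedged sketch: `UCX_TLS` and `NVSHMEM_REMOTE_TRANSPORT` are real UCX/NVSHMEM variables matching the values listed above, but the RDMA device name `mlx5_2:1` is an assumption that depends on how the VF enumerates inside the pod.

```shell
# Select the UCX transports listed in the stack above
export UCX_TLS=rc,sm,self,cuda_copy,cuda_ipc
# Pin UCX to the VF's RDMA device (mlx5_2:1 is an assumed name)
export UCX_NET_DEVICES=mlx5_2:1
# Use GPU-initiated RDMA (IBGDA) for NVSHMEM
export NVSHMEM_REMOTE_TRANSPORT=ibgda
```

Verifying the device name inside a running pod (e.g. with `ibv_devices`) before pinning `UCX_NET_DEVICES` avoids silently falling back to TCP.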