Principle: KServe RDMA Network Provisioning
| Knowledge Sources | |
|---|---|
| Domains | Networking, HPC, GPU_Computing |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
A high-performance networking pattern that provisions RDMA-capable network interfaces for zero-copy KV cache transfer between disaggregated prefill and decode GPU pods.
Description
RDMA Network Provisioning configures SR-IOV (Single Root I/O Virtualization) virtual functions on Mellanox/NVIDIA ConnectX network adapters to enable RDMA (Remote Direct Memory Access) communication between pods. This is critical for disaggregated LLM serving where:
- Prefill pods compute KV caches
- KV caches must be transferred to decode pods with minimal latency
- Standard TCP networking adds unacceptable overhead for large KV cache transfers
The SR-IOV Network Operator creates virtual functions (VFs) from physical NICs, and Multus CNI attaches them to pods as secondary network interfaces.
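As a concrete sketch of the Kubernetes-layer pieces, the two manifests below show a `SriovNetworkNodePolicy` provisioning RDMA-capable VFs and a `SriovNetwork` assigning addresses via Whereabouts. Resource names, the node selector label, the PF name `ens1f0`, and the IP range are illustrative assumptions, not values from this deployment.

```yaml
# Hypothetical example — names, selectors, and addresses are assumptions.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: rdma-vf-policy
  namespace: sriov-network-operator
spec:
  resourceName: rdma_vfs          # surfaced to pods as an extended resource
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 8                       # 8 VFs per physical port, per the stack below
  nicSelector:
    vendor: "15b3"                # Mellanox/NVIDIA PCI vendor ID
    pfNames: ["ens1f0"]           # assumed physical function name
  deviceType: netdevice
  isRdma: true                    # enable RDMA (RoCE) on the VFs
  mtu: 9000                       # jumbo frames for large KV transfers
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: rdma-net
  namespace: sriov-network-operator
spec:
  resourceName: rdma_vfs
  networkNamespace: default
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.100.0/24"
    }
```

The operator renders the `SriovNetwork` into a `NetworkAttachmentDefinition` that pods can reference by name.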
Usage
Required for disaggregated prefill-decode serving with NixlConnector KV transfer. Not needed for single-node or standard LLM serving.
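To show how a serving pod would consume the VF, the fragment below attaches the secondary interface via a Multus annotation and requests one VF as an extended resource. The network name `rdma-net`, resource name `openshift.io/rdma_vfs`, image, and pod name are assumptions carried over from the sketch above.

```yaml
# Hypothetical pod fragment — network and resource names are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: decode-worker
  annotations:
    k8s.v1.cni.cncf.io/networks: rdma-net   # Multus attaches the VF as a secondary interface
spec:
  containers:
  - name: vllm-decode
    image: vllm/vllm-openai:latest
    resources:
      limits:
        nvidia.com/gpu: "1"
        openshift.io/rdma_vfs: "1"          # request one SR-IOV VF
```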
Theoretical Basis
```
# RDMA networking stack (NOT implementation code)
Physical Layer:
  Mellanox ConnectX NIC → SR-IOV VFs (8 per physical port)
  Link type: Ethernet (RoCE v2)
  MTU: 9000 (jumbo frames for large KV transfers)

Kubernetes Layer:
  SriovNetworkNodePolicy → provisions VFs on matching nodes
  SriovNetwork → assigns IP range via Whereabouts IPAM
  Multus annotation → attaches VF to pod as secondary interface

Application Layer:
  UCX transport: rc,sm,self,cuda_copy,cuda_ipc
  NVSHMEM: ibgda transport for GPU-direct RDMA
  NixlConnector: KV cache transfer protocol
```
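The application-layer settings above can be expressed as container environment variables. This is a hedged sketch: `UCX_TLS` and `NVSHMEM_REMOTE_TRANSPORT` are real UCX/NVSHMEM variables matching the values listed above, but the RDMA device name `mlx5_2:1` is an assumption that depends on how the VF enumerates inside the pod.

```shell
# Select the UCX transports listed in the stack above
export UCX_TLS=rc,sm,self,cuda_copy,cuda_ipc
# Pin UCX to the VF's RDMA device (mlx5_2:1 is an assumed name)
export UCX_NET_DEVICES=mlx5_2:1
# Use GPU-initiated RDMA (IBGDA) for NVSHMEM
export NVSHMEM_REMOTE_TRANSPORT=ibgda
```

Verifying the device name inside a running pod (e.g. with `ibv_devices`) before pinning `UCX_NET_DEVICES` avoids silently falling back to TCP.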