
Principle:KServe RDMA Network Provisioning

From Leeroopedia
Knowledge Sources
Domains Networking, HPC, GPU_Computing
Last Updated 2026-02-13 00:00 GMT

Overview

A high-performance networking pattern that provisions RDMA-capable network interfaces for zero-copy KV cache transfer between disaggregated prefill and decode GPU pods.

Description

RDMA Network Provisioning configures SR-IOV (Single Root I/O Virtualization) virtual functions on Mellanox/NVIDIA ConnectX network adapters to enable RDMA (Remote Direct Memory Access) communication between pods. This is critical for disaggregated LLM serving where:

  • Prefill pods compute KV caches
  • KV caches must be transferred to decode pods with minimal latency
  • Standard TCP networking adds unacceptable overhead for large KV cache transfers
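To see why transfer latency matters, a back-of-envelope calculation helps. The sketch below assumes a Llama-3.1-8B-style model shape (32 layers, 8 KV heads, head dimension 128, fp16) and illustrative link speeds; none of these numbers come from this page, so adapt them to your model and fabric:

```python
# Back-of-envelope KV cache size for a Llama-3.1-8B-style model
# (assumed shape: 32 layers, 8 KV heads, head_dim 128, fp16).
def kv_cache_bytes(num_tokens, layers=32, kv_heads=8, head_dim=128, dtype_bytes=2):
    # K and V tensors per layer -> factor of 2
    return 2 * layers * kv_heads * head_dim * dtype_bytes * num_tokens

size = kv_cache_bytes(4096)          # 4K-token prompt
print(size / 2**20)                  # 512.0 MiB
# Wire time at 200 Gb/s RDMA vs. ~40 Gb/s effective TCP (illustrative numbers)
print(size * 8 / 200e9 * 1e3)        # ~21 ms
print(size * 8 / 40e9 * 1e3)         # ~107 ms
```

At these assumed rates, a single 4K-token prefill hands off half a gigabyte of KV cache, and the TCP path costs roughly 5x the wall-clock time of the RDMA path before any copy overhead is counted.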

The SR-IOV Network Operator creates virtual functions (VFs) from physical NICs, and Multus CNI attaches them to pods as secondary network interfaces.
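From the pod's side, the attachment is requested through a Multus annotation plus a VF resource limit. The fragment below is a minimal sketch; the network name, resource name, image, and environment values are assumptions that must match your SriovNetwork and node policy:

```yaml
# Illustrative pod fragment (network name, resource name, and env values
# are assumptions; align them with your SriovNetwork / SriovNetworkNodePolicy).
apiVersion: v1
kind: Pod
metadata:
  name: decode-worker
  annotations:
    # Multus reads this annotation and attaches the named SR-IOV network
    # to the pod as a secondary interface (typically net1)
    k8s.v1.cni.cncf.io/networks: kv-transfer-net
spec:
  containers:
  - name: decode
    image: vllm/vllm-openai:latest        # assumed serving image
    env:
    - name: UCX_TLS                       # UCX transports for KV transfer
      value: "rc,sm,self,cuda_copy,cuda_ipc"
    - name: NVSHMEM_REMOTE_TRANSPORT      # GPU-direct RDMA via IBGDA
      value: "ibgda"
    resources:
      limits:
        nvidia.com/gpu: "1"
        openshift.io/kv_transfer_vfs: "1" # one RDMA VF for this pod
```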

Usage

Required for disaggregated prefill-decode serving with NixlConnector KV transfer. Not needed for single-node or standard LLM serving.
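As one concrete (hedged) example, a vLLM-based deployment would enable the connector roughly as below. The exact KVTransferConfig schema varies across vLLM releases, so treat the model name and flag values as a sketch to verify against your version:

```shell
# Sketch of disaggregated prefill/decode launch commands (illustrative;
# check the --kv-transfer-config schema against your vLLM release).
# Prefill pod: produces KV caches
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --kv-transfer-config '{"kv_connector": "NixlConnector", "kv_role": "kv_producer"}'
# Decode pod: consumes KV caches
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --kv-transfer-config '{"kv_connector": "NixlConnector", "kv_role": "kv_consumer"}'
```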

Theoretical Basis

# RDMA networking stack (NOT implementation code)
Physical Layer:
  Mellanox ConnectX NIC → SR-IOV VFs (8 per physical port)
  Link type: Ethernet (RoCE v2)
  MTU: 9000 (jumbo frames for large KV transfers)

Kubernetes Layer:
  SriovNetworkNodePolicy → Provisions VFs on matching nodes
  SriovNetwork → Assigns IP range via Whereabouts IPAM
  Multus annotation → Attaches VF to pod as secondary interface

Application Layer:
  UCX transport: rc,sm,self,cuda_copy,cuda_ipc
  NVSHMEM: ibgda transport for GPU-direct RDMA
  NixlConnector: KV cache transfer protocol
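The Kubernetes layer above can be sketched as two custom resources of the SR-IOV Network Operator. This is a minimal sketch, not a working configuration: the names, namespace, NIC selector, PF name, and IP range are all assumptions to adapt to your cluster:

```yaml
# Minimal sketch of SR-IOV Network Operator resources (illustrative values).
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: kv-transfer-policy
  namespace: openshift-sriov-network-operator
spec:
  resourceName: kv_transfer_vfs     # surfaced as openshift.io/kv_transfer_vfs
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 8                         # 8 VFs per physical port, as above
  nicSelector:
    vendor: "15b3"                  # Mellanox/NVIDIA ConnectX
    pfNames: ["enp65s0f0"]          # assumed PF name
  deviceType: netdevice
  isRdma: true                      # expose the VF's RDMA device to the pod
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: kv-transfer-net
  namespace: openshift-sriov-network-operator
spec:
  resourceName: kv_transfer_vfs
  networkNamespace: llm-serving     # namespace of the serving pods (assumed)
  ipam: |
    {"type": "whereabouts", "range": "192.168.100.0/24"}
```

The node policy provisions and names the VFs; the SriovNetwork makes them attachable by pods, with Whereabouts handing out addresses from the configured range.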

Related Pages

Implemented By
