Implementation:Kserve Kserve RDMA Network Configuration

From Leeroopedia
Knowledge Sources
Domains: Networking, HPC, GPU_Computing
Last Updated: 2026-02-13 00:00 GMT

Overview

Concrete YAML pattern for provisioning SR-IOV RDMA network interfaces for disaggregated LLM serving with KV cache transfer.

Description

The network-roce.yaml manifest creates a SriovNetworkNodePolicy and a SriovNetwork resource for each of two physical NIC ports (p2 and p13) on Mellanox/NVIDIA ConnectX adapters. Each policy creates 8 SR-IOV virtual functions with RDMA enabled and jumbo frames (MTU 9000); each SriovNetwork assigns pod IP addresses through Whereabouts IPAM. Only the p2 pair is reproduced under Signature.

Usage

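Apply this manifest on clusters that have Mellanox/NVIDIA ConnectX NICs, the SR-IOV Network Operator, and Node Feature Discovery installed (the policies select nodes by the labels Node Feature Discovery applies). Pods then attach to the resulting networks through the Multus annotation k8s.v1.cni.cncf.io/networks.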

Code Reference

Source Location

  • Repository: kserve
  • File: docs/samples/llmisvc/dp-ep/deepseek-r1-gpu-rdma-roce/network-roce.yaml, Lines 1-92

Signature

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: p2
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  isRdma: true
  linkType: eth
  mtu: 9000
  nicSelector:
    vendor: "15b3"  # Mellanox/NVIDIA
    pfNames: ["ens6f0np0#0-7"]
  nodeSelector:
    feature.node.kubernetes.io/rdma.available: "true"
    feature.node.kubernetes.io/pci-15b3.sriov.capable: "true"
  numVfs: 8
  resourceName: p2rdma
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: roce-p2
  namespace: openshift-sriov-network-operator
spec:
  ipam: '{"type": "whereabouts", "range": "10.0.132.0/24"}'
  networkNamespace: default
  resourceName: p2rdma
  spoofChk: "off"
  trust: "on"
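
The SriovNetwork above is not referenced by pods directly; the operator renders it into a NetworkAttachmentDefinition in the namespace named by networkNamespace. The following is a rough sketch of what that rendered object might look like for roce-p2. The `openshift.io/` resource prefix and the `cniVersion` value are assumptions based on common operator defaults, and the exact set of fields in the generated config varies by operator version.

```yaml
# Sketch of the NetworkAttachmentDefinition generated from SriovNetwork roce-p2.
# Assumptions: default resource prefix "openshift.io/", cniVersion 1.0.0.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: roce-p2
  namespace: default            # from spec.networkNamespace
  annotations:
    # Ties the attachment to the VF pool defined by the p2 node policy.
    k8s.v1.cni.cncf.io/resourceName: openshift.io/p2rdma
spec:
  config: |-
    {
      "cniVersion": "1.0.0",
      "name": "roce-p2",
      "type": "sriov",
      "spoofchk": "off",
      "trust": "on",
      "ipam": {"type": "whereabouts", "range": "10.0.132.0/24"}
    }
```

Comparing a rendered object like this against the SriovNetwork spec is a quick way to confirm that the IPAM range and the spoofChk/trust settings were carried through.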

Import

kubectl apply -f network-roce.yaml

I/O Contract

Inputs

Name | Type | Required | Description
Mellanox NICs | Hardware | Yes | ConnectX adapters with SR-IOV capability
SR-IOV Operator | Operator | Yes | Manages SR-IOV network resources
Node Feature Discovery | DaemonSet | Yes | Labels nodes with rdma.available

Outputs

Name | Type | Description
rdma/roce_gdr | resource | Schedulable RDMA resource on nodes
roce-p2 | NetworkAttachmentDefinition | Network attachment for pod Multus annotation
roce-p13 | NetworkAttachmentDefinition | Second NIC port network attachment

Usage Examples

Pod RDMA Annotation

# In LLMInferenceService pod template:
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: roce-p2
spec:
  containers:
    - resources:
        limits:
          rdma/roce_gdr: 1
          nvidia.com/gpu: "8"
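
A pod that should use both NIC ports can list both attachments in the same annotation. The fragment below is a minimal sketch, assuming roce-p13 is defined analogously to roce-p2 and lives in the same namespace as the pod; resource limits are omitted here and would be set as in the example above.

```yaml
# Sketch: attach a pod to both RoCE networks (assumes roce-p13 mirrors roce-p2).
metadata:
  annotations:
    # Comma-separated NetworkAttachmentDefinition names, attached in order.
    k8s.v1.cni.cncf.io/networks: roce-p2,roce-p13
    # Multus also accepts an equivalent JSON list, for example:
    # k8s.v1.cni.cncf.io/networks: '[{"name": "roce-p2"}, {"name": "roce-p13"}]'
```

Each listed attachment adds one extra interface to the pod (net1, net2, and so on, in annotation order) alongside the default cluster network interface.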
