Heuristic:Kserve Kserve NCCL RoCE Auto Detection

From Leeroopedia
Knowledge Sources
Domains Distributed_Computing, High_Performance_Networking
Last Updated 2026-02-13 14:00 GMT

Overview

KServe auto-detects NCCL InfiniBand HCA configuration and RoCE GID index from sysfs, preferring GID_INDEX=3 for SR-IOV environments.

Description

Multi-node distributed inference with NCCL requires correct InfiniBand HCA (Host Channel Adapter) and GID index configuration. KServe's worker data-parallel pod templates include init scripts that scan `/sys/class/infiniband/mlx5_*` to auto-detect active RoCE v2 ports and derive the optimal NCCL, NVSHMEM, and UCX environment variables.
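The scan described above can be sketched as a small shell helper. This is a minimal illustration, not the KServe script itself: the function name `list_active_hcas` and the optional root-directory argument (useful for dry runs against a fake tree) are assumptions made for this example.

```shell
# Hypothetical helper: print a comma-separated list of mlx5_* HCAs whose
# port 1 reports ACTIVE. The optional first argument overrides the sysfs
# root so the function can be exercised without RDMA hardware.
list_active_hcas() {
    active=""
    for hca_dir in "${1:-/sys/class/infiniband}"/mlx5_*; do
        # The state file reads like "4: ACTIVE"; a missing file counts as inactive.
        grep -q "ACTIVE" "$hca_dir/ports/1/state" 2>/dev/null || continue
        active="${active:+$active,}$(basename "$hca_dir")"
    done
    printf '%s\n' "$active"
}
```

Calling `list_active_hcas` with no argument reads the real sysfs tree; the result has the same shape as the `NCCL_IB_HCA` value.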

Usage

Use this heuristic when deploying multi-node distributed inference with RDMA networking. The auto-detection handles most environments, but manual overrides may be needed for non-standard configurations.

The Insight (Rule of Thumb)

  • Action: Let the init script auto-detect NCCL parameters from sysfs. Override only if auto-detection fails.
  • Value: GID_INDEX=3 is the deterministic tie-breaker. When several GID indices are equally common and 3 is among them, 3 is chosen, which matches typical SR-IOV environments.
  • Trade-off: Auto-detection adds startup time but ensures correct configuration across heterogeneous hardware. Manual overrides bypass detection but risk misconfiguration on different node types.
  • Key variables auto-detected:
    • `NCCL_IB_HCA` - Active InfiniBand HCAs (e.g., `mlx5_0,mlx5_1`)
    • `NCCL_IB_GID_INDEX` - RoCE v2 GID index (typically 3 for SR-IOV)
    • `NVSHMEM_IB_GID_INDEX` - Mirrors NCCL GID index
    • `UCX_NET_DEVICES` - UCX network devices matching active HCAs
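
To show how the four variables relate, here is a minimal sketch that derives them from one HCA list and one GID index. The helper name `emit_rdma_env` is hypothetical, and the `<hca>:1` form used for `UCX_NET_DEVICES` assumes port 1 is the port in use; the actual KServe script may format this value differently.

```shell
# Hypothetical helper: print export lines for the detected values.
#   $1 = comma-separated HCA list, e.g. "mlx5_0,mlx5_1"
#   $2 = RoCE v2 GID index, e.g. 3
emit_rdma_env() {
    hcas=$1
    gid_index=$2
    if [ -z "$hcas" ] || [ -z "$gid_index" ]; then
        echo "emit_rdma_env: need an HCA list and a GID index" >&2
        return 1
    fi
    # Assumption: UCX names devices as <hca>:<port>, and port 1 is used.
    ucx_devices=$(printf '%s\n' "$hcas" | sed 's/,/:1,/g; s/$/:1/')
    printf 'export NCCL_IB_HCA=%s\n' "$hcas"
    printf 'export NCCL_IB_GID_INDEX=%s\n' "$gid_index"
    # NVSHMEM mirrors the NCCL GID index.
    printf 'export NVSHMEM_IB_GID_INDEX=%s\n' "$gid_index"
    printf 'export UCX_NET_DEVICES=%s\n' "$ucx_devices"
}
```

A caller could apply the result with `eval "$(emit_rdma_env "mlx5_0,mlx5_1" 3)"`. Printing the lines first makes it easy to inspect them before they take effect.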

Reasoning

Different GPU nodes may have different InfiniBand adapter configurations. The auto-detection script:

  1. Scans `/sys/class/infiniband/mlx5_*` for Mellanox HCAs
  2. Checks port state (`ACTIVE`) and RoCE v2 support via the per-GID entries under `gid_attrs/ndevs` and `gid_attrs/types`
  3. Counts GID index occurrences across active ports
  4. Selects the most common GID index, choosing 3 whenever it ties for the highest count
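
An abridged excerpt of the script:

```bash
# From config/llmisvcconfig/config-llm-worker-data-parallel.yaml (abridged)
KSERVE_INFER_IB_GID_INDEX_GREP=${KSERVE_INFER_IB_GID_INDEX_GREP:-"RoCE v2"}

# Initialised here only so the excerpt runs on its own (bash 4+ for declare -A).
declare -A gid_index_count   # GID index -> number of occurrences
max_count=0

for hca_dir in /sys/class/infiniband/mlx5_*; do
    port_state_file="$hca_dir/ports/1/state"
    # 2>/dev/null covers the case where the glob matched nothing.
    if grep -q "ACTIVE" "$port_state_file" 2>/dev/null; then
        # Check RoCE v2 support and collect GID indices (elided in this excerpt)
        :
    fi
done

# SR-IOV preference: GID_INDEX=3 wins when it ties for the highest count.
# The -n guard avoids a test error when index 3 was never seen.
if [ -n "${gid_index_count[3]:-}" ] && [ "${gid_index_count[3]}" -eq "$max_count" ]; then
    best_gid_index="3"
fi
```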
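
The counting and tie-break steps listed above can be made concrete with a self-contained sketch. It is an illustration under stated assumptions, not the KServe implementation: the function names are hypothetical, counting is done with `awk` rather than a bash associative array, only port 1 is inspected (as in the excerpt), and ties that do not involve index 3 resolve to the lowest index, which is a choice made for this example only.

```shell
# Hypothetical helper: print the GID indices on one port whose type matches
# a pattern (default "RoCE v2"). $1 = port directory, $2 = optional pattern.
list_roce_gid_indices() {
    for type_file in "$1"/gid_attrs/types/*; do
        # Unpopulated GID entries cannot be read on real sysfs, hence 2>/dev/null.
        if grep -q "${2:-RoCE v2}" "$type_file" 2>/dev/null; then
            basename "$type_file"
        fi
    done
}

# Read GID indices (one per line) on stdin and print the most frequent one.
# Index 3 is printed whenever it ties for the highest count.
pick_gid_index() {
    awk '
        NF { count[$1]++ }
        END {
            max = 0; best = ""
            for (idx in count) {
                # Higher count wins; equal counts fall back to the lower index.
                if (count[idx] > max || (count[idx] == max && idx + 0 < best + 0)) {
                    max = count[idx]; best = idx
                }
            }
            if (("3" in count) && count["3"] == max) best = "3"
            print best
        }'
}

# Scan every mlx5_* HCA under the given root (default: real sysfs) and pick
# one GID index across all HCAs whose port 1 is ACTIVE.
detect_gid_index() {
    for hca_dir in "${1:-/sys/class/infiniband}"/mlx5_*; do
        grep -q "ACTIVE" "$hca_dir/ports/1/state" 2>/dev/null || continue
        list_roce_gid_indices "$hca_dir/ports/1"
    done | pick_gid_index
}
```

If two active HCAs each expose RoCE v2 at indices 1 and 3, both indices are counted twice and `detect_gid_index` prints 3. If no RoCE v2 entry is found, it prints an empty line, which a caller can treat as a detection failure that needs a manual override.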
