Heuristic:Kserve Kserve NCCL RoCE Auto Detection
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Computing, High_Performance_Networking |
| Last Updated | 2026-02-13 14:00 GMT |
Overview
KServe auto-detects the NCCL InfiniBand HCA list and the RoCE v2 GID index from sysfs. When several GID indices are equally common, it prefers GID index 3, the value typical of SR-IOV environments.
Description
Multi-node distributed inference with NCCL requires correct InfiniBand HCA (Host Channel Adapter) and GID index configuration. KServe's worker data-parallel pod templates include init scripts that scan `/sys/class/infiniband/mlx5_*` to auto-detect active RoCE v2 ports and derive the optimal NCCL, NVSHMEM, and UCX environment variables.
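The sysfs scan described above can be sketched as follows. This is a minimal illustration, not KServe's init script: `list_active_hcas` is a hypothetical helper, and its optional root argument exists only so the function can be run against a fake directory tree.

```bash
#!/usr/bin/env bash
# Sketch: list mlx5 HCAs whose port 1 is ACTIVE.
# list_active_hcas is a hypothetical helper, not a function from KServe.
list_active_hcas() {
  local sysfs_root="${1:-/sys/class/infiniband}"
  local hca_dir
  local active=()
  for hca_dir in "$sysfs_root"/mlx5_*; do
    # An unmatched glob stays literal, so skip anything that is not a directory.
    [ -d "$hca_dir" ] || continue
    # The state file reads like "4: ACTIVE" when the link is up.
    if grep -q "ACTIVE" "$hca_dir/ports/1/state" 2>/dev/null; then
      active+=("$(basename "$hca_dir")")
    fi
  done
  # Join with commas, the list format that NCCL_IB_HCA accepts.
  local IFS=,
  echo "${active[*]}"
}
```

Calling `list_active_hcas` with no argument reads the real `/sys/class/infiniband` tree. The sketch only checks port 1, matching the `ports/1/state` path in the excerpt from the KServe config.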
Usage
Use this heuristic when deploying multi-node distributed inference with RDMA networking. The auto-detection handles most environments, but manual overrides may be needed for non-standard configurations.
The Insight (Rule of Thumb)
- Action: Let the init script auto-detect NCCL parameters from sysfs. Override only if auto-detection fails.
- Value: For SR-IOV environments, GID_INDEX=3 is the deterministic fallback when multiple candidates have equal frequency.
- Trade-off: Auto-detection adds startup time but ensures correct configuration across heterogeneous hardware. Manual overrides bypass detection but risk misconfiguration on different node types.
- Key variables auto-detected:
  - `NCCL_IB_HCA` - Active InfiniBand HCAs (e.g., `mlx5_0,mlx5_1`)
  - `NCCL_IB_GID_INDEX` - RoCE v2 GID index (typically 3 for SR-IOV)
  - `NVSHMEM_IB_GID_INDEX` - Mirrors the NCCL GID index
  - `UCX_NET_DEVICES` - UCX network devices matching the active HCAs
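As a rough sketch of how the four variables relate to one another, the helper below derives them from a detected HCA list and GID index. `export_rdma_env` is a hypothetical name. Two parts are assumptions rather than confirmed KServe behaviour: the `:1` port suffix on each UCX device, and the rule that values already set in the environment take precedence.

```bash
#!/usr/bin/env bash
# Sketch: derive the NCCL, NVSHMEM, and UCX variables from detected values.
# export_rdma_env is a hypothetical helper, not part of KServe.
export_rdma_env() {
  local hcas="$1" gid_index="$2"
  if [ -z "$hcas" ] || [ -z "$gid_index" ]; then
    echo "export_rdma_env: need an HCA list and a GID index" >&2
    return 1
  fi
  # "mlx5_0,mlx5_1" -> "mlx5_0:1,mlx5_1:1" (UCX names devices as <hca>:<port>).
  # Assumption: every HCA is used through port 1.
  local ucx_devices="${hcas//,/:1,}:1"
  # Assumption: a value already present in the environment is a manual
  # override and wins over the detected one.
  export NCCL_IB_HCA="${NCCL_IB_HCA:-$hcas}"
  export NCCL_IB_GID_INDEX="${NCCL_IB_GID_INDEX:-$gid_index}"
  # NVSHMEM mirrors whatever GID index NCCL ends up with.
  export NVSHMEM_IB_GID_INDEX="${NVSHMEM_IB_GID_INDEX:-$NCCL_IB_GID_INDEX}"
  export UCX_NET_DEVICES="${UCX_NET_DEVICES:-$ucx_devices}"
}
```

For example, `export_rdma_env "mlx5_0,mlx5_1" 3` in a clean environment sets `NCCL_IB_HCA=mlx5_0,mlx5_1`, both GID index variables to `3`, and `UCX_NET_DEVICES=mlx5_0:1,mlx5_1:1`.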
Reasoning
Different GPU nodes may have different InfiniBand adapter configurations. The auto-detection script:
- Scans `/sys/class/infiniband/mlx5_*` for Mellanox HCAs
- Checks port state (`ACTIVE`) and RoCE v2 support via the per-index entries under `gid_attrs/ndevs` and `gid_attrs/types`
- Counts GID index occurrences across active ports
- Selects the most common GID index; if index 3 ties for the highest count, it is chosen as the deterministic tie-break
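The last three steps can be illustrated with two small functions. Both names (`roce_v2_gid_indices`, `select_gid_index`) are hypothetical, and the handling of ties that do not involve index 3 (lowest index wins) is an assumption added here for determinism, not something the source states. With no candidates at all, the sketch prints nothing and leaves the decision to the caller.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the tally; not KServe's actual script.

# Print the GID indices of one port that are RoCE v2 and bound to a netdev.
roce_v2_gid_indices() {
  local port_dir="$1" type_file idx
  for type_file in "$port_dir"/gid_attrs/types/*; do
    [ -e "$type_file" ] || continue
    idx="$(basename "$type_file")"
    # Reading an unpopulated GID entry fails, so errors are silenced.
    grep -q "RoCE v2" "$type_file" 2>/dev/null || continue
    [ -n "$(cat "$port_dir/gid_attrs/ndevs/$idx" 2>/dev/null)" ] || continue
    echo "$idx"
  done
}

# Pick the most frequent index among the arguments; prefer 3 on a tie.
select_gid_index() {
  local -A count=()
  local idx best="" max=0
  for idx in "$@"; do
    count[$idx]=$(( ${count[$idx]:-0} + 1 ))
  done
  # Visit indices in numeric order so that other ties resolve to the lowest
  # index (an assumption made for this sketch).
  for idx in $(printf '%s\n' "${!count[@]}" | sort -n); do
    if [ "${count[$idx]}" -gt "$max" ]; then
      max="${count[$idx]}"
      best="$idx"
    fi
  done
  # Deterministic tie-break: 3 wins whenever it shares the top count.
  if [ "$max" -gt 0 ] && [ "${count[3]:-0}" -eq "$max" ]; then
    best=3
  fi
  echo "$best"
}
```

A caller would gather the output of `roce_v2_gid_indices` for every active port and pass the combined list to `select_gid_index`. For instance, `select_gid_index 3 3 1 1` yields `3` through the tie-break, while `select_gid_index 1 1 3` yields `1` because index 1 is strictly more common.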
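```bash
# Abridged from config/llmisvcconfig/config-llm-worker-data-parallel.yaml
KSERVE_INFER_IB_GID_INDEX_GREP=${KSERVE_INFER_IB_GID_INDEX_GREP:-"RoCE v2"}

# Declarations assumed here so the excerpt parses and runs on its own;
# the elided tally is what fills them in the full script.
declare -A gid_index_count   # GID index -> number of active ports exposing it
max_count=0                  # highest count seen so far

for hca_dir in /sys/class/infiniband/mlx5_*; do
  port_state_file="$hca_dir/ports/1/state"
  if grep -q "ACTIVE" "$port_state_file" 2>/dev/null; then
    # Check RoCE v2 support and collect GID indices (elided in this excerpt)
    :
  fi
done

# SR-IOV preference: if index 3 shares the highest count, choose it
if [ "${gid_index_count[3]:-0}" -eq "$max_count" ]; then
  best_gid_index="3"
fi
```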