Implementation:Turboderp org Exllamav2 Compat
| Knowledge Sources | |
|---|---|
| Domains | GPU_Management, Compatibility |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Cross-GPU tensor compatibility module that provides safe tensor movement between CUDA devices, working around driver-level peer-to-peer copy failures by testing P2P capability and falling back through CPU when necessary.
Description
compat.py contains two primary functions for reliable multi-GPU tensor transfers:
- test_gpu_peer_copy(device_a, device_b) -- Tests whether direct GPU-to-GPU peer copy works between two CUDA devices. It creates a test tensor with known values on device_a, copies it to device_b and back, then verifies the round-trip. Results are cached in a global matrix (tested_peer_copy) using tri-state values: 0 (untested), 1 (P2P works), -1 (P2P broken). The device indices are always ordered so that idx_a <= idx_b to avoid duplicate tests.
- safe_move_tensor(tensor, device, non_blocking=False) -- Moves a tensor (or tuple of tensors) to the target device using the most efficient available path. The movement strategy follows this priority:
  1. No-op -- If the tensor is already on the target device, return immediately.
  2. CPU transfers -- Copies to/from system RAM always use tensor.to() directly, optionally using CUDA streams for asynchronous transfer with explicit synchronisation.
  3. P2P GPU transfer -- If test_gpu_peer_copy confirms direct copy works, uses tensor.to() with CUDA streams on both source and destination devices.
  4. CPU fallback -- If P2P copy is broken, the tensor is first moved to CPU with synchronisation, then from CPU to the target GPU.
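The caching and routing rules in the list above can be sketched without a GPU present. This is a minimal illustration, not the code in compat.py: PeerCopyCache, plan_route, and the injected probe callable are hypothetical names, and the real module runs the round-trip test on CUDA tensors rather than through a callback.

```python
from typing import Callable, List

UNTESTED, WORKS, BROKEN = 0, 1, -1  # tri-state values described above


class PeerCopyCache:
    """Caches one P2P test result per unordered device pair (hypothetical helper)."""

    def __init__(self, num_devices: int, probe: Callable[[int, int], bool]):
        # probe(a, b) stands in for the real round-trip copy test on CUDA devices.
        self.probe = probe
        self.matrix = [[UNTESTED] * num_devices for _ in range(num_devices)]

    def peer_copy_ok(self, idx_a: int, idx_b: int) -> bool:
        # Order the indices so that (1, 0) and (0, 1) share a single cache cell.
        if idx_a > idx_b:
            idx_a, idx_b = idx_b, idx_a
        state = self.matrix[idx_a][idx_b]
        if state == UNTESTED:
            state = WORKS if self.probe(idx_a, idx_b) else BROKEN
            self.matrix[idx_a][idx_b] = state
        return state == WORKS


def plan_route(src: str, dst: str, cache: PeerCopyCache) -> List[str]:
    """Return the sequence of devices a tensor would visit, in priority order."""
    if src == dst:
        return [src]              # no-op: already on the target device
    if src == "cpu" or dst == "cpu":
        return [src, dst]         # transfers to/from system RAM are direct
    a = int(src.split(":")[1])
    b = int(dst.split(":")[1])
    if cache.peer_copy_ok(a, b):
        return [src, dst]         # direct GPU-to-GPU copy
    return [src, "cpu", dst]      # fallback: stage through system RAM


calls = []


def fake_probe(a: int, b: int) -> bool:
    # Pretend that only the (0, 1) pair supports peer copies.
    calls.append((a, b))
    return (a, b) == (0, 1)


cache = PeerCopyCache(3, fake_probe)
print(plan_route("cuda:0", "cuda:1", cache))  # ['cuda:0', 'cuda:1']
print(plan_route("cuda:2", "cuda:0", cache))  # ['cuda:2', 'cpu', 'cuda:0']
print(plan_route("cuda:0", "cuda:2", cache))  # ['cuda:0', 'cpu', 'cuda:2']
print(calls)                                  # [(0, 1), (0, 2)]
```

The last line shows the effect of ordering the indices: the (2, 0) and (0, 2) requests hit the same cache cell, so the probe runs only once for that pair.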
The module also provides a pairwise() polyfill for Python versions below 3.10, emulating itertools.pairwise using itertools.tee.
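A polyfill of this kind usually follows the recipe from the itertools documentation. The sketch below shows that recipe; the name pairwise_polyfill and the getattr-based selection are illustrative choices and may differ from how compat.py wires it up.

```python
import itertools


def pairwise_polyfill(iterable):
    # Duplicate the iterator, advance the second copy by one item,
    # then zip the two so each element is paired with its successor.
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)


# Prefer the standard-library version where it exists (Python 3.10+).
pairwise = getattr(itertools, "pairwise", pairwise_polyfill)

print(list(pairwise_polyfill([0, 4, 8, 12])))  # [(0, 4), (4, 8), (8, 12)]
```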
Usage
Use safe_move_tensor whenever moving tensors between CUDA devices in a multi-GPU setup. It is called internally throughout ExLlamaV2 during model loading, autosplit distribution, and inference to guarantee correct data transfer regardless of the GPU interconnect topology.
Code Reference
Source Location
- Repository: Turboderp_org_Exllamav2
- File: exllamav2/compat.py
- Lines: L1-141
Signature
def test_gpu_peer_copy(
    device_a: torch.Device,
    device_b: torch.Device
) -> bool:
    ...

def safe_move_tensor(
    tensor: torch.Tensor | tuple[torch.Tensor],
    device: torch.Device | str | int,
    non_blocking: bool = False
) -> torch.Tensor | tuple[torch.Tensor]:
    ...
Import
from exllamav2.compat import safe_move_tensor, test_gpu_peer_copy
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| device_a | torch.device | Yes (test_gpu_peer_copy) | First CUDA device to test for peer-to-peer copy |
| device_b | torch.device | Yes (test_gpu_peer_copy) | Second CUDA device to test for peer-to-peer copy |
| tensor | torch.Tensor or tuple[torch.Tensor] | Yes (safe_move_tensor) | The tensor(s) to move to the target device |
| device | torch.device, str, or int | Yes (safe_move_tensor) | Target device specification (e.g., "cuda:1", torch.device("cuda:0"), or integer index) |
| non_blocking | bool | No (default False) | If True, allows asynchronous transfers without explicit synchronisation on the P2P path |
Outputs
| Name | Type | Description |
|---|---|---|
| peer_copy_ok | bool | From test_gpu_peer_copy: True if direct GPU-to-GPU copy succeeded, False otherwise |
| moved_tensor | torch.Tensor or tuple[torch.Tensor] | From safe_move_tensor: the tensor(s) on the target device with identical data |
Usage Examples
Safe Multi-GPU Tensor Transfer
from exllamav2.compat import safe_move_tensor
import torch
# Move a tensor from GPU 0 to GPU 1 safely
tensor_gpu0 = torch.randn(1024, 4096, device="cuda:0")
tensor_gpu1 = safe_move_tensor(tensor_gpu0, "cuda:1")
# Move a tuple of tensors
weight, bias = torch.randn(4096, device="cuda:0"), torch.randn(4096, device="cuda:0")
weight_gpu1, bias_gpu1 = safe_move_tensor((weight, bias), "cuda:1")
Testing Peer-to-Peer Copy Capability
from exllamav2.compat import test_gpu_peer_copy
import torch
device_0 = torch.device("cuda:0")
device_1 = torch.device("cuda:1")
if test_gpu_peer_copy(device_0, device_1):
    print("Direct P2P copy between GPU 0 and GPU 1 is supported")
else:
    print("P2P copy failed; transfers will route through CPU")
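To survey every GPU pair in a machine rather than a single pair, the check can be wrapped in a loop over unordered pairs. The following is a sketch under assumptions: find_broken_pairs and its injected check callable are hypothetical and not part of exllamav2. A stand-in check is used here so the example runs without GPUs.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def find_broken_pairs(
    num_devices: int,
    check: Callable[[int, int], bool],
) -> List[Tuple[int, int]]:
    # Visit each unordered pair once and keep the pairs whose check fails.
    return [
        (a, b)
        for a, b in combinations(range(num_devices), 2)
        if not check(a, b)
    ]


# Stand-in check: pretend device 2 cannot peer-copy with any other device.
print(find_broken_pairs(3, lambda a, b: 2 not in (a, b)))  # [(0, 2), (1, 2)]
```

On real hardware, check would presumably call test_gpu_peer_copy with torch.device(f"cuda:{a}") and torch.device(f"cuda:{b}"), and num_devices would come from torch.cuda.device_count().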