Environment: Kornia CUDA GPU Environment
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, GPU_Computing |
| Last Updated | 2026-02-09 15:00 GMT |
Overview
CUDA GPU environment for accelerated Kornia operations including FlashAttention, mixed precision, and device-optimized compute paths.
Description
While Kornia works fully on CPU, many operations have GPU-optimized code paths that provide significant performance improvements. This environment covers NVIDIA CUDA GPU support including FlashAttention for feature matching (LightGlue, LoFTR), mixed-precision (AMP) support for deep models (DeDoDe, LightGlue), and device-specific branching that uses conv2d on GPU vs einsum on CPU for color transformations. The project supports CUDA 12.1 and 12.4 via PyTorch's custom indices.
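The conv2d-on-GPU vs. einsum-on-CPU split described above can be sketched as a device-conditional dispatch. This is a minimal illustration of the pattern, not Kornia's actual implementation; the function name `apply_color_matrix` is hypothetical:

```python
import torch
import torch.nn.functional as F

def apply_color_matrix(image: torch.Tensor, matrix: torch.Tensor) -> torch.Tensor:
    """Apply a 3x3 color transform to a BCHW image via a device-friendly path."""
    if image.device.type == "cuda":
        # On GPU, a 1x1 convolution maps onto fast cuDNN kernels.
        weight = matrix.view(3, 3, 1, 1)
        return F.conv2d(image, weight)
    # On CPU, einsum avoids convolution overhead for this tiny kernel.
    return torch.einsum("oc,bchw->bohw", matrix, image)
```

Both branches compute the same per-pixel matrix multiply; only the backend-optimal formulation differs.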
Usage
Use this environment when running compute-intensive operations such as deep feature matching (LoFTR, LightGlue), deep edge detection (DexiNed), image stitching on large images, or any workflow where GPU acceleration is desired. The GPU environment enables FlashAttention, mixed-precision training, and faster convolution-based compute paths.
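A typical entry point for such workflows is selecting the device once and moving data (and models) onto it; a minimal sketch, runnable on CPU-only machines as well:

```python
import torch

# Prefer CUDA when available; everything below also runs on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A dummy grayscale image batch on the chosen device. A real workflow
# would also move the model, e.g. matcher = kornia.feature.LoFTR().to(device)
image = torch.rand(1, 1, 480, 640, device=device)
```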
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (recommended), Windows | macOS uses MPS instead of CUDA |
| Hardware | NVIDIA GPU with CUDA support | Minimum compute capability varies by PyTorch version |
| CUDA | 12.1 or 12.4 | Configured via PyTorch custom index URLs |
| Disk | ~2GB additional | For CUDA-enabled PyTorch |
Dependencies
System Packages
- NVIDIA GPU drivers (compatible with CUDA version)
- CUDA runtime 12.1 or 12.4 (bundled with PyTorch CUDA builds; no separate toolkit install required)
Python Packages
- `torch` >= 2.0.0 (CUDA build, e.g., `torch+cu121` or `torch+cu124`)
- All core Kornia dependencies (see PyTorch_Python_Environment)
- `xformers` (optional, for DeDoDe memory-efficient attention)
- `flash-attn` (optional, for LightGlue FlashAttention)
Credentials
No credentials required for GPU usage.
Quick Install
```bash
# Install PyTorch with CUDA 12.1 ...
pip install torch --index-url https://download.pytorch.org/whl/cu121

# ... or with CUDA 12.4
pip install torch --index-url https://download.pytorch.org/whl/cu124

# Install Kornia
pip install kornia

# Optional: xformers for DeDoDe
pip install xformers

# Via pixi (project's recommended approach)
pixi run -e cuda install
```
Code Evidence
CUDA device detection from `kornia/core/utils.py:45-58`:
```python
def get_cuda_device_if_available(index: int = 0) -> torch.device:
    """Try to get cuda device, if fail, return cpu."""
    if torch.cuda.is_available():
        return torch.device(f"cuda:{index}")
    return torch.device("cpu")
```
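The helper degrades gracefully on CPU-only machines; a quick check (redefining the function locally for illustration):

```python
import torch

def get_cuda_device_if_available(index: int = 0) -> torch.device:
    """Local copy of the kornia helper quoted above, for illustration."""
    if torch.cuda.is_available():
        return torch.device(f"cuda:{index}")
    return torch.device("cpu")

# Allocation works regardless of which backend was selected.
device = get_cuda_device_if_available()
x = torch.zeros(2, 2, device=device)
```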
FlashAttention availability check from `kornia/feature/lightglue.py:140-142`:
```python
warnings.warn(
    "FlashAttention is not available. For optimal speed, "
    "consider installing torch >= 2.0 or flash-attn.",
)
```
CUDA-specific FlashAttention activation from `kornia/feature/lightglue.py:149-154`:
```python
torch.backends.cuda.enable_flash_sdp(allow_flash)
# ...
if self.enable_flash and q.device.type == "cuda":
```
Mixed-precision autocast from `kornia/feature/lightglue.py:576`:
```python
with torch.autocast(enabled=self.conf.mp, device_type="cuda"):
```
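The autocast pattern generalizes to any model; a minimal sketch that enables mixed precision only when CUDA is present (so it also runs, in full precision, on CPU):

```python
import torch

device_type = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device_type == "cuda"  # mixed precision pays off mainly on GPU

model = torch.nn.Linear(16, 4)
x = torch.rand(8, 16)

# Inside the context, eligible ops (e.g. matmul) run in float16 on CUDA;
# with enabled=False the context is a no-op and math stays in float32.
with torch.autocast(device_type=device_type, enabled=use_amp):
    y = model(x)
```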
xformers optional dependency for DeDoDe from `kornia/feature/dedode/transformer/layers/attention.py:35-40`:
```python
try:
    from xformers.ops import fmha, memory_efficient_attention, unbind

    XFORMERS_AVAILABLE = True
except ImportError:
    XFORMERS_AVAILABLE = False
```
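The flag gates which attention kernel gets used downstream. A sketch of that optional-dependency pattern; the fallback to torch's built-in `scaled_dot_product_attention` is our illustration, not DeDoDe's exact code, and the two backends expect slightly different tensor layouts in general:

```python
import torch
import torch.nn.functional as F

try:
    from xformers.ops import memory_efficient_attention
    XFORMERS_AVAILABLE = True
except ImportError:
    XFORMERS_AVAILABLE = False

def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Use xformers when present, otherwise torch >= 2.0's built-in SDPA."""
    if XFORMERS_AVAILABLE:
        return memory_efficient_attention(q, k, v)
    return F.scaled_dot_product_attention(q, k, v)
```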
CUDA index URLs from `pyproject.toml:309-317`:
```toml
[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true

[[tool.uv.index]]
name = "pytorch-cu124"
url = "https://download.pytorch.org/whl/cu124"
explicit = true
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `CUDA out of memory` | Insufficient GPU VRAM | Reduce batch size or image resolution; use gradient checkpointing |
| `FlashAttention is not available` | torch < 2.0 or flash-attn not installed | Upgrade to torch >= 2.0 or install flash-attn package |
| `xFormers not available` | xformers not installed | `pip install xformers` (only needed for DeDoDe) |
| `RuntimeError: CUDA error` | CUDA driver/toolkit mismatch | Ensure GPU drivers match the CUDA version used by PyTorch |
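For the `CUDA out of memory` case, a common mitigation is to halve the batch and retry. A minimal retry sketch (our own utility, not part of Kornia; `run_with_oom_fallback` is a hypothetical name):

```python
import torch

def run_with_oom_fallback(fn, batch: torch.Tensor, min_batch: int = 1) -> torch.Tensor:
    """Call fn on the batch, halving the chunk size on CUDA OOM until it fits."""
    bs = batch.shape[0]
    while True:
        try:
            chunks = [fn(chunk) for chunk in batch.split(bs)]
            return torch.cat(chunks)
        except torch.cuda.OutOfMemoryError:
            if bs <= min_batch:
                raise  # cannot shrink further; surface the error
            bs //= 2
            torch.cuda.empty_cache()  # release cached blocks before retrying
```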
Compatibility Notes
- MPS (Apple Silicon): Kornia has MPS-specific workarounds (e.g., `is_mps_tensor_safe()` in `kornia/core/utils.py:40-42`). Some ops like `torch.std_mean` behave differently on MPS and have explicit fallbacks.
- XLA/TPU: Experimental support via `torch_xla`. Detected by `xla_is_available()` in `kornia/core/utils.py:33-37`.
- FP16 ONNX Inference: Requires CUDA device. `KORNIA_CHECK(device.type == "cuda", "FP16 requires CUDA.")` in `kornia/feature/lightglue_onnx/lightglue.py:83`.
- CPU Fallback: All Kornia operations work on CPU. GPU code paths are conditional and gracefully fall back to CPU.