

Environment:Roboflow Rf detr Python GPU Environment

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Computer_Vision, Deep_Learning
Last Updated: 2026-02-08 15:00 GMT

Overview

Python 3.10+ environment with PyTorch (1.13–2.8), CUDA/MPS/CPU device support, DINOv2 backbone via HuggingFace Transformers, and LoRA via PEFT for object detection training and inference.

Description

This environment provides the core runtime for RF-DETR, a real-time object detection model based on DETR with a DINOv2 backbone. The system auto-detects available hardware at import time (CUDA GPU, Apple MPS, or CPU fallback) and sets the default device accordingly. Training requires PyTorch with mixed-precision support (bfloat16 via `torch.amp`), the HuggingFace Transformers library for the DINOv2 windowed attention backbone, and PEFT for optional LoRA fine-tuning of the encoder.

Usage

Use this environment for all RF-DETR workflows: inference, fine-tuning, evaluation, and ONNX export. It is the mandatory prerequisite for every Implementation in the repository. GPU acceleration (CUDA) is strongly recommended for training; CPU mode works for inference but is too slow for practical training.

System Requirements

Category | Requirement | Notes
OS | Linux (POSIX/Unix), macOS | Windows not officially listed in classifiers; use WSL2
Python | >= 3.10, <= 3.13 | Declared in `pyproject.toml` `requires-python`
Hardware | NVIDIA GPU (recommended) | CUDA support auto-detected; MPS for Apple Silicon; CPU fallback
VRAM | 8GB minimum (16GB+ recommended) | See memory configurations in training docs
Disk | Sufficient for model weights | Base model ~120MB; large models up to ~500MB

Dependencies

System Packages

  • CUDA toolkit (if using NVIDIA GPU)
  • C++ compiler (for PyTorch extensions)

Python Packages

  • `torch` >= 1.13.0, <= 2.8.0
  • `torchvision` >= 0.14.0
  • `transformers` > 4.0.0, < 5.0.0
  • `peft` (any version)
  • `pydantic` (any version)
  • `scipy` (any version)
  • `numpy` (any version)
  • `tqdm` (any version)
  • `pycocotools` (any version)
  • `supervision` (any version)
  • `matplotlib` (any version)
  • `roboflow` (any version)
  • `polygraphy` (any version)
  • `rf100vl` (any version)
  • `pillow-avif-plugin` < 1.5.3
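The `torch` pin is the tightest constraint in the list. As a quick sanity check, the declared range (`>=1.13.0,<=2.8.0`) can be compared against an installed version string with a small stdlib-only helper. This helper is illustrative, not part of RF-DETR, and it ignores pre-release suffixes that a real resolver such as pip would handle:

```python
def parse_version(v: str) -> tuple:
    """Turn '2.8.0' (or '2.8.0+cu121') into (2, 8, 0) for tuple comparison.

    Note: pre-release/dev suffixes are not handled; real resolvers
    (pip, the `packaging` library) cover those cases properly.
    """
    return tuple(int(part) for part in v.split("+")[0].split(".")[:3])

def torch_version_supported(v: str) -> bool:
    # Mirrors the pyproject constraint: torch>=1.13.0,<=2.8.0
    return parse_version("1.13.0") <= parse_version(v) <= parse_version("2.8.0")

print(torch_version_supported("2.8.0"))  # upper bound is inclusive
print(torch_version_supported("2.9.0"))  # excluded due to known issues
```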

Optional Packages

  • `tensorboard` >= 2.13.0 (for metrics logging via `rfdetr[metrics]`)
  • `wandb` (for W&B logging via `rfdetr[metrics]`)

Credentials

No credentials are required for the core environment. See Environment:Roboflow_Rf_detr_Roboflow_Deployment_Credentials for deployment-specific credentials.

Quick Install

# Install RF-DETR with all core dependencies
pip install rfdetr

# For metrics logging (TensorBoard + W&B)
pip install "rfdetr[metrics]"

# For ONNX export
pip install "rfdetr[onnxexport]"
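After installing, one way to confirm the core dependencies resolved is to probe for importable modules with the standard library. The module names below are the ones implied by the package list above; `find_spec` only locates a module, it does not execute it:

```python
import importlib.util

def missing_modules(names):
    """Return the module names that cannot be found on the current path."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# After `pip install rfdetr`, this list is expected to be empty:
core = ["rfdetr", "torch", "torchvision", "transformers", "peft"]
print(missing_modules(core))
```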

Code Evidence

Device auto-detection from `rfdetr/config.py:14`:

DEVICE = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
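The fallback chain packed into that one-liner (CUDA, then MPS, then CPU) can be isolated as a pure function to make the precedence explicit. This is a sketch of the logic only, not code from the repository:

```python
def select_device(cuda_available: bool, mps_available: bool) -> str:
    """CUDA takes precedence over MPS, which takes precedence over CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# On a machine where both backends report available, CUDA wins:
print(select_device(True, True))  # cuda
```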

Float32 matmul precision optimization from `rfdetr/detr.py:25-28`:

try:
    torch.set_float32_matmul_precision('high')
except:
    pass

PyTorch version constraint from `pyproject.toml:39`:

"torch>=1.13.0,<=2.8.0",  # TODO: Torch >=2.9.0 is excluded due to known issues.

AMP compatibility handling from `rfdetr/engine.py:31-36`:

try:
    from torch.amp import GradScaler, autocast
    DEPRECATED_AMP = False
except ImportError:
    from torch.cuda.amp import GradScaler, autocast
    DEPRECATED_AMP = True

Distributed mode initialization from `rfdetr/util/misc.py:432-454`:

def init_distributed_mode(args):
    if 'RANK' in os.environ and 'WORLD_SIZE' in os.environ:
        args.rank = int(os.environ["RANK"])
        args.world_size = int(os.environ['WORLD_SIZE'])
        args.gpu = int(os.environ['LOCAL_RANK'])
    elif 'SLURM_PROCID' in os.environ:
        args.rank = int(os.environ['SLURM_PROCID'])
        args.gpu = args.rank % torch.cuda.device_count()
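The same precedence (explicit `RANK`/`WORLD_SIZE` from a torchrun-style launcher first, then SLURM) can be sketched as a pure function over an environment mapping. The non-distributed fallback at the end is an assumption here, since the excerpt above is truncated:

```python
def resolve_distributed(env: dict, gpu_count: int = 1) -> dict:
    """Derive (rank, world_size, gpu) from launcher environment variables.

    Precedence mirrors the excerpt: torchrun-style RANK/WORLD_SIZE first,
    then SLURM_PROCID; anything else is treated as non-distributed.
    """
    if "RANK" in env and "WORLD_SIZE" in env:
        return {"rank": int(env["RANK"]),
                "world_size": int(env["WORLD_SIZE"]),
                "gpu": int(env["LOCAL_RANK"])}
    if "SLURM_PROCID" in env:
        rank = int(env["SLURM_PROCID"])
        # Under SLURM, the local GPU index is derived from the global rank.
        return {"rank": rank, "world_size": None, "gpu": rank % gpu_count}
    return {"rank": 0, "world_size": 1, "gpu": 0}  # single-process fallback (assumed)

print(resolve_distributed({"RANK": "3", "WORLD_SIZE": "8", "LOCAL_RANK": "3"}))
```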

Common Errors

Error Message | Cause | Solution
`CUDA out of memory` | Insufficient GPU VRAM for batch size/resolution | Reduce `batch_size`, enable `gradient_checkpointing=True`, or reduce `resolution`
`ImportError: torch.amp` | Older PyTorch version without the new AMP API | Update to PyTorch >= 2.0 or rely on the automatic fallback to `torch.cuda.amp`
`RuntimeError: spawn` workers | Multiprocessing on Windows/macOS without a `__main__` guard | Wrap training code in an `if __name__ == '__main__':` block
Failed to load pretrain weights | Corrupted weight download | Weights are auto-redownloaded on corruption; check network connectivity
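For the spawn-related error in the table above, the fix is the standard entry-point guard. The body of `main` is a placeholder; the real RF-DETR training call (which spawns the DataLoader worker processes) goes inside it:

```python
def main():
    # Placeholder: the real training call goes here. It is the call
    # that launches DataLoader worker processes under "spawn".
    return "training entry point"

if __name__ == "__main__":
    # On Windows/macOS the "spawn" start method re-imports this module
    # in each worker; the guard prevents workers from re-running main().
    main()
```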

Compatibility Notes

  • CUDA GPUs: Fully supported. Default device. NCCL backend used for distributed training.
  • Apple MPS: Supported for inference. Training may have limited functionality.
  • CPU: Supported but impractical for training. Functional for inference on small batches.
  • Distributed Training: Supports PyTorch DDP via `torch.distributed.launch` or `torchrun`. Supports SLURM via `SLURM_PROCID` environment variable.
  • PyTorch Version: Torch >= 2.9.0 is explicitly excluded due to known issues. PRs to lift this restriction are welcome.
  • pillow-avif-plugin: Pinned below 1.5.3 due to broken wheel in CI.
