Environment:Roboflow_Rf_detr_Python_GPU_Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Computer_Vision, Deep_Learning |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Python 3.10+ environment with PyTorch (1.13–2.8), CUDA/MPS/CPU device support, DINOv2 backbone via HuggingFace Transformers, and LoRA via PEFT for object detection training and inference.
Description
This environment provides the core runtime for RF-DETR, a real-time object detection model based on DETR with a DINOv2 backbone. The system auto-detects available hardware at import time (CUDA GPU, Apple MPS, or CPU fallback) and sets the default device accordingly. Training requires PyTorch with mixed-precision support (bfloat16 via `torch.amp`), the HuggingFace Transformers library for the DINOv2 windowed attention backbone, and PEFT for optional LoRA fine-tuning of the encoder.
Usage
Use this environment for all RF-DETR workflows: inference, fine-tuning, evaluation, and ONNX export. It is the mandatory prerequisite for every Implementation in the repository. GPU acceleration (CUDA) is strongly recommended for training; CPU mode is functional but impractical for training due to speed.
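A minimal inference sketch for the workflow above. The `RFDETRBase` class and its `predict` method follow the project's documented quickstart; the helper function name and the deferred import are illustrative choices, not part of the library:

```python
def run_inference(image_path: str, threshold: float = 0.5):
    """Sketch of the basic RF-DETR inference workflow.

    Assumes `pip install rfdetr`; the import is deferred so this helper
    can be defined even before the package is installed.
    """
    from rfdetr import RFDETRBase  # downloads base weights (~120 MB) on first use

    model = RFDETRBase()           # device is auto-detected (CUDA/MPS/CPU)
    return model.predict(image_path, threshold=threshold)
```

On a CUDA machine the model runs on the GPU automatically; no explicit device argument is needed.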
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (POSIX/Unix), macOS | Windows not officially listed in classifiers; use WSL2 |
| Python | >= 3.10, <= 3.13 | Declared in `pyproject.toml` `requires-python` |
| Hardware | NVIDIA GPU (recommended) | CUDA support auto-detected; MPS for Apple Silicon; CPU fallback |
| VRAM | 8GB minimum (16GB+ recommended) | See memory configurations in training docs |
| Disk | Sufficient for model weights | Base model ~120MB; large models up to ~500MB |
Dependencies
System Packages
- CUDA toolkit (if using NVIDIA GPU)
- C++ compiler (for PyTorch extensions)
Python Packages
- `torch` >= 1.13.0, <= 2.8.0
- `torchvision` >= 0.14.0
- `transformers` > 4.0.0, < 5.0.0
- `peft` (any version)
- `pydantic` (any version)
- `scipy` (any version)
- `numpy` (any version)
- `tqdm` (any version)
- `pycocotools` (any version)
- `supervision` (any version)
- `matplotlib` (any version)
- `roboflow` (any version)
- `polygraphy` (any version)
- `rf100vl` (any version)
- `pillow-avif-plugin` < 1.5.3
Optional Packages
- `tensorboard` >= 2.13.0 (for metrics logging via `rfdetr[metrics]`)
- `wandb` (for W&B logging via `rfdetr[metrics]`)
Credentials
No credentials are required for the core environment. See Environment:Roboflow_Rf_detr_Roboflow_Deployment_Credentials for deployment-specific credentials.
Quick Install
# Install RF-DETR with all core dependencies
pip install rfdetr
# For metrics logging (TensorBoard + W&B)
pip install "rfdetr[metrics]"
# For ONNX export
pip install "rfdetr[onnxexport]"
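After installing, it can help to confirm that the resolved PyTorch version actually falls inside the declared constraint (`torch>=1.13.0,<=2.8.0`). A small helper for that check; the function names are illustrative, not part of rfdetr:

```python
def version_tuple(version: str) -> tuple:
    # Strip local build tags like "+cu118" and keep the leading numbers.
    return tuple(int(part) for part in version.split("+")[0].split(".")[:3])

def torch_version_supported(version: str) -> bool:
    # Mirrors the pyproject.toml constraint: torch>=1.13.0,<=2.8.0.
    return (1, 13, 0) <= version_tuple(version) <= (2, 8, 0)
```

Typical usage would be `torch_version_supported(torch.__version__)` right after import.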
Code Evidence
Device auto-detection from `rfdetr/config.py:14`:
DEVICE = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
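The same precedence can be expressed as a pure function (the name is illustrative, not part of the codebase), which makes the CUDA > MPS > CPU ordering explicit and easy to unit-test:

```python
def select_device(cuda_available: bool, mps_available: bool) -> str:
    # CUDA wins when present; Apple MPS is the fallback; CPU is the last resort.
    return "cuda" if cuda_available else "mps" if mps_available else "cpu"
```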
Float32 matmul precision optimization from `rfdetr/detr.py:25-28`:
try:
    torch.set_float32_matmul_precision('high')
except:
    pass
PyTorch version constraint from `pyproject.toml:39`:
"torch>=1.13.0,<=2.8.0", # TODO: Torch >=2.9.0 is excluded due to known issues.
AMP compatibility handling from `rfdetr/engine.py:31-36`:
try:
    from torch.amp import GradScaler, autocast
    DEPRECATED_AMP = False
except ImportError:
    from torch.cuda.amp import GradScaler, autocast
    DEPRECATED_AMP = True
Distributed mode initialization from `rfdetr/util/misc.py:432-454`:
def init_distributed_mode(args):
    if 'RANK' in os.environ and 'WORLD_SIZE' in os.environ:
        args.rank = int(os.environ["RANK"])
        args.world_size = int(os.environ['WORLD_SIZE'])
        args.gpu = int(os.environ['LOCAL_RANK'])
    elif 'SLURM_PROCID' in os.environ:
        args.rank = int(os.environ['SLURM_PROCID'])
        args.gpu = args.rank % torch.cuda.device_count()
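The environment-variable precedence above (torchrun-style `RANK`/`WORLD_SIZE` first, then SLURM) can be sketched as a side-effect-free helper. Here `num_gpus` stands in for `torch.cuda.device_count()` and the function name is hypothetical:

```python
def resolve_distributed(env: dict, num_gpus: int = 1):
    """Return (rank, world_size, local_gpu), or None for single-process mode."""
    if "RANK" in env and "WORLD_SIZE" in env:
        # torchrun / torch.distributed.launch set all three variables.
        return int(env["RANK"]), int(env["WORLD_SIZE"]), int(env["LOCAL_RANK"])
    if "SLURM_PROCID" in env:
        # Under SLURM, the local GPU index is derived from the global rank;
        # world_size is not set in this branch of the original code.
        rank = int(env["SLURM_PROCID"])
        return rank, None, rank % num_gpus
    return None
```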
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `CUDA out of memory` | Insufficient GPU VRAM for batch size/resolution | Reduce `batch_size`, enable `gradient_checkpointing=True`, or reduce `resolution` |
| `ImportError: torch.amp` | Older PyTorch version without new AMP API | Update PyTorch >= 2.0 or use the fallback `torch.cuda.amp` (handled automatically) |
| `RuntimeError: spawn` workers | Multiprocessing on Windows/macOS without `__main__` guard | Wrap training code in `if __name__ == '__main__':` block |
| Failed to load pretrain weights | Corrupted weight download | Weights are auto-redownloaded on corruption; check network connectivity |
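The `__main__` guard fix for the spawn-related error in the table above is easy to get wrong, so here is the minimal shape. The body of `main()` is a placeholder for the actual training call:

```python
def main():
    # Placeholder for the real work, e.g. model.train(dataset_dir=..., epochs=...).
    # Keeping all work inside main() lets a spawned worker process import this
    # module without re-launching training.
    return "started"

if __name__ == "__main__":
    main()
```

On Linux with the default `fork` start method the guard is not strictly required, but it keeps the script portable to macOS and Windows/WSL2.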
Compatibility Notes
- CUDA GPUs: Fully supported. Default device. NCCL backend used for distributed training.
- Apple MPS: Supported for inference. Training may have limited functionality.
- CPU: Supported but impractical for training. Functional for inference on small batches.
- Distributed Training: Supports PyTorch DDP via `torch.distributed.launch` or `torchrun`. Supports SLURM via `SLURM_PROCID` environment variable.
- PyTorch Version: Torch >= 2.9.0 is explicitly excluded due to known issues. PRs to lift this restriction are welcome.
- pillow-avif-plugin: Pinned below 1.5.3 due to broken wheel in CI.
Related Pages
- Implementation:Roboflow_Rf_detr_RFDETR_Size_Variants
- Implementation:Roboflow_Rf_detr_RFDETR_Init
- Implementation:Roboflow_Rf_detr_Torchvision_Transforms_For_Detection
- Implementation:Roboflow_Rf_detr_RFDETR_Predict
- Implementation:Roboflow_Rf_detr_Supervision_Annotators
- Implementation:Roboflow_Rf_detr_Build_Dataset
- Implementation:Roboflow_Rf_detr_RFDETR_Train_Config
- Implementation:Roboflow_Rf_detr_Model_Train
- Implementation:Roboflow_Rf_detr_Evaluate
- Implementation:Roboflow_Rf_detr_Best_Checkpoint_Selection
- Implementation:Roboflow_Rf_detr_RFDETR_Init_Finetuned
- Implementation:Roboflow_Rf_detr_RFDETR_Export
- Implementation:Roboflow_Rf_detr_RFDETR_Deploy_To_Roboflow
- Implementation:Roboflow_Rf_detr_Flop_Counter
- Implementation:Roboflow_Rf_detr_RFDETR_Platform_Models
- Implementation:Roboflow_Rf_detr_Save_GT_Predictions_Visualization