

Environment:Roboflow Rf detr Python GPU Environment

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Computer_Vision, Deep_Learning
Last Updated: 2026-02-08 15:00 GMT

Overview

Python 3.10+ environment with PyTorch (1.13–2.8), CUDA/MPS/CPU device support, DINOv2 backbone via HuggingFace Transformers, and LoRA via PEFT for object detection training and inference.

Description

This environment provides the core runtime for RF-DETR, a real-time object detection model based on DETR with a DINOv2 backbone. The system auto-detects available hardware at import time (CUDA GPU, Apple MPS, or CPU fallback) and sets the default device accordingly. Training requires PyTorch with mixed-precision support (bfloat16 via `torch.amp`), the HuggingFace Transformers library for the DINOv2 windowed attention backbone, and PEFT for optional LoRA fine-tuning of the encoder.

Usage

Use this environment for all RF-DETR workflows: inference, fine-tuning, evaluation, and ONNX export. It is the mandatory prerequisite for every Implementation in the repository. GPU acceleration (CUDA) is strongly recommended for training; CPU mode works for inference but is too slow for practical training.

System Requirements

Category | Requirement | Notes
OS | Linux (POSIX/Unix), macOS | Windows not officially listed in classifiers; use WSL2
Python | >= 3.10, <= 3.13 | Declared in `pyproject.toml` `requires-python`
Hardware | NVIDIA GPU (recommended) | CUDA support auto-detected; MPS for Apple Silicon; CPU fallback
VRAM | 8GB minimum (16GB+ recommended) | See memory configurations in training docs
Disk | Sufficient for model weights | Base model ~120MB; large models up to ~500MB

Dependencies

System Packages

  • CUDA toolkit (if using NVIDIA GPU)
  • C++ compiler (for PyTorch extensions)

Python Packages

  • `torch` >= 1.13.0, <= 2.8.0
  • `torchvision` >= 0.14.0
  • `transformers` > 4.0.0, < 5.0.0
  • `peft` (any version)
  • `pydantic` (any version)
  • `scipy` (any version)
  • `numpy` (any version)
  • `tqdm` (any version)
  • `pycocotools` (any version)
  • `supervision` (any version)
  • `matplotlib` (any version)
  • `roboflow` (any version)
  • `polygraphy` (any version)
  • `rf100vl` (any version)
  • `pillow-avif-plugin` < 1.5.3
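The `torch` pin is the tightest constraint in the list. As a quick sanity check, the declared range (`>=1.13.0,<=2.8.0`) can be compared against an installed version string with a small stdlib-only helper. This helper is illustrative, not part of RF-DETR, and it ignores pre-release suffixes that a real resolver such as pip would handle:

```python
def parse_version(v: str) -> tuple:
    """Turn '2.8.0' (or '2.8.0+cu121') into (2, 8, 0) for tuple comparison.

    Note: pre-release/dev suffixes are not handled; real resolvers
    (pip, the `packaging` library) cover those cases properly.
    """
    return tuple(int(part) for part in v.split("+")[0].split(".")[:3])

def torch_version_supported(v: str) -> bool:
    # Mirrors the pyproject constraint: torch>=1.13.0,<=2.8.0
    return parse_version("1.13.0") <= parse_version(v) <= parse_version("2.8.0")

print(torch_version_supported("2.8.0"))  # upper bound is inclusive
print(torch_version_supported("2.9.0"))  # excluded due to known issues
```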

Optional Packages

  • `tensorboard` >= 2.13.0 (for metrics logging via `rfdetr[metrics]`)
  • `wandb` (for W&B logging via `rfdetr[metrics]`)

Credentials

No credentials are required for the core environment. See Environment:Roboflow_Rf_detr_Roboflow_Deployment_Credentials for deployment-specific credentials.

Quick Install

# Install RF-DETR with all core dependencies
pip install rfdetr

# For metrics logging (TensorBoard + W&B)
pip install "rfdetr[metrics]"

# For ONNX export
pip install "rfdetr[onnxexport]"
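After installing, one way to confirm the core dependencies resolved is to probe for importable modules with the standard library. The module names below are the ones implied by the package list above; `find_spec` only locates a module, it does not execute it:

```python
import importlib.util

def missing_modules(names):
    """Return the module names that cannot be found on the current path."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# After `pip install rfdetr`, this list is expected to be empty:
core = ["rfdetr", "torch", "torchvision", "transformers", "peft"]
print(missing_modules(core))
```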

Code Evidence

Device auto-detection from `rfdetr/config.py:14`:

DEVICE = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
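The fallback chain packed into that one-liner (CUDA, then MPS, then CPU) can be isolated as a pure function to make the precedence explicit. This is a sketch of the logic only, not code from the repository:

```python
def select_device(cuda_available: bool, mps_available: bool) -> str:
    """CUDA takes precedence over MPS, which takes precedence over CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# On a machine where both backends report available, CUDA wins:
print(select_device(True, True))  # cuda
```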

Float32 matmul precision optimization from `rfdetr/detr.py:25-28`:

try:
    torch.set_float32_matmul_precision('high')
except:
    pass

PyTorch version constraint from `pyproject.toml:39`:

"torch>=1.13.0,<=2.8.0",  # TODO: Torch >=2.9.0 is excluded due to known issues.

AMP compatibility handling from `rfdetr/engine.py:31-36`:

try:
    from torch.amp import GradScaler, autocast
    DEPRECATED_AMP = False
except ImportError:
    from torch.cuda.amp import GradScaler, autocast
    DEPRECATED_AMP = True

Distributed mode initialization from `rfdetr/util/misc.py:432-454`:

def init_distributed_mode(args):
    if 'RANK' in os.environ and 'WORLD_SIZE' in os.environ:
        args.rank = int(os.environ["RANK"])
        args.world_size = int(os.environ['WORLD_SIZE'])
        args.gpu = int(os.environ['LOCAL_RANK'])
    elif 'SLURM_PROCID' in os.environ:
        args.rank = int(os.environ['SLURM_PROCID'])
        args.gpu = args.rank % torch.cuda.device_count()
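The same precedence (explicit `RANK`/`WORLD_SIZE` from a torchrun-style launcher first, then SLURM) can be sketched as a pure function over an environment mapping. The non-distributed fallback at the end is an assumption here, since the excerpt above is truncated:

```python
def resolve_distributed(env: dict, gpu_count: int = 1) -> dict:
    """Derive (rank, world_size, gpu) from launcher environment variables.

    Precedence mirrors the excerpt: torchrun-style RANK/WORLD_SIZE first,
    then SLURM_PROCID; anything else is treated as non-distributed.
    """
    if "RANK" in env and "WORLD_SIZE" in env:
        return {"rank": int(env["RANK"]),
                "world_size": int(env["WORLD_SIZE"]),
                "gpu": int(env["LOCAL_RANK"])}
    if "SLURM_PROCID" in env:
        rank = int(env["SLURM_PROCID"])
        # Under SLURM, the local GPU index is derived from the global rank.
        return {"rank": rank, "world_size": None, "gpu": rank % gpu_count}
    return {"rank": 0, "world_size": 1, "gpu": 0}  # single-process fallback (assumed)

print(resolve_distributed({"RANK": "3", "WORLD_SIZE": "8", "LOCAL_RANK": "3"}))
```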

Common Errors

Error Message | Cause | Solution
`CUDA out of memory` | Insufficient GPU VRAM for batch size/resolution | Reduce `batch_size`, enable `gradient_checkpointing=True`, or reduce `resolution`
`ImportError: torch.amp` | Older PyTorch version without the new AMP API | Update to PyTorch >= 2.0 or rely on the automatic fallback to `torch.cuda.amp`
`RuntimeError: spawn` workers | Multiprocessing on Windows/macOS without a `__main__` guard | Wrap training code in an `if __name__ == '__main__':` block
Failed to load pretrain weights | Corrupted weight download | Weights are auto-redownloaded on corruption; check network connectivity
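For the spawn-related error in the table above, the fix is the standard entry-point guard. The body of `main` is a placeholder; the real RF-DETR training call (which spawns the DataLoader worker processes) goes inside it:

```python
def main():
    # Placeholder: the real training call goes here. It is the call
    # that launches DataLoader worker processes under "spawn".
    return "training entry point"

if __name__ == "__main__":
    # On Windows/macOS the "spawn" start method re-imports this module
    # in each worker; the guard prevents workers from re-running main().
    main()
```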

Compatibility Notes

  • CUDA GPUs: Fully supported. Default device. NCCL backend used for distributed training.
  • Apple MPS: Supported for inference. Training may have limited functionality.
  • CPU: Supported but impractical for training. Functional for inference on small batches.
  • Distributed Training: Supports PyTorch DDP via `torch.distributed.launch` or `torchrun`. Supports SLURM via `SLURM_PROCID` environment variable.
  • PyTorch Version: Torch >= 2.9.0 is explicitly excluded due to known issues. PRs to lift this restriction are welcome.
  • pillow-avif-plugin: Pinned below 1.5.3 due to broken wheel in CI.
