Principle: Roboflow RF-DETR Model Size Selection
| Knowledge Sources | |
|---|---|
| Domains | Object_Detection, Model_Architecture |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
A design principle for selecting the appropriate model size variant to balance accuracy, speed, and resource constraints in object detection tasks.
Description
Model size selection involves choosing from a family of architecturally similar models that differ in computational complexity and detection performance. In the RF-DETR family, model variants range from Nano (fastest, least accurate) to Large (most accurate, highest compute). Each variant uses the same DINOv2-based backbone with a lightweight DETR decoder but varies in input resolution, number of decoder layers, and patch size.
The key trade-off axes are:
- Input resolution: Higher resolution captures finer details but increases compute quadratically
- Decoder depth: More layers improve detection quality but add latency
- Backbone configuration: Window size and feature extraction layers affect the quality-speed balance
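The quadratic cost of resolution scaling follows from the patch-token count: a ViT-style backbone processes (H/p)·(W/p) tokens, so doubling the input side quadruples the number of tokens that attention must process. A minimal sketch (the patch size and resolutions here are illustrative, not the exact RF-DETR values):

```python
def patch_token_count(resolution: int, patch_size: int = 16) -> int:
    """Number of patch tokens a ViT-style backbone produces for a square input."""
    side = resolution // patch_size
    return side * side

# Doubling the input side quadruples the token count, and
# self-attention cost grows with the square of the token count.
tokens_384 = patch_token_count(384)  # 24 * 24 = 576
tokens_768 = patch_token_count(768)  # 48 * 48 = 2304
assert tokens_768 == 4 * tokens_384
```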
Usage
Apply this principle when deploying an object detection system that must trade detection accuracy against inference speed. Consider Nano or Small for edge deployment and real-time applications, Base or Medium for balanced performance, and Large when accuracy is paramount and compute resources are available.
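The guidance above can be sketched as a small selection heuristic. The variant names follow the RF-DETR family, but the helper function and its decision thresholds are illustrative assumptions, not part of any library API:

```python
def suggest_variant(priority: str, edge_device: bool = False) -> str:
    """Map a deployment priority to an RF-DETR size variant (illustrative heuristic).

    priority: one of "speed", "balanced", or "accuracy".
    edge_device: True when targeting constrained hardware.
    """
    if edge_device:
        return "Nano"          # fastest, smallest footprint
    if priority == "speed":
        return "Small"         # real-time on capable hardware
    if priority == "balanced":
        return "Base"          # or "Medium" for a bit more accuracy
    if priority == "accuracy":
        return "Large"         # highest accuracy, highest compute
    raise ValueError(f"unknown priority: {priority!r}")
```

For example, `suggest_variant("balanced")` returns `"Base"`, while an edge deployment always maps to `"Nano"` regardless of the stated priority.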
Theoretical Basis
Model scaling follows the observation that detection performance improves with increased model capacity, but with diminishing returns. The RF-DETR family scales along three dimensions:
- Resolution scaling: Input sizes range from 384px (Nano) to 704px (Large)
- Depth scaling: Decoder layers range from 2 (Nano) to 4 (Large)
- Width scaling: Hidden dimensions remain constant at 256, but attention heads vary
Each variant uses a Pydantic configuration class that encapsulates all architecture parameters, ensuring consistent and validated model construction.
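The configuration idea can be sketched as follows. The actual RF-DETR code uses Pydantic models; for self-containment this sketch uses a stdlib dataclass with equivalent validation, and the field names are illustrative. The Nano and Large numbers come from the scaling ranges above; intermediate variants are omitted:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DetrVariantConfig:
    """Illustrative architecture config for one size variant."""
    name: str
    resolution: int        # square input size in pixels
    decoder_layers: int
    hidden_dim: int = 256  # constant across variants
    patch_size: int = 16   # illustrative value

    def __post_init__(self):
        # Reject inputs that do not divide evenly into patches,
        # mimicking the validation a Pydantic model would perform.
        if self.resolution % self.patch_size != 0:
            raise ValueError("resolution must be a multiple of patch_size")

# Endpoints of the scaling range described in the text.
NANO = DetrVariantConfig("Nano", resolution=384, decoder_layers=2)
LARGE = DetrVariantConfig("Large", resolution=704, decoder_layers=4)
```

Freezing the dataclass mirrors the validated-then-immutable pattern: once a variant's configuration is constructed, model-building code can rely on it without re-checking invariants.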