Workflow: Roboflow RF-DETR Custom Dataset Fine-tuning

Knowledge Sources	RF-DETR, RF-DETR Docs, Fine-tuning RF-DETR, RF-DETR Paper
Domains	Computer_Vision, Object_Detection, Fine_Tuning, Training
Last Updated	2026-02-08 15:00 GMT

Overview

End-to-end process for fine-tuning a pretrained RF-DETR object detection or instance segmentation model on a custom dataset using COCO or YOLO format annotations.

Description

This workflow covers the complete training pipeline for adapting RF-DETR models to domain-specific detection tasks. It starts from a COCO-pretrained checkpoint and fine-tunes on a user-provided dataset, leveraging the DINOv2 vision transformer backbone with optional LoRA adapters. The training loop includes gradient accumulation for memory efficiency, Exponential Moving Average (EMA) weight smoothing, cosine or step learning rate scheduling with warmup, multi-scale training augmentation, automatic mixed precision (bfloat16), COCO-style evaluation with mAP metrics, early stopping, and checkpoint management. Both object detection and instance segmentation tasks are supported.

Usage

Execute this workflow when you have a labeled dataset (in COCO JSON or YOLO format) and need to train an RF-DETR model to detect domain-specific objects. This workflow is appropriate for datasets ranging from a few hundred to tens of thousands of images. It supports training on single or multiple GPUs via PyTorch Distributed Data Parallel (DDP). Use this when pretrained COCO weights do not cover your target object classes, or when you need higher accuracy on a specific domain.

Execution Steps

Step 1: Prepare Dataset

Organize your labeled dataset into the expected directory structure; RF-DETR automatically detects whether it is in COCO or YOLO format. For COCO format, the dataset directory should contain train/, valid/, and optionally test/ subdirectories, each holding the images and a _annotations.coco.json file. For YOLO format, the directory should contain a data.yaml file defining the class names and dataset paths, plus train/images/ and valid/images/ subdirectories with matching train/labels/ and valid/labels/ directories.

Key considerations:

COCO format is detected by the presence of train/_annotations.coco.json
YOLO format is detected by the presence of data.yaml plus train/images/
Roboflow can export datasets in either format; the dataset_file parameter defaults to "roboflow"
Class names are extracted automatically from annotations (COCO JSON categories or YOLO data.yaml names field)
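
The detection rules above can be expressed as a small standard-library helper. This is a hypothetical re-implementation for illustration, not RF-DETR's internal code: the names detect_dataset_format and load_coco_class_names are assumptions, and reading YOLO class names is left out because parsing data.yaml needs a YAML library.

```python
import json
from pathlib import Path


def detect_dataset_format(dataset_dir):
    """Return "coco" or "yolo" based on the marker files (hypothetical helper)."""
    root = Path(dataset_dir)
    # COCO: the training split carries a COCO JSON annotation file.
    if (root / "train" / "_annotations.coco.json").is_file():
        return "coco"
    # YOLO: a data.yaml at the root plus a train/images/ directory.
    if (root / "data.yaml").is_file() and (root / "train" / "images").is_dir():
        return "yolo"
    raise ValueError(f"{root} matches neither the COCO nor the YOLO layout")


def load_coco_class_names(dataset_dir, split="train"):
    """Read class names from the `categories` field of a COCO annotation file."""
    ann_file = Path(dataset_dir) / split / "_annotations.coco.json"
    with ann_file.open(encoding="utf-8") as f:
        coco = json.load(f)
    # Sort by category id so list order follows the annotation ids.
    categories = sorted(coco["categories"], key=lambda c: c["id"])
    return [c["name"] for c in categories]
```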

Step 2: Select Model Architecture

Choose the appropriate RF-DETR model size and task type. For object detection, use RFDETRNano through RFDETRLarge. For instance segmentation, use RFDETRSegNano through RFDETRSegLarge. Each model size has a preconfigured architecture (encoder dimensions, decoder layers, resolution, patch size) and ships with COCO-pretrained weights.

Key considerations:

Larger models achieve higher accuracy but require more GPU memory and training time
Segmentation models use patch_size=12 and include an additional segmentation head
For memory-constrained environments, use gradient_checkpointing=True to reduce VRAM usage by approximately 30-40%, at the cost of slower training because activations are recomputed during the backward pass
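
Patch size and input resolution together fix how many tokens the ViT backbone processes, which is one reason larger inputs and smaller patches cost more memory. The arithmetic below is a sketch: the function name patch_tokens is hypothetical, and the resolutions in the example are illustrative values, not RF-DETR's preset resolutions.

```python
def patch_tokens(resolution, patch_size):
    """Number of patch tokens a ViT backbone produces for a square input image."""
    if resolution % patch_size:
        raise ValueError("resolution must be a multiple of patch_size")
    side = resolution // patch_size  # patches along each image edge
    return side * side


# Same illustrative 384 px input, two patch sizes.
tokens_p16 = patch_tokens(384, 16)  # 24 x 24 patches
tokens_p12 = patch_tokens(384, 12)  # 32 x 32 patches
```

Going from patch size 16 to 12 at the same resolution raises the token count by (16/12)² ≈ 1.78×, and in layers using full self-attention the attention cost grows with the square of that count.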

Step 3: Initialize Model and Configure Training

Instantiate the model class, which downloads the pretrained weights and builds the architecture. Then choose the training hyperparameters, which are passed to train(): learning rate, batch size, gradient accumulation steps, number of epochs, output directory, and optional features such as early stopping and TensorBoard or Weights & Biases logging.

Key considerations:

The detection head is automatically reinitialized when the number of dataset classes differs from the pretrained checkpoint
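Keep the effective batch size (batch_size × grad_accum_steps, multiplied by the number of GPUs when using DDP) at 16, trading batch_size against grad_accum_steps to fit your GPU memory; for example, batch_size=4 with grad_accum_steps=4
The default learning rates (lr=1e-4 for the non-encoder parameters, lr_encoder=1.5e-4 for the backbone encoder) work well for fine-tuning from COCO weights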
Layer-wise learning rate decay (default 0.8) applies progressively smaller learning rates to earlier backbone layers
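
The configuration rules above can be sketched as a few helpers: one derives grad_accum_steps from the target effective batch, one shows a generic layer-wise decay schedule, and one assembles keyword arguments for train(). All helper names are hypothetical; the train() keyword names are taken from the RF-DETR docs as I understand them and should be checked against your installed version, and RF-DETR's exact layer indexing for decay may differ from this generic form.

```python
TARGET_EFFECTIVE_BATCH = 16  # effective batch size recommended above


def grad_accum_steps_for(batch_size, world_size=1, target=TARGET_EFFECTIVE_BATCH):
    """Choose grad_accum_steps so that batch_size * grad_accum_steps * world_size == target."""
    per_step = batch_size * world_size
    if per_step <= 0 or target % per_step:
        raise ValueError(f"target batch {target} is not a multiple of {per_step}")
    return target // per_step


def layer_lr_scales(num_layers, decay=0.8):
    """Generic layer-wise LR decay: the last backbone layer keeps the full encoder LR,
    each earlier layer gets one more factor of `decay` (index 0 = earliest layer)."""
    return [decay ** (num_layers - 1 - i) for i in range(num_layers)]


def build_train_kwargs(dataset_dir, output_dir, epochs, batch_size, world_size=1):
    """Keyword arguments for model.train(); key names assumed from the RF-DETR docs."""
    return {
        "dataset_dir": dataset_dir,
        "output_dir": output_dir,
        "epochs": epochs,
        "batch_size": batch_size,  # per GPU
        "grad_accum_steps": grad_accum_steps_for(batch_size, world_size),
        "lr": 1e-4,  # non-encoder parameters
        "lr_encoder": 1.5e-4,  # backbone encoder
        "early_stopping": True,
        "tensorboard": True,
    }


def run_training(model_class_name, **train_kwargs):
    """Instantiate an RF-DETR model class by name and fine-tune it (needs `pip install rfdetr`)."""
    import rfdetr  # imported lazily so the helpers above work without the package

    model = getattr(rfdetr, model_class_name)()  # downloads COCO-pretrained weights
    model.train(**train_kwargs)
    return model
```

For example, `run_training("RFDETRNano", **build_train_kwargs("path/to/dataset", "output", epochs=20, batch_size=4))` would fine-tune a Nano model with four accumulation steps per optimizer update on a single GPU.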

Step 4: Execute Training Loop

Call the train() method to start the training loop. Internally, this builds the dataset loaders with augmentation transforms (random resize, crop, horizontal flip, color jitter), constructs the optimizer (AdamW) with layer-wise learning rate decay, sets up the learning rate scheduler (cosine annealing or step decay with optional warmup), and iterates through epochs. Each epoch consists of a training pass with gradient accumulation and AMP, followed by COCO-style evaluation on the validation set. EMA weights are updated after each training step when enabled.

Key considerations:

Multi-scale training randomly resizes training images across a range of scales, improving robustness to object size
Gradient clipping (maximum gradient norm 0.1) guards against unstable, exploding updates
The training loop supports distributed training across multiple GPUs via PyTorch DDP
Callbacks fire at epoch end for metrics logging and early stopping evaluation
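
The per-step mechanics named above can be illustrated with plain Python on lists of floats. This is a simplified sketch, not RF-DETR's training code: it omits AMP, uses plain SGD instead of AdamW in the accumulation example, and the function names and EMA decay value are hypothetical.

```python
import math


def warmup_cosine_lr(step, total_steps, base_lr, warmup_steps=0, min_lr=0.0):
    """Linear warmup up to base_lr, then cosine annealing down to min_lr."""
    if warmup_steps and step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = min((step - warmup_steps) / max(1, total_steps - warmup_steps), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


def accumulate_and_step(param, micro_grads, accum_steps, lr):
    """Average gradients over accum_steps micro-batches, then take one SGD step."""
    updates, buffer = 0, 0.0
    for i, g in enumerate(micro_grads, start=1):
        buffer += g / accum_steps  # equivalent to dividing each micro-batch loss
        if i % accum_steps == 0:
            param -= lr * buffer
            buffer = 0.0
            updates += 1
    return param, updates


def ema_update(ema_params, params, decay=0.999):
    """In-place EMA after an optimizer step: ema = decay * ema + (1 - decay) * current."""
    for i, p in enumerate(params):
        ema_params[i] = decay * ema_params[i] + (1 - decay) * p
    return ema_params


def clip_by_global_norm(grads, max_norm=0.1):
    """Rescale all gradients so their joint L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads)
    scale = max_norm / (total + 1e-6)
    return [g * scale for g in grads]
```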

Step 5: Monitor and Evaluate

After each training epoch, the model is evaluated on the validation set using COCO metrics (mAP@50, mAP@50:95) plus extended precision, recall, and F1 metrics via confidence threshold sweeping. Training logs are saved to log.txt in the output directory. Optionally, metrics are streamed to TensorBoard or Weights & Biases for real-time visualization.
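
A confidence-threshold sweep for precision, recall, and F1 can be sketched as follows. It assumes each detection has already been matched to ground truth at a fixed IoU and reduced to a (confidence, is_true_positive) pair; RF-DETR's evaluator may match, average over classes, and pick thresholds differently, and the function names are hypothetical. Precision is reported as 0.0 when no detections survive a threshold, which is a convention of this sketch.

```python
def sweep_thresholds(detections, num_ground_truth, thresholds):
    """Precision, recall, and F1 at each confidence threshold.

    detections: iterable of (confidence, is_true_positive) pairs.
    """
    detections = list(detections)
    results = []
    for t in thresholds:
        kept = [is_tp for conf, is_tp in detections if conf >= t]
        tp = sum(kept)
        fp = len(kept) - tp
        precision = tp / (tp + fp) if kept else 0.0
        recall = tp / num_ground_truth if num_ground_truth else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append({"threshold": t, "precision": precision, "recall": recall, "f1": f1})
    return results


def best_f1(results):
    """Entry of the sweep with the highest F1 score."""
    return max(results, key=lambda r: r["f1"])
```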

Key considerations:

Both regular and EMA model variants are evaluated; the best overall checkpoint is selected automatically
Early stopping monitors validation mAP and halts training after a configurable patience period with no improvement
Metric plots are saved as images in the output directory
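
Patience-based early stopping on validation mAP can be sketched as a small stateful class. The class name, the min_delta tolerance, and the values in the example are assumptions for illustration, not RF-DETR's callback or its defaults.

```python
class EarlyStopping:
    """Signal a stop when the monitored metric (higher is better, e.g. validation mAP)
    has not improved by more than min_delta for `patience` consecutive epochs."""

    def __init__(self, patience, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.epochs_without_improvement = 0

    def update(self, metric):
        """Record one epoch's metric; return True if training should stop."""
        if metric > self.best + self.min_delta:
            self.best = metric
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience
```

With patience=2 and min_delta=0.01, an mAP history of 0.30, 0.35, 0.355, 0.34 stops after the fourth epoch, because 0.355 does not beat 0.35 by more than the tolerance.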

Step 6: Save and Select Best Checkpoint

The training loop saves multiple checkpoint types: periodic checkpoints (every N epochs), the best regular model checkpoint, the best EMA checkpoint, and the final best overall checkpoint. After training completes, the best checkpoint (chosen as the better of EMA and regular models) is stripped of optimizer state for efficient deployment and saved as checkpoint_best_total.pth.

Key considerations:

checkpoint_best_total.pth holds the model weights and the stored class names but no optimizer or scheduler state, which makes it the file to use for inference
Training checkpoints include full state (optimizer, scheduler, epoch) for resuming training
The results.json file contains per-class mAP, precision, recall, and F1 scores
If run_test is enabled (default), the best model is also evaluated on the test set

Step 7: Load and Run Fine-tuned Model

After training, load the best checkpoint by passing its path as pretrain_weights when instantiating a new model object. The model automatically loads class names from the checkpoint and reinitializes the detection head for the correct number of classes. Predictions can then be run on new images using the predict() method.

Key considerations:

Use pretrain_weights to initialize from a checkpoint for inference
Use resume to continue training from a checkpoint with optimizer state preserved
The model stores class names in the checkpoint for automatic label resolution
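
Loading the fine-tuned checkpoint and turning predictions into readable labels might look like the sketch below. It follows the pretrain_weights and predict() usage described above; the helper names are hypothetical, the default RFDETRNano must be replaced with the class you actually trained, and it assumes, as in the RF-DETR README, that predict() returns a supervision Detections object exposing class_id and confidence arrays.

```python
from pathlib import Path


def best_checkpoint_path(output_dir):
    """Path of the inference checkpoint written at the end of training."""
    return Path(output_dir) / "checkpoint_best_total.pth"


def load_finetuned(output_dir, model_class_name="RFDETRNano"):
    """Rebuild a fine-tuned model; requires `pip install rfdetr` and a finished run."""
    import rfdetr  # lazy import: only needed when this function is called

    model_cls = getattr(rfdetr, model_class_name)  # must match the class used for training
    return model_cls(pretrain_weights=str(best_checkpoint_path(output_dir)))


def format_labels(class_ids, confidences, class_names):
    """Turn parallel class-id / confidence sequences into "name 0.87" strings."""
    return [f"{class_names[int(c)]} {float(p):.2f}" for c, p in zip(class_ids, confidences)]
```

For example, `model = load_finetuned("output")`, then `detections = model.predict("image.jpg", threshold=0.5)` and `format_labels(detections.class_id, detections.confidence, class_names)`, where class_names is your dataset's class list indexed by class id.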
