Implementation:Roboflow Rf detr Torchvision Transforms For Detection

Knowledge Sources	RF-DETR Torchvision Functional
Domains	Computer_Vision, Preprocessing
Last Updated	2026-02-08 15:00 GMT

Overview

Torchvision functional transforms that RF-DETR applies inline in its predict pipeline to preprocess images for detection inference.

Description

RF-DETR calls torchvision.transforms.functional (imported as F) directly inside the RFDETR.predict() method instead of composing a transforms pipeline. Three functions are applied in sequence: F.to_tensor, F.normalize, and F.resize. This inline approach accepts several input formats (file path, PIL image, NumPy array, or tensor) and records each image's original size so that post-processing can map predictions back to original image coordinates.

Usage

These transforms are called automatically within RFDETR.predict(). Users do not typically call them directly unless implementing a custom inference pipeline.

Code Reference

Source Location

Repository: rf-detr
File: rfdetr/detr.py
Lines: L310-337 (inline in predict() method)

Signature

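# Applied inline within RFDETR.predict(); img is one input image and
# resolution is the square input size from the model config:
import torch
import torchvision.transforms.functional as F

# Step 1: Convert to tensor (PIL images and NumPy arrays only; tensors are used as-is)
img_tensor = img if isinstance(img, torch.Tensor) else F.to_tensor(img)  # -> Tensor [C, H, W] in [0, 1]

# Step 2: Normalize with ImageNet statistics
img_tensor = F.normalize(img_tensor, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

# Step 3: Resize to the square model resolution (aspect ratio is not preserved)
img_tensor = F.resize(img_tensor, (resolution, resolution))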

Import

import torchvision.transforms.functional as F

I/O Contract

Inputs

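Name	Type	Required	Description
img	Union[str, PIL.Image.Image, np.ndarray, torch.Tensor]	Yes	Input image in RGB channel order; a torch.Tensor skips F.to_tensor, so it should already be [3, H, W] with values in [0, 1]
means	List[float]	No	Per-channel normalization means; defaults to ImageNet [0.485, 0.456, 0.406]
stds	List[float]	No	Per-channel normalization standard deviations; defaults to ImageNet [0.229, 0.224, 0.225]
resolution	int	Yes	Target square resolution (height = width) from the model config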

Outputs

Name	Type	Description
batch_tensor	torch.Tensor	Batched tensor of shape (B, 3, resolution, resolution), normalized and resized
orig_sizes	List[Tuple[int, int]]	Original (H, W) for each image, used by PostProcess
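One way orig_sizes can be used is to turn normalized box predictions back into pixel coordinates of the original image. The sketch below assumes DETR-style outputs (normalized center-x, center-y, width, height in [0, 1]); the function name rescale_boxes is hypothetical, and the actual PostProcess in RF-DETR may do more (for example, score selection).

```python
import torch


def rescale_boxes(pred_boxes, orig_sizes):
    """Hypothetical sketch: map normalized boxes to original-image pixels.

    Assumes pred_boxes is [B, Q, 4] holding normalized (cx, cy, w, h) and
    orig_sizes is a list of (H, W) tuples, one per image.
    Returns absolute (x1, y1, x2, y2) boxes.
    """
    cx, cy, w, h = pred_boxes.unbind(-1)
    xyxy = torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=-1)
    sizes = torch.tensor(orig_sizes, dtype=xyxy.dtype)  # [B, 2] as (H, W)
    img_h, img_w = sizes.unbind(-1)
    scale = torch.stack([img_w, img_h, img_w, img_h], dim=-1)  # [B, 4]
    return xyxy * scale[:, None, :]  # broadcast over the Q queries


# Usage: one box centered in a 480x640 image
boxes = torch.tensor([[[0.5, 0.5, 0.2, 0.4]]])
print(rescale_boxes(boxes, [(480, 640)]))  # approximately [[[256, 144, 384, 336]]]
```

Scaling x by the original width and y by the original height separately is what undoes the non-uniform square resize.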

Usage Examples

Automatic Preprocessing in predict()

from rfdetr import RFDETRBase

model = RFDETRBase()

# All preprocessing is handled internally
detections = model.predict("image.jpg")

Manual Preprocessing

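import torchvision.transforms.functional as F
from PIL import Image

img = Image.open("image.jpg").convert("RGB")  # ensure 3-channel RGB input
orig_size = (img.height, img.width)  # (H, W), needed later to rescale boxes
tensor = F.to_tensor(img)  # [3, H, W] float in [0, 1]
tensor = F.normalize(tensor, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
tensor = F.resize(tensor, (560, 560))  # 560 in this example; match your model's configured resolution
batch = tensor.unsqueeze(0)  # Add batch dimension -> [1, 3, 560, 560]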

Related Pages

Implements Principle

Principle:Roboflow_Rf_detr_Image_Preprocessing

Requires Environment

Environment:Roboflow_Rf_detr_Python_GPU_Environment
