Implementation:Roboflow Rf detr Torchvision Transforms For Detection
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Preprocessing |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Inline use of torchvision functional transforms within RF-DETR's predict pipeline to preprocess images for detection inference.
Description
RF-DETR calls torchvision.transforms.functional (imported as F) directly inside the RFDETR.predict() method instead of building a transforms pipeline. Three functions are applied in sequence: F.to_tensor, F.normalize, and F.resize. Doing the work inline lets predict() accept several input formats (file path, PIL image, NumPy array, tensor) and record each image's original size before resizing, which post-processing needs.
Usage
These transforms are called automatically within RFDETR.predict(). Users do not typically call them directly unless implementing a custom inference pipeline.
Code Reference
Source Location
- Repository: rf-detr
- File: rfdetr/detr.py
- Lines: L310-337 (inline in predict() method)
Signature
# Applied inline within RFDETR.predict():
import torchvision.transforms.functional as F
# Step 1: Convert to tensor (if not already)
img_tensor = F.to_tensor(img) # -> Tensor [C, H, W] in [0, 1]
# Step 2: Normalize with ImageNet statistics
img_tensor = F.normalize(img_tensor, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
# Step 3: Resize to model resolution
img_tensor = F.resize(img_tensor, (resolution, resolution))
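To make the normalization step concrete, the arithmetic behind F.normalize is a per-channel (value - mean) / std. The plain-Python sketch below applies it to single pixels; normalize_pixel is an illustrative helper, not part of torchvision or RF-DETR.

```python
# ImageNet statistics, as used in the normalization step.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (value - mean) / std to one RGB pixel with values in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]


black = normalize_pixel([0.0, 0.0, 0.0])
white = normalize_pixel([1.0, 1.0, 1.0])
mean_pixel = normalize_pixel(MEAN)  # a pixel equal to the mean maps to zeros

print([round(v, 3) for v in black])
print([round(v, 3) for v in white])
print(mean_pixel)
```

After this step the values are no longer confined to [0, 1]: with these statistics they span roughly -2.12 to 2.64.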
Import
import torchvision.transforms.functional as F
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| img | Union[str, PIL.Image, np.ndarray, torch.Tensor] | Yes | Input image in RGB order |
| means | List[float] | No | ImageNet normalization means [0.485, 0.456, 0.406] |
| stds | List[float] | No | ImageNet normalization stds [0.229, 0.224, 0.225] |
| resolution | int | Yes | Target square resolution from model config |
Outputs
| Name | Type | Description |
|---|---|---|
| batch_tensor | torch.Tensor | Batched tensor of shape (B, 3, resolution, resolution), normalized and resized |
| orig_sizes | List[Tuple[int, int]] | Original (H, W) for each image, used by PostProcess |
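The role of orig_sizes can be illustrated with a small sketch. It assumes the model emits boxes as normalized (cx, cy, w, h) values in [0, 1], the usual DETR-style convention; this page does not state the box format, so treat that as an assumption. The helper cxcywh_to_pixel_xyxy is hypothetical and is not the PostProcess implementation.

```python
from typing import List, Tuple


def cxcywh_to_pixel_xyxy(
    box: Tuple[float, float, float, float],
    orig_size: Tuple[int, int],
) -> List[float]:
    """Hypothetical helper: scale a normalized (cx, cy, w, h) box to pixel (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    orig_h, orig_w = orig_size  # orig_sizes stores (H, W), not (W, H)
    return [
        (cx - w / 2) * orig_w,
        (cy - h / 2) * orig_h,
        (cx + w / 2) * orig_w,
        (cy + h / 2) * orig_h,
    ]


# A box centered in a 480x640 (H x W) image, covering half of each dimension.
pixel_box = cxcywh_to_pixel_xyxy((0.5, 0.5, 0.5, 0.5), (480, 640))
print(pixel_box)  # [160.0, 120.0, 480.0, 360.0]
```

Because the boxes are scaled by the original size and not by the square model resolution, the distortion introduced by resizing does not carry over to the reported coordinates.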
Usage Examples
Automatic Preprocessing in predict()
from rfdetr import RFDETRBase
model = RFDETRBase()
# All preprocessing is handled internally
detections = model.predict("image.jpg")
Manual Preprocessing
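import torchvision.transforms.functional as F
from PIL import Image
# Force three RGB channels so the 3-channel mean/std also work for grayscale or RGBA files
img = Image.open("image.jpg").convert("RGB")
tensor = F.to_tensor(img)  # float tensor [3, H, W] in [0, 1]
tensor = F.normalize(tensor, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
tensor = F.resize(tensor, (560, 560))  # 560 is an example; use the model's configured resolution
batch = tensor.unsqueeze(0)  # Add batch dimension -> [1, 3, 560, 560]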