Implementation:Roboflow Rf detr Torchvision Transforms For Detection

From Leeroopedia


Knowledge Sources
Domains Computer_Vision, Preprocessing
Last Updated 2026-02-08 15:00 GMT

Overview

Inline use of torchvision functional transforms within RF-DETR's predict pipeline to preprocess images for detection inference.

Description

RF-DETR calls torchvision.transforms.functional (imported as F) directly inside the RFDETR.predict() method rather than composing a transforms pipeline. Three functions are applied in sequence: F.to_tensor, F.normalize, and F.resize. Because the calls are inline, predict() can accept several input formats (file path, PIL image, NumPy array, tensor) and record each image's original size before resizing, so that the size is available for post-processing.

Usage

These transforms are called automatically within RFDETR.predict(). Users do not typically call them directly unless implementing a custom inference pipeline.

Code Reference

Source Location

  • Repository: rf-detr
  • File: rfdetr/detr.py
  • Lines: L310-337 (inline in predict() method)

Signature

# Applied inline within RFDETR.predict():
import torchvision.transforms.functional as F

# Step 1: Convert to tensor (if not already)
img_tensor = F.to_tensor(img)  # -> Tensor [C, H, W] in [0, 1]

# Step 2: Normalize with ImageNet statistics
img_tensor = F.normalize(img_tensor, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

# Step 3: Resize to model resolution
img_tensor = F.resize(img_tensor, (resolution, resolution))

Import

import torchvision.transforms.functional as F

I/O Contract

Inputs

Name | Type | Required | Description
img | Union[str, PIL.Image, np.ndarray, torch.Tensor] | Yes | Input image in RGB order
means | List[float] | No | ImageNet normalization means [0.485, 0.456, 0.406]
stds | List[float] | No | ImageNet normalization stds [0.229, 0.224, 0.225]
resolution | int | Yes | Target square resolution from model config

Outputs

Name | Type | Description
batch_tensor | torch.Tensor | Batched tensor of shape (B, 3, resolution, resolution), normalized and resized
orig_sizes | List[Tuple[int, int]] | Original (H, W) for each image, used by PostProcess

Usage Examples

Automatic Preprocessing in predict()

from rfdetr import RFDETRBase

model = RFDETRBase()

# All preprocessing is handled internally
detections = model.predict("image.jpg")

Manual Preprocessing

import torchvision.transforms.functional as F
from PIL import Image

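# Force three RGB channels so the 3-element mean/std below always match
img = Image.open("image.jpg").convert("RGB")
tensor = F.to_tensor(img)  # float tensor [3, H, W] in [0, 1]
tensor = F.normalize(tensor, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
tensor = F.resize(tensor, (560, 560))  # example value; use the model's configured resolution
batch = tensor.unsqueeze(0)  # Add batch dimension -> [1, 3, 560, 560]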
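To make the normalization step concrete, the following plain-Python sketch applies the same per-channel formula that F.normalize uses, (x - mean) / std, to a single RGB pixel. The function names are hypothetical and exist only for this illustration.

```python
MEAN = [0.485, 0.456, 0.406]  # ImageNet channel means
STD = [0.229, 0.224, 0.225]   # ImageNet channel standard deviations


def normalize_pixel(rgb):
    """Apply (x - mean) / std per channel to one RGB pixel scaled to [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, MEAN, STD)]


def denormalize_pixel(rgb):
    """Invert normalize_pixel, e.g. to visualize a preprocessed image."""
    return [x * s + m for x, m, s in zip(rgb, MEAN, STD)]


black = normalize_pixel([0.0, 0.0, 0.0])  # roughly [-2.12, -2.04, -1.80]
white = normalize_pixel([1.0, 1.0, 1.0])  # roughly [2.25, 2.43, 2.64]
```

After normalization the values are no longer confined to [0, 1], so a normalized tensor should be denormalized before it is displayed or saved as an image.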

Related Pages

Implements Principle

Requires Environment
