Implementation: Roboflow RF-DETR Deploy Benchmark
| Knowledge Sources | |
|---|---|
| Domains | Deployment, Benchmarking, Object_Detection |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Standalone benchmarking tool that measures inference latency and COCO mAP accuracy for exported ONNX and TensorRT engine models on GPU hardware.
Description
The deploy benchmark module provides a complete pipeline for evaluating exported RF-DETR models in ONNX or TensorRT format. It loads COCO validation images, preprocesses them through a square-resize-and-normalize pipeline, runs inference through either ONNX Runtime (CUDA provider) or a TensorRT engine, post-processes detections (sigmoid + top-300 selection + coordinate scaling), and computes standard COCO evaluation metrics. The TRTInference class wraps TensorRT engine loading, binding allocation, and synchronous/asynchronous execution. A TimeProfiler context manager handles CUDA-synchronized wall-clock timing. The module also includes build_engine for converting ONNX models to TensorRT FP16 engines.
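The post-processing step described above (sigmoid + top-300 selection + coordinate scaling) can be sketched in a few lines of numpy. The function name `postprocess` and the exact shapes are illustrative, not the module's API:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def postprocess(logits, boxes, img_w, img_h, top_k=300):
    """Select the top_k detections across all (query, class) pairs.

    logits: (num_queries, num_classes) raw class logits
    boxes:  (num_queries, 4) normalized cxcywh boxes
    Returns (scores, labels, xyxy) with boxes scaled to pixel coords.
    """
    scores = sigmoid(logits)                 # per-class probabilities
    flat = scores.ravel()
    k = min(top_k, flat.size)
    top = np.argsort(flat)[::-1][:k]         # best (query, class) pairs, descending
    query_idx = top // logits.shape[1]       # which query each hit came from
    labels = top % logits.shape[1]           # which class
    cx, cy, w, h = boxes[query_idx].T
    # cxcywh (normalized) -> xyxy (pixels)
    xyxy = np.stack([
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    ], axis=1)
    return flat[top], labels, xyxy
```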
Usage
Use this tool after exporting an RF-DETR model to ONNX or TensorRT format to verify that accuracy is preserved and that latency meets deployment requirements. Pass --run_benchmark for reliable latency measurement (each image is run 10 times), and add --disable_eval to skip accuracy evaluation for pure speed testing.
Code Reference
Source Location
- Repository: Roboflow_Rf_detr
- File: rfdetr/deploy/benchmark.py
- Lines: 1-579
Signature
class TRTInference(object):
    """TensorRT inference engine."""
    def __init__(
        self,
        engine_path: str = 'dino.engine',
        device: str = 'cuda:0',
        sync_mode: bool = False,
        max_batch_size: int = 32,
        verbose: bool = False,
    ):
        ...
    def __call__(self, blob: dict) -> dict:
        """Run inference on input blob."""
        ...
    def build_engine(
        self,
        onnx_file_path: str,
        engine_file_path: str,
        max_batch_size: int = 32,
    ) -> bytes:
        """Convert ONNX model to TensorRT engine with FP16."""
        ...

class CocoEvaluator(object):
    def __init__(self, coco_gt: str, iou_types: tuple):
        ...

def infer_onnx(
    sess,
    coco_evaluator: CocoEvaluator,
    time_profile: TimeProfiler,
    prefix: str,
    img_list: list,
    device: str,
    repeats: int = 1,
) -> None:
    """Run ONNX Runtime inference over COCO images."""
    ...

def infer_engine(
    model: TRTInference,
    coco_evaluator: CocoEvaluator,
    time_profile: TimeProfiler,
    prefix: str,
    img_list: list,
    device: str,
    repeats: int = 1,
) -> None:
    """Run TensorRT inference over COCO images."""
    ...

def main(args) -> None:
    """Entry point: dispatch to ONNX or TRT benchmark."""
    ...
Import
from rfdetr.deploy.benchmark import TRTInference, CocoEvaluator, TimeProfiler, main
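The TimeProfiler mentioned above amounts to an accumulating wall-clock timer that synchronizes CUDA before reading the clock, so asynchronously queued kernels are included in the measurement. A minimal sketch of that behavior (not the module's exact class):

```python
import time

class TimeProfiler:
    """Accumulating wall-clock timer; syncs CUDA around each timed region
    so asynchronous kernel launches are counted. Illustrative sketch only."""
    def __init__(self):
        self.total = 0.0

    def __enter__(self):
        self._sync()
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self._sync()
        self.total += time.perf_counter() - self._start
        return False

    def reset(self):
        self.total = 0.0

    @staticmethod
    def _sync():
        try:
            import torch
            if torch.cuda.is_available():
                torch.cuda.synchronize()
        except ImportError:
            pass  # CPU-only environment: plain wall-clock timing
```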
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args.path | str | Yes | Path to ONNX (.onnx) or TensorRT (.engine) model file |
| args.coco_path | str | Yes | Path to COCO dataset root (expects annotations/instances_val2017.json and val2017/) |
| args.device | int | No | CUDA device index (default: 0) |
| args.run_benchmark | bool | No | If set, repeats inference 10x per image for reliable latency |
| args.disable_eval | bool | No | If set, skips COCO mAP evaluation |
Outputs
| Name | Type | Description |
|---|---|---|
| latency | stdout | Average inference latency in milliseconds printed to console |
| coco_eval_bbox | stdout/dict | Standard COCO evaluation metrics (mAP, mAP@50, etc.) if eval is enabled |
Usage Examples
CLI: Benchmark ONNX Model
python -m rfdetr.deploy.benchmark \
--path model.onnx \
--coco_path data/coco \
--run_benchmark
CLI: Benchmark TensorRT Engine
python -m rfdetr.deploy.benchmark \
--path model.engine \
--coco_path data/coco \
--device 0 \
--run_benchmark
CLI: Latency Only (Skip Evaluation)
python -m rfdetr.deploy.benchmark \
--path model.onnx \
--coco_path data/coco \
--run_benchmark \
--disable_eval
Programmatic: TensorRT Inference
from rfdetr.deploy.benchmark import TRTInference, TimeProfiler
import torch

# Load TensorRT engine
engine = TRTInference(
    engine_path="model.engine",
    device="cuda:0",
    sync_mode=True,
)

# Run inference on a preprocessed input tensor
input_tensor = torch.randn(1, 3, 640, 640).cuda()
outputs = engine({"input": input_tensor})
# outputs["labels"]: detection logits
# outputs["dets"]: detection bounding boxes (cxcywh, normalized)
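Programmatic: Preprocessing Sketch
The engine call above expects a preprocessed blob. The square-resize-and-normalize step from the Description can be sketched with numpy alone; the real pipeline presumably uses bilinear interpolation and torch tensors (nearest-neighbor is used here to keep the sketch dependency-free), and the ImageNet mean/std values are an assumption:

```python
import numpy as np

# Assumed normalization constants (standard ImageNet statistics)
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image, size=640):
    """Square-resize (nearest-neighbor for simplicity) and normalize an
    HWC uint8 image into a 1xCxHxW float32 blob. Illustrative sketch."""
    h, w, _ = image.shape
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    resized = image[rows][:, cols].astype(np.float32) / 255.0
    normalized = (resized - IMAGENET_MEAN) / IMAGENET_STD
    return normalized.transpose(2, 0, 1)[None]  # HWC -> 1xCxHxW
```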