Implementation: Roboflow RF-DETR Deploy Benchmark
| Knowledge Sources | |
|---|---|
| Domains | Deployment, Benchmarking, Object_Detection |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Standalone benchmarking tool that measures inference latency and COCO mAP accuracy for exported ONNX and TensorRT engine models on GPU hardware.
Description
The deploy benchmark module provides a complete pipeline for evaluating exported RF-DETR models in ONNX or TensorRT format. It loads COCO validation images, preprocesses them through a square-resize-and-normalize pipeline, runs inference through either ONNX Runtime (CUDA provider) or a TensorRT engine, post-processes detections (sigmoid + top-300 selection + coordinate scaling), and computes standard COCO evaluation metrics. The TRTInference class wraps TensorRT engine loading, binding allocation, and synchronous/asynchronous execution. A TimeProfiler context manager handles CUDA-synchronized wall-clock timing. The module also includes build_engine for converting ONNX models to TensorRT FP16 engines.
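The post-processing step described above (sigmoid + top-300 selection + coordinate scaling) can be sketched in a few lines of numpy. The function name `postprocess` and the exact shapes are illustrative, not the module's API:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def postprocess(logits, boxes, img_w, img_h, top_k=300):
    """Select the top_k detections across all (query, class) pairs.

    logits: (num_queries, num_classes) raw class logits
    boxes:  (num_queries, 4) normalized cxcywh boxes
    Returns (scores, labels, xyxy) with boxes scaled to pixel coords.
    """
    scores = sigmoid(logits)                 # per-class probabilities
    flat = scores.ravel()
    k = min(top_k, flat.size)
    top = np.argsort(flat)[::-1][:k]         # best (query, class) pairs, descending
    query_idx = top // logits.shape[1]       # which query each hit came from
    labels = top % logits.shape[1]           # which class
    cx, cy, w, h = boxes[query_idx].T
    # cxcywh (normalized) -> xyxy (pixels)
    xyxy = np.stack([
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    ], axis=1)
    return flat[top], labels, xyxy
```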
Usage
Use this tool after exporting an RF-DETR model to ONNX or TensorRT format to verify that accuracy is preserved and that latency meets deployment requirements. Pass --run_benchmark for reliable latency measurement (each image is run 10 times), and add --disable_eval to skip accuracy evaluation for pure speed testing.
Code Reference
Source Location
- Repository: Roboflow_Rf_detr
- File: rfdetr/deploy/benchmark.py
- Lines: 1-579
Signature
class TRTInference(object):
    """TensorRT inference engine."""
    def __init__(
        self,
        engine_path: str = 'dino.engine',
        device: str = 'cuda:0',
        sync_mode: bool = False,
        max_batch_size: int = 32,
        verbose: bool = False,
    ):
        ...
    def __call__(self, blob: dict) -> dict:
        """Run inference on input blob."""
        ...
    def build_engine(
        self,
        onnx_file_path: str,
        engine_file_path: str,
        max_batch_size: int = 32,
    ) -> bytes:
        """Convert ONNX model to TensorRT engine with FP16."""
        ...

class CocoEvaluator(object):
    def __init__(self, coco_gt: str, iou_types: tuple):
        ...

def infer_onnx(
    sess,
    coco_evaluator: CocoEvaluator,
    time_profile: TimeProfiler,
    prefix: str,
    img_list: list,
    device: str,
    repeats: int = 1,
) -> None:
    """Run ONNX Runtime inference over COCO images."""
    ...

def infer_engine(
    model: TRTInference,
    coco_evaluator: CocoEvaluator,
    time_profile: TimeProfiler,
    prefix: str,
    img_list: list,
    device: str,
    repeats: int = 1,
) -> None:
    """Run TensorRT inference over COCO images."""
    ...

def main(args) -> None:
    """Entry point: dispatch to ONNX or TRT benchmark."""
    ...
Import
from rfdetr.deploy.benchmark import TRTInference, CocoEvaluator, TimeProfiler, main
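The TimeProfiler mentioned above amounts to an accumulating wall-clock timer that synchronizes CUDA before reading the clock, so asynchronously queued kernels are included in the measurement. A minimal sketch of that behavior (not the module's exact class):

```python
import time

class TimeProfiler:
    """Accumulating wall-clock timer; syncs CUDA around each timed region
    so asynchronous kernel launches are counted. Illustrative sketch only."""
    def __init__(self):
        self.total = 0.0

    def __enter__(self):
        self._sync()
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self._sync()
        self.total += time.perf_counter() - self._start
        return False

    def reset(self):
        self.total = 0.0

    @staticmethod
    def _sync():
        try:
            import torch
            if torch.cuda.is_available():
                torch.cuda.synchronize()
        except ImportError:
            pass  # CPU-only environment: plain wall-clock timing
```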
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args.path | str | Yes | Path to ONNX (.onnx) or TensorRT (.engine) model file |
| args.coco_path | str | Yes | Path to COCO dataset root (expects annotations/instances_val2017.json and val2017/) |
| args.device | int | No | CUDA device index (default: 0) |
| args.run_benchmark | bool | No | If set, repeats inference 10x per image for reliable latency |
| args.disable_eval | bool | No | If set, skips COCO mAP evaluation |
Outputs
| Name | Type | Description |
|---|---|---|
| latency | stdout | Average inference latency in milliseconds printed to console |
| coco_eval_bbox | stdout/dict | Standard COCO evaluation metrics (mAP, mAP@50, etc.) if eval is enabled |
Usage Examples
CLI: Benchmark ONNX Model
python -m rfdetr.deploy.benchmark \
--path model.onnx \
--coco_path data/coco \
--run_benchmark
CLI: Benchmark TensorRT Engine
python -m rfdetr.deploy.benchmark \
--path model.engine \
--coco_path data/coco \
--device 0 \
--run_benchmark
CLI: Latency Only (Skip Evaluation)
python -m rfdetr.deploy.benchmark \
--path model.onnx \
--coco_path data/coco \
--run_benchmark \
--disable_eval
Programmatic: TensorRT Inference
from rfdetr.deploy.benchmark import TRTInference, TimeProfiler
import torch

# Load TensorRT engine
engine = TRTInference(
    engine_path="model.engine",
    device="cuda:0",
    sync_mode=True,
)

# Run inference on a preprocessed input tensor
input_tensor = torch.randn(1, 3, 640, 640).cuda()
outputs = engine({"input": input_tensor})
# outputs["labels"]: detection logits
# outputs["dets"]: detection bounding boxes (cxcywh, normalized)
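Programmatic: Preprocessing Sketch
The engine call above expects a preprocessed blob. The square-resize-and-normalize step from the Description can be sketched with numpy alone; the real pipeline presumably uses bilinear interpolation and torch tensors (nearest-neighbor is used here to keep the sketch dependency-free), and the ImageNet mean/std values are an assumption:

```python
import numpy as np

# Assumed normalization constants (standard ImageNet statistics)
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image, size=640):
    """Square-resize (nearest-neighbor for simplicity) and normalize an
    HWC uint8 image into a 1xCxHxW float32 blob. Illustrative sketch."""
    h, w, _ = image.shape
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    resized = image[rows][:, cols].astype(np.float32) / 255.0
    normalized = (resized - IMAGENET_MEAN) / IMAGENET_STD
    return normalized.transpose(2, 0, 1)[None]  # HWC -> 1xCxHxW
```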