Implementation: LaurentMazare tch-rs YOLO Detection
| Knowledge Sources | |
|---|---|
| Domains | Object Detection, Computer Vision, Image Processing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Performs YOLO v3 object detection on input images, including model loading, confidence thresholding, non-maximum suppression, and bounding box annotation.
Description
This module implements the full YOLO v3 inference pipeline as a command-line application. It loads a Darknet configuration and pre-trained weights, processes one or more input images, and produces annotated output images with detected object bounding boxes.
The pipeline consists of several stages:
Image preprocessing: Input images are loaded and resized to the network's expected dimensions (specified in the .cfg file). The resized image is normalized to [0, 1] range and batched.
Detection: The model produces raw predictions which are processed by the report function. Predictions are filtered by a confidence threshold of 0.5, and each detection is assigned to the class with the highest score. Bounding boxes are extracted in center-format (x, y, w, h) and converted to corner-format (xmin, ymin, xmax, ymax).
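The filtering and conversion step can be sketched in plain Rust, independent of tch. The `to_bbox` helper and the exact prediction-row layout (`[cx, cy, w, h, objectness, class scores...]`, the standard YOLO v3 output format) are assumptions for illustration, not names from the example itself:

```rust
// Hypothetical post-processing sketch: a prediction row is assumed to be
// [cx, cy, w, h, objectness, class scores...]. Rows below the confidence
// threshold are dropped; survivors become corner-format boxes.
const CONFIDENCE_THRESHOLD: f64 = 0.5;

#[derive(Debug, Clone, Copy)]
struct Bbox { xmin: f64, ymin: f64, xmax: f64, ymax: f64, confidence: f64 }

fn to_bbox(pred: &[f64]) -> Option<(usize, Bbox)> {
    let (cx, cy, w, h, conf) = (pred[0], pred[1], pred[2], pred[3], pred[4]);
    if conf < CONFIDENCE_THRESHOLD { return None; }
    // Assign the detection to the class with the highest score.
    let class = pred[5..]
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(i, _)| i)?;
    // Convert center format (cx, cy, w, h) to corner format.
    Some((class, Bbox {
        xmin: cx - w / 2.0,
        ymin: cy - h / 2.0,
        xmax: cx + w / 2.0,
        ymax: cy + h / 2.0,
        confidence: conf,
    }))
}

fn main() {
    // One row: center (100, 50), size 40x20, objectness 0.9, two class scores.
    let row = [100.0, 50.0, 40.0, 20.0, 0.9, 0.1, 0.8];
    let (class, b) = to_bbox(&row).unwrap();
    // class=1, box=(80, 40, 120, 60)
    println!("class={} box=({}, {}, {}, {})", class, b.xmin, b.ymin, b.xmax, b.ymax);
}
```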
Non-maximum suppression (NMS): For each class, detections are sorted by confidence in descending order. Each detection is compared against all previously accepted detections using Intersection over Union (IoU). Detections with IoU exceeding the NMS threshold of 0.4 against any accepted detection are suppressed.
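The IoU computation and the greedy per-class suppression loop amount to the following standalone sketch (no tch dependency; the `nms` helper name is an assumption for illustration):

```rust
#[derive(Debug, Clone, Copy)]
struct Bbox { xmin: f64, ymin: f64, xmax: f64, ymax: f64, confidence: f64 }

const NMS_THRESHOLD: f64 = 0.4;

// Intersection over Union of two corner-format boxes.
fn iou(b1: &Bbox, b2: &Bbox) -> f64 {
    let a1 = (b1.xmax - b1.xmin) * (b1.ymax - b1.ymin);
    let a2 = (b2.xmax - b2.xmin) * (b2.ymax - b2.ymin);
    let ix = (b1.xmax.min(b2.xmax) - b1.xmin.max(b2.xmin)).max(0.0);
    let iy = (b1.ymax.min(b2.ymax) - b1.ymin.max(b2.ymin)).max(0.0);
    let inter = ix * iy;
    inter / (a1 + a2 - inter)
}

// Greedy NMS for one class: sort by confidence (descending), keep a box
// only if its IoU with every previously kept box stays at or below the
// threshold.
fn nms(mut boxes: Vec<Bbox>) -> Vec<Bbox> {
    boxes.sort_by(|a, b| b.confidence.partial_cmp(&a.confidence).unwrap());
    let mut kept: Vec<Bbox> = Vec::new();
    for b in boxes {
        if kept.iter().all(|k| iou(k, &b) <= NMS_THRESHOLD) {
            kept.push(b);
        }
    }
    kept
}

fn main() {
    let boxes = vec![
        Bbox { xmin: 0.0, ymin: 0.0, xmax: 10.0, ymax: 10.0, confidence: 0.9 },
        // Heavily overlaps the first box (IoU ~ 0.68 > 0.4) -> suppressed.
        Bbox { xmin: 1.0, ymin: 1.0, xmax: 11.0, ymax: 11.0, confidence: 0.8 },
        // Disjoint from both -> kept.
        Bbox { xmin: 20.0, ymin: 20.0, xmax: 30.0, ymax: 30.0, confidence: 0.7 },
    ];
    let kept = nms(boxes);
    println!("{} boxes kept", kept.len()); // prints "2 boxes kept"
}
```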
Annotation: Surviving bounding boxes are drawn on the original (non-resized) image by scaling coordinates back to original dimensions. The draw_rect function paints blue (RGB 0,0,1) rectangles using 2-pixel wide borders. Class names are printed to stdout using COCO class labels.
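The coordinate rescaling in the annotation stage is a simple ratio mapping from network input space back to the original image size. The sketch below uses a hypothetical `scale_to_original` helper and assumes the common 416x416 YOLO v3 input size for the example values:

```rust
// Hypothetical helper: map a corner-format box from network input space
// (net_width x net_height) back to the original image dimensions.
fn scale_to_original(
    bbox: (f64, f64, f64, f64), // (xmin, ymin, xmax, ymax) in network space
    net: (f64, f64),            // (net_width, net_height)
    orig: (f64, f64),           // (orig_width, orig_height)
) -> (f64, f64, f64, f64) {
    let sx = orig.0 / net.0;
    let sy = orig.1 / net.1;
    (bbox.0 * sx, bbox.1 * sy, bbox.2 * sx, bbox.3 * sy)
}

fn main() {
    // A box detected in 416x416 network space, original image 1280x720.
    let scaled = scale_to_original(
        (104.0, 208.0, 312.0, 416.0),
        (416.0, 416.0),
        (1280.0, 720.0),
    );
    // Approximately (320, 360, 960, 720) in original image coordinates.
    println!("{:?}", scaled);
}
```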
Annotated images are saved as output-{index:05}.jpg files.
Usage
Use this module for running YOLO v3 object detection on images from the command line. It requires the Darknet configuration file (yolo-v3.cfg), pre-trained weights in .ot format, and one or more input images as arguments.
Code Reference
Source Location
- Repository: LaurentMazare_Tch_rs
- File: examples/yolo/main.rs
- Lines: 1-153
Signature
```rust
#[derive(Debug, Clone, Copy)]
struct Bbox {
    xmin: f64,
    ymin: f64,
    xmax: f64,
    ymax: f64,
    confidence: f64,
}

fn iou(b1: &Bbox, b2: &Bbox) -> f64
pub fn draw_rect(t: &mut Tensor, x1: i64, x2: i64, y1: i64, y2: i64)
pub fn report(pred: &Tensor, img: &Tensor, w: i64, h: i64) -> Result<Tensor>
pub fn main() -> Result<()>
```
Import
```rust
use anyhow::{ensure, Result};
use tch::nn::ModuleT;
use tch::vision::image;
use tch::Tensor;
```
I/O Contract
| Input | Type | Description |
|---|---|---|
| args[1] | Path | Path to pre-trained weights file (.ot format) |
| args[2..] | Paths | One or more input image file paths |
| CONFIG_NAME | &str | Path to Darknet config ("examples/yolo/yolo-v3.cfg") |
| CONFIDENCE_THRESHOLD | f64 | Minimum objectness confidence (0.5) |
| NMS_THRESHOLD | f64 | IoU threshold for non-maximum suppression (0.4) |
| Output | Type | Description |
|---|---|---|
| report return | Tensor | Annotated image tensor with bounding boxes drawn, shape [3, H, W] |
| iou return | f64 | Intersection over Union between two bounding boxes |
| Saved images | JPEG files | output-00000.jpg, output-00001.jpg, etc. |
| Stdout | Text | Class name and bounding box details for each detection |
| Bbox Field | Type | Description |
|---|---|---|
| xmin, ymin | f64 | Top-left corner coordinates (in network input space) |
| xmax, ymax | f64 | Bottom-right corner coordinates (in network input space) |
| confidence | f64 | Objectness confidence score |
Usage Examples
```rust
// Command-line usage:
// cargo run --example yolo -- yolo-v3.ot photo1.jpg photo2.jpg

// Programmatic usage (relies on the example's local `darknet` module
// and `report` function):
use tch::nn::ModuleT;
use tch::vision::image;

// Load model
let mut vs = tch::nn::VarStore::new(tch::Device::Cpu);
let darknet = darknet::parse_config("examples/yolo/yolo-v3.cfg")?;
let model = darknet.build_model(&vs.root())?;
vs.load("yolo-v3.ot")?;

// Process an image
let original_image = image::load("photo.jpg")?;
let net_width = darknet.width()?;
let net_height = darknet.height()?;
let resized = image::resize(&original_image, net_width, net_height)?;
let input = resized.unsqueeze(0).to_kind(tch::Kind::Float) / 255.;

// Run detection
let predictions = model.forward_t(&input, false).squeeze();

// Apply confidence filtering, NMS, and draw bounding boxes
let annotated = report(&predictions, &original_image, net_width, net_height)?;
image::save(&annotated, "output-00000.jpg")?;
```