
Implementation:LaurentMazare Tch rs YOLO Detection

From Leeroopedia


Knowledge Sources
Domains Object Detection, Computer Vision, Image Processing
Last Updated 2026-02-08 00:00 GMT

Overview

Performs YOLO v3 object detection on input images, including model loading, confidence thresholding, non-maximum suppression, and bounding box annotation.

Description

This module implements the full YOLO v3 inference pipeline as a command-line application. It loads a Darknet configuration and pre-trained weights, processes one or more input images, and produces annotated output images with detected object bounding boxes.

The pipeline consists of several stages:

Image preprocessing: Input images are loaded and resized to the network's expected dimensions (specified in the .cfg file). The resized image is normalized to the [0, 1] range and batched as a single-image batch.

Detection: The model produces raw predictions which are processed by the report function. Predictions are filtered by a confidence threshold of 0.5, and each detection is assigned to the class with the highest score. Bounding boxes are extracted in center-format (x, y, w, h) and converted to corner-format (xmin, ymin, xmax, ymax).
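The center-to-corner conversion above is simple arithmetic; the following sketch pulls it out into a hypothetical standalone helper (the actual example performs this inline inside report):

```rust
// Hypothetical helper: convert a center-format box (x, y, w, h) to
// corner format (xmin, ymin, xmax, ymax), as described above.
fn center_to_corners(x: f64, y: f64, w: f64, h: f64) -> (f64, f64, f64, f64) {
    (x - w / 2.0, y - h / 2.0, x + w / 2.0, y + h / 2.0)
}

fn main() {
    // A box centered at (100, 50) with width 40 and height 20.
    let corners = center_to_corners(100.0, 50.0, 40.0, 20.0);
    assert_eq!(corners, (80.0, 40.0, 120.0, 60.0));
}
```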

Non-maximum suppression (NMS): For each class, detections are sorted by confidence in descending order. Each detection is compared against all previously accepted detections using Intersection over Union (IoU). Detections with IoU exceeding the NMS threshold of 0.4 against any accepted detection are suppressed.
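The NMS stage can be sketched as follows, using the Bbox struct from the signature section. The +1.0 terms in the IoU treat coordinates as inclusive pixel indices; the nms helper and its greedy loop are an illustrative reconstruction of the per-class logic described above, not the example's exact code:

```rust
#[derive(Debug, Clone, Copy)]
struct Bbox {
    xmin: f64,
    ymin: f64,
    xmax: f64,
    ymax: f64,
    confidence: f64,
}

// Intersection over Union of two corner-format boxes.
fn iou(b1: &Bbox, b2: &Bbox) -> f64 {
    let b1_area = (b1.xmax - b1.xmin + 1.0) * (b1.ymax - b1.ymin + 1.0);
    let b2_area = (b2.xmax - b2.xmin + 1.0) * (b2.ymax - b2.ymin + 1.0);
    let i_xmin = b1.xmin.max(b2.xmin);
    let i_xmax = b1.xmax.min(b2.xmax);
    let i_ymin = b1.ymin.max(b2.ymin);
    let i_ymax = b1.ymax.min(b2.ymax);
    let i_area =
        (i_xmax - i_xmin + 1.0).max(0.0) * (i_ymax - i_ymin + 1.0).max(0.0);
    i_area / (b1_area + b2_area - i_area)
}

// Greedy NMS over one class: sort by confidence descending, then keep a
// detection only if its IoU with every accepted detection is <= threshold.
fn nms(mut dets: Vec<Bbox>, threshold: f64) -> Vec<Bbox> {
    dets.sort_by(|a, b| b.confidence.partial_cmp(&a.confidence).unwrap());
    let mut kept: Vec<Bbox> = Vec::new();
    for det in dets {
        if kept.iter().all(|k| iou(k, &det) <= threshold) {
            kept.push(det);
        }
    }
    kept
}

fn main() {
    let a = Bbox { xmin: 0.0, ymin: 0.0, xmax: 9.0, ymax: 9.0, confidence: 0.9 };
    // Same box with lower confidence: IoU = 1.0 > 0.4, so it is suppressed.
    let b = Bbox { confidence: 0.8, ..a };
    // Disjoint box: IoU = 0.0, so it survives.
    let c = Bbox { xmin: 100.0, ymin: 100.0, xmax: 109.0, ymax: 109.0, confidence: 0.7 };
    assert_eq!(nms(vec![a, b, c], 0.4).len(), 2);
}
```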

Annotation: Surviving bounding boxes are drawn on the original (non-resized) image by scaling coordinates back to original dimensions. The draw_rect function paints blue (RGB 0,0,1) rectangles using 2-pixel wide borders. Class names are printed to stdout using COCO class labels.
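The coordinate rescaling in the annotation step can be sketched per axis as below; scale_coord, net_dim, and orig_dim are hypothetical names for the mapping from network-input space back to the original image, with clamping to keep coordinates inside the image:

```rust
// Hypothetical sketch: map one coordinate from network-input space
// (dimension `net_dim`) back to the original image (dimension `orig_dim`),
// clamped to valid pixel indices.
fn scale_coord(v: f64, net_dim: i64, orig_dim: i64) -> i64 {
    let ratio = orig_dim as f64 / net_dim as f64;
    ((v * ratio) as i64).clamp(0, orig_dim - 1)
}

fn main() {
    // A point at the center of a 416-wide network input maps to the
    // center of an 832-wide original image.
    assert_eq!(scale_coord(208.0, 416, 832), 416);
    // Out-of-range coordinates are clamped to the image boundary.
    assert_eq!(scale_coord(500.0, 416, 832), 831);
}
```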

Annotated images are saved as output-{index:05}.jpg files.
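The output naming pattern corresponds to zero-padded width-5 formatting, which a small helper can illustrate:

```rust
// Build the output file name for the i-th processed image,
// matching the output-{index:05}.jpg pattern described above.
fn output_name(index: usize) -> String {
    format!("output-{:05}.jpg", index)
}

fn main() {
    assert_eq!(output_name(0), "output-00000.jpg");
    assert_eq!(output_name(12), "output-00012.jpg");
}
```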

Usage

Use this module for running YOLO v3 object detection on images from the command line. It requires the Darknet configuration file (yolo-v3.cfg), pre-trained weights in .ot format, and one or more input images as arguments.

Code Reference

Source Location

Signature

#[derive(Debug, Clone, Copy)]
struct Bbox {
    xmin: f64,
    ymin: f64,
    xmax: f64,
    ymax: f64,
    confidence: f64,
}

fn iou(b1: &Bbox, b2: &Bbox) -> f64
pub fn draw_rect(t: &mut Tensor, x1: i64, x2: i64, y1: i64, y2: i64)
pub fn report(pred: &Tensor, img: &Tensor, w: i64, h: i64) -> Result<Tensor>
pub fn main() -> Result<()>

Import

use anyhow::{ensure, Result};
use tch::nn::ModuleT;
use tch::vision::image;
use tch::Tensor;

I/O Contract

Inputs

Input                 Type   Description
args[1]               Path   Path to the pre-trained weights file (.ot format)
args[2..]             Paths  One or more input image file paths
CONFIG_NAME           &str   Path to the Darknet config ("examples/yolo/yolo-v3.cfg")
CONFIDENCE_THRESHOLD  f64    Minimum objectness confidence (0.5)
NMS_THRESHOLD         f64    IoU threshold for non-maximum suppression (0.4)

Outputs

Output         Type        Description
report return  Tensor      Annotated image tensor with bounding boxes drawn, shape [3, H, W]
iou return     f64         Intersection over Union between two bounding boxes
Saved images   JPEG files  output-00000.jpg, output-00001.jpg, etc.
Stdout         Text        Class name and bounding box details for each detection

Bbox fields

Field       Type  Description
xmin, ymin  f64   Top-left corner coordinates (in network input space)
xmax, ymax  f64   Bottom-right corner coordinates (in network input space)
confidence  f64   Objectness confidence score

Usage Examples

// Command-line usage:
// cargo run --example yolo -- yolo-v3.ot photo1.jpg photo2.jpg

// Programmatic usage:
use tch::nn::ModuleT;
use tch::vision::image;

// Load the model (parse_config and build_model come from the example's darknet module)
let mut vs = tch::nn::VarStore::new(tch::Device::Cpu);
let darknet = darknet::parse_config("examples/yolo/yolo-v3.cfg")?;
let model = darknet.build_model(&vs.root())?;
vs.load("yolo-v3.ot")?;

// Process an image
let original_image = image::load("photo.jpg")?;
let net_width = darknet.width()?;
let net_height = darknet.height()?;
let resized = image::resize(&original_image, net_width, net_height)?;
let input = resized.unsqueeze(0).to_kind(tch::Kind::Float) / 255.;

// Run detection
let predictions = model.forward_t(&input, false).squeeze();

// Apply confidence filtering, NMS, and draw bounding boxes
let annotated = report(&predictions, &original_image, net_width, net_height)?;
image::save(&annotated, "output-00000.jpg")?;
