Implementation:LaurentMazare Tch rs Darknet Builder
| Knowledge Sources | |
|---|---|
| Domains | Object Detection, Model Architecture, Configuration Parsing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Parses YOLO v3 Darknet configuration files and dynamically builds the corresponding neural network model, including convolutional layers, routing, shortcuts, upsampling, and YOLO detection heads.
Description
This module implements a complete Darknet config parser and model builder for YOLO v3. The parse_config function reads a .cfg file line by line, accumulating blocks delimited by [block_type] headers. Each block contains key-value parameters. The [net] block provides global parameters (image height/width), while other blocks define network layers.
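The block-accumulation idea described above can be sketched with the standard library alone. This is a minimal illustration, not the code in darknet.rs: the `parse_blocks` helper, the `Block` field names, the `String` error type, and the `#` comment handling are all assumptions made for the example, and it reads from a string rather than a file.

```rust
use std::collections::BTreeMap;

/// One `[block_type]` section with its key-value parameters.
/// (Field names are illustrative, not taken from the original source.)
#[derive(Debug)]
struct Block {
    block_type: String,
    parameters: BTreeMap<String, String>,
}

/// Hypothetical helper: parses Darknet-style config text into blocks.
fn parse_blocks(text: &str) -> Result<Vec<Block>, String> {
    let mut blocks: Vec<Block> = Vec::new();
    for raw in text.lines() {
        let line = raw.trim();
        // Skip blank lines and `#` comments (assumed comment syntax).
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some(name) = line.strip_prefix('[').and_then(|l| l.strip_suffix(']')) {
            // A `[block_type]` header starts a new block.
            blocks.push(Block {
                block_type: name.trim().to_string(),
                parameters: BTreeMap::new(),
            });
        } else {
            // Any other line must be `key=value` and belongs to the latest block.
            let (key, value) = line
                .split_once('=')
                .ok_or_else(|| format!("missing '=' in line: {line}"))?;
            let block = blocks
                .last_mut()
                .ok_or_else(|| format!("parameter before any block: {line}"))?;
            block
                .parameters
                .insert(key.trim().to_string(), value.trim().to_string());
        }
    }
    Ok(blocks)
}
```

Calling `parse_blocks("[net]\nwidth=416\n")` would yield a single block of type `net` whose parameters map `width` to `"416"`; values stay as strings until a layer builder parses them.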
The Darknet struct holds the parsed blocks and exposes build_model, which iterates through the blocks and constructs the corresponding layers:
- [convolutional] - Creates a Conv2D layer with optional batch normalization and leaky ReLU activation (using xs.maximum(&(xs * 0.1))). Supports configurable filters, kernel size, stride, and padding.
- [upsample] - Applies 2x nearest-neighbor upsampling via upsample_nearest2d.
- [route] - Concatenates outputs from specified previous layers along the channel dimension.
- [shortcut] - Adds the output of a previous layer to the current layer (residual connection).
- [yolo] - Defines a detection head with anchor boxes and class count; applies the detect function.
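Two details of the block types listed above lend themselves to a small scalar sketch: the leaky ReLU expressed as a maximum, and the channel bookkeeping for route blocks. The helper names `leaky_relu`, `resolve_layer`, and `route_channels` are hypothetical. The sketch also assumes the usual Darknet convention that non-positive layer references are relative to the current block index, which this page does not state explicitly.

```rust
/// Leaky ReLU with slope 0.1, written as max(x, 0.1 * x),
/// the scalar analogue of `xs.maximum(&(xs * 0.1))`.
fn leaky_relu(x: f32) -> f32 {
    x.max(0.1 * x)
}

/// Hypothetical helper: resolves a layer reference to an absolute index.
/// Assumption: values <= 0 are relative to the current block `index`;
/// positive values are absolute. Only earlier layers are valid targets.
fn resolve_layer(index: usize, reference: i64) -> Option<usize> {
    let absolute = if reference > 0 {
        reference
    } else {
        index as i64 + reference
    };
    if absolute >= 0 && (absolute as usize) < index {
        Some(absolute as usize)
    } else {
        None
    }
}

/// Hypothetical helper: output channels of a route block, i.e. the sum of
/// the channels of every referenced layer (concatenation along channels).
/// Returns `None` if any reference is invalid.
fn route_channels(index: usize, references: &[i64], channels: &[i64]) -> Option<i64> {
    references
        .iter()
        .map(|&r| resolve_layer(index, r).and_then(|i| channels.get(i).copied()))
        .sum()
}
```

A shortcut block, by contrast, adds tensors element-wise, so its output channel count equals that of its input and no summation is needed.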
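The detect function transforms raw YOLO predictions into bounding boxes: it applies a sigmoid to the x/y offsets and to the objectness and class scores, exponentiates the width/height predictions and scales them by the anchor dimensions, and offsets the x/y coordinates by the grid cell position. Because detect receives only image_height (the network input size), the resulting coordinates are expressed in network input pixels; mapping them back to the original image size is left to the caller.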
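For a single prediction, the decoding step can be written out with scalars. This is a sketch of the standard YOLO v3 box decoding, not the tensor code in darknet.rs: `decode_box` is a hypothetical helper, `stride` is assumed to be the network input size divided by the grid size, and the anchor is assumed to be given in network input pixels.

```rust
/// Logistic sigmoid.
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Hypothetical helper: decodes one raw prediction `(tx, ty, tw, th)` for
/// the grid cell at (`col`, `row`) into `[center_x, center_y, width, height]`
/// in network input pixels.
fn decode_box(
    raw: [f32; 4],
    col: usize,
    row: usize,
    stride: f32,
    anchor: (f32, f32),
) -> [f32; 4] {
    let [tx, ty, tw, th] = raw;
    // Sigmoid keeps the center inside its cell; the cell index is the offset.
    let center_x = (sigmoid(tx) + col as f32) * stride;
    let center_y = (sigmoid(ty) + row as f32) * stride;
    // Width and height are exponential scalings of the anchor box.
    let width = tw.exp() * anchor.0;
    let height = th.exp() * anchor.1;
    [center_x, center_y, width, height]
}
```

With all-zero raw values, cell (6, 6), a stride of 32, and an illustrative anchor of (116, 90), the decoded box is centered at (208, 208) with size 116 x 90: the sigmoid of zero places the center in the middle of the cell, and the exponential of zero leaves the anchor unscaled.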
The model is returned as a FuncT closure that processes input images through all layers sequentially, maintaining a list of intermediate outputs for route and shortcut references, and concatenating all YOLO detection outputs.
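The sequential forward pass with saved intermediate outputs can be illustrated with toy data. In this sketch a "tensor" is just a `Vec<f32>`, the `Step` enum only loosely mirrors the internal `Bl` enum, and the Yolo step collects its input unchanged instead of calling detect; all of these are simplifying assumptions for the example.

```rust
/// Toy stand-in for a tensor: a flat list of values.
type Toy = Vec<f32>;

/// Simplified analogue of the internal block enum (illustrative only).
enum Step {
    Layer(Box<dyn Fn(&Toy) -> Toy>),
    Route(Vec<usize>),
    Shortcut(usize),
    Yolo,
}

/// Runs every step in order, keeping each output so that later route and
/// shortcut steps can refer back to it, and concatenates all Yolo outputs.
fn forward(steps: &[Step], input: &Toy) -> Toy {
    let mut outputs: Vec<Toy> = Vec::with_capacity(steps.len());
    let mut detections: Toy = Vec::new();
    for step in steps {
        // Output of the preceding step, or the model input for the first step.
        let prev = outputs.last().unwrap_or(input);
        let out: Toy = match step {
            Step::Layer(f) => f(prev),
            // Concatenate the saved outputs of the referenced steps.
            Step::Route(ids) => ids
                .iter()
                .flat_map(|&i| outputs[i].iter().copied())
                .collect(),
            // Residual connection: element-wise sum with an earlier output.
            Step::Shortcut(from) => prev
                .iter()
                .zip(&outputs[*from])
                .map(|(a, b)| a + b)
                .collect(),
            // Collect a detection; push a placeholder to keep indices aligned.
            Step::Yolo => {
                detections.extend_from_slice(prev);
                Vec::new()
            }
        };
        outputs.push(out);
    }
    detections
}
```

Pushing a placeholder for the Yolo step keeps the `outputs` index of every step equal to its block index, which is what lets route and shortcut references stay valid after a detection head.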
Usage
Use this module to load and instantiate YOLO v3 models from standard Darknet configuration files. The built model accepts image tensors and returns concatenated detection predictions across all three YOLO scales.
Code Reference
Source Location
- Repository: LaurentMazare_Tch_rs
- File: examples/yolo/darknet.rs
- Lines: 1-286
Signature
#[derive(Debug)]
pub struct Darknet {
blocks: Vec<Block>,
parameters: BTreeMap<String, String>,
}
impl Darknet {
pub fn height(&self) -> Result<i64>
pub fn width(&self) -> Result<i64>
pub fn build_model(&self, vs: &nn::Path) -> Result<FuncT<'_>>
}
pub fn parse_config<T: AsRef<Path>>(path: T) -> Result<Darknet>
// Internal block types
enum Bl {
Layer(Box<dyn ModuleT>),
Route(Vec<usize>),
Shortcut(usize),
Yolo(i64, Vec<(i64, i64)>),
}
// Internal layer builders
fn conv(vs: nn::Path, index: usize, p: i64, b: &Block) -> Result<(i64, Bl)>
fn upsample(prev_channels: i64) -> Result<(i64, Bl)>
fn route(index: usize, p: &[(i64, Bl)], block: &Block) -> Result<(i64, Bl)>
fn shortcut(index: usize, p: i64, block: &Block) -> Result<(i64, Bl)>
fn yolo(p: i64, block: &Block) -> Result<(i64, Bl)>
fn detect(xs: &Tensor, image_height: i64, classes: i64, anchors: &[(i64, i64)]) -> Tensor
Import
use anyhow::{bail, ensure, Result};
use std::collections::BTreeMap;
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;
use tch::{nn, nn::{FuncT, ModuleT}, Tensor};
I/O Contract
| Input | Type | Description |
|---|---|---|
| path (parse_config) | AsRef<Path> | Path to the Darknet .cfg configuration file |
| vs (build_model) | &nn::Path | Variable store path for creating network parameters |
| xs (model forward) | &Tensor | Input image tensor, shape [batch, 3, height, width] |
| Output | Type | Description |
|---|---|---|
| parse_config | Result<Darknet> | Parsed configuration with blocks and global parameters |
| height() / width() | Result<i64> | Network input image dimensions from [net] block |
| build_model | Result<FuncT> | Callable model that returns concatenated detection tensors |
| Model output | Tensor | Shape [batch, num_predictions, 5 + num_classes] with (x, y, w, h, objectness, class_scores...) |
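The size of the num_predictions dimension follows from the grid sizes of the detection heads. The sketch below assumes the standard YOLO v3 layout (strides of 32, 16, and 8, with three anchors per scale); `num_predictions` is a hypothetical helper and is not part of darknet.rs.

```rust
/// Hypothetical helper: total number of predicted boxes for a network input
/// of `height` x `width`, given the stride of each detection head and the
/// number of anchors used per head.
fn num_predictions(height: i64, width: i64, strides: &[i64], anchors_per_scale: i64) -> i64 {
    strides
        .iter()
        .map(|s| (height / s) * (width / s) * anchors_per_scale)
        .sum()
}
```

For a 416 x 416 input under these assumptions, the three grids are 13 x 13, 26 x 26, and 52 x 52, so `num_predictions(416, 416, &[32, 16, 8], 3)` gives (169 + 676 + 2704) * 3 = 10647 rows in the output tensor.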
| Supported Block Types | Parameters | Description |
|---|---|---|
| convolutional | filters, size, stride, pad, batch_normalize, activation | Conv2D with optional BN and leaky ReLU |
| upsample | (none used) | 2x nearest neighbor upsampling |
| route | layers | Concatenate outputs from specified layer indices |
| shortcut | from | Add output from a previous layer (residual) |
| yolo | classes, anchors, mask | Detection head with anchor boxes |
Usage Examples
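// Assumes `mod darknet;` is declared in the enclosing crate.
use darknet::parse_config;
// ModuleT provides forward_t on the built model.
use tch::nn::{self, ModuleT};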
// Parse the YOLO v3 configuration
let darknet = parse_config("examples/yolo/yolo-v3.cfg")?;
let net_height = darknet.height()?;
let net_width = darknet.width()?;
// Build the model
// `mut` is required because VarStore::load takes `&mut self`.
let mut vs = nn::VarStore::new(tch::Device::Cpu);
let model = darknet.build_model(&vs.root())?;
// Load pre-trained weights
vs.load("yolo-v3.ot")?;
// Run inference
let image = tch::vision::image::load("photo.jpg")?;
let resized = tch::vision::image::resize(&image, net_width, net_height)?;
let input = resized.unsqueeze(0).to_kind(tch::Kind::Float) / 255.;
let predictions = model.forward_t(&input, false).squeeze();
// predictions shape: [num_detections, 85] for COCO (80 classes + 5)