
Heuristic: Tencent ncnn Lightmode Memory Optimization

From Leeroopedia



Knowledge Sources
Domains Optimization, Memory_Management
Last Updated 2026-02-09 19:00 GMT

Overview

Memory optimization technique using ncnn's lightmode to automatically recycle intermediate blob memory during inference, reducing peak memory usage by 2-3x.

Description

ncnn's lightmode controls whether intermediate activation blobs are kept in memory after they are consumed by downstream layers. When enabled (the default), blobs are reference-counted and automatically recycled as soon as all consumers have read them. This is possible because ncnn uses depth-first execution with top-down dependency resolution. The Split layer is implemented as zero-copy (just reference count increment), making parallel branches efficient. When lightmode is disabled, all intermediate results are retained, which is useful for debugging but consumes 2-3x more memory.

Usage

Use this heuristic when deploying ncnn on memory-constrained devices (mobile phones, embedded systems). Keep lightmode enabled (the default) for production; disable it only when you need to inspect intermediate layer outputs for debugging or verification. Note also that the Extractor is stateful: create a new Extractor for each inference, or it may return cached results from a previous run.
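A minimal sketch of this pattern (the blob names `"data"`/`"output"` and the `model.param`/`model.bin` paths are placeholders; substitute your own model's names):

```cpp
#include "net.h" // ncnn's public header

// Load the Net once and share it globally; opt.lightmode is already true by default.
static ncnn::Net g_net;

bool init_model()
{
    // Placeholder paths for your converted model files.
    return g_net.load_param("model.param") == 0
        && g_net.load_model("model.bin") == 0;
}

ncnn::Mat run_inference(const ncnn::Mat& in)
{
    // Create a fresh Extractor per call: it caches intermediate results,
    // so reusing one across images would return stale data.
    ncnn::Extractor ex = g_net.create_extractor();
    ex.input("data", in);      // "data" is a model-specific input blob name
    ncnn::Mat out;
    ex.extract("output", out); // "output" likewise
    return out;
}
```

For debugging, set `g_net.opt.lightmode = false;` before loading the model so intermediate blobs remain inspectable, at the cost of the 2-3x memory increase described above.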

The Insight (Rule of Thumb)

  • Action: Keep `net.opt.lightmode = true` (default). Create a new `Extractor` for every inference call.
  • Value: Lightmode is enabled by default. No action needed unless debugging.
  • Trade-off: Lightmode ON saves 2-3x memory but intermediate blobs are not inspectable after consumption. Lightmode OFF retains all blobs for debugging but uses much more memory.
  • Anti-pattern: Never reuse an Extractor for multiple images. It caches results and will return stale data. Net is the model (load once, share globally); Extractor is the computation instance (create per inference).

Reasoning

Deep neural networks produce large intermediate activation maps at each layer. For a typical model with 50+ layers, keeping all activations in memory simultaneously requires enormous RAM. Since ncnn executes layers sequentially in dependency order, each intermediate blob can be freed as soon as all downstream consumers have read it. The reference counting mechanism (using thread-safe atomic operations) tracks this automatically. The Split layer, which fans out one blob to multiple branches, is implemented as a pure reference count increment with no data copy, making the system both memory-efficient and fast.
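The memory effect can be illustrated with a small self-contained simulation (a hypothetical model of the recycling policy, not ncnn's actual allocator): a linear chain of layers where each output blob carries a pending-consumer count and is freed as soon as its last reader finishes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative model of lightmode-style blob recycling (not ncnn's real code).
struct Blob {
    std::size_t bytes;
    int consumers_left; // reference count of downstream readers still pending
};

// Peak live activation memory for `layers` layers in a linear chain, each
// producing `blob_bytes` of output that is read by exactly one consumer.
std::size_t simulate_peak(bool lightmode, int layers, std::size_t blob_bytes)
{
    std::vector<Blob> live;
    std::size_t current = 0, peak = 0;
    for (int i = 0; i < layers; ++i) {
        live.push_back({blob_bytes, 1}); // this layer's output
        current += blob_bytes;
        peak = std::max(peak, current);
        if (lightmode && i > 0) {
            Blob& prev = live[live.size() - 2]; // the input we just consumed
            if (--prev.consumers_left == 0) {   // last reader done: recycle
                current -= prev.bytes;
                live.erase(live.end() - 2);
            }
        }
        // With lightmode off, every blob stays in `live` until the end.
    }
    return peak;
}
```

In this toy 50-layer chain, lightmode holds only two blobs live at once while the retained-everything mode holds all fifty; real graphs with branches and in-place layers see the more modest 2-3x saving reported above.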

Code evidence from `src/option.cpp:12`:

lightmode = true;

Architecture insight: blob memory uses reference counting with 16-byte aligned channels. The `a = b` assignment on Mat objects increments the reference count without copying data.
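A stripped-down sketch of that sharing behavior (a hypothetical `MiniMat`, not ncnn's real `Mat`, which additionally handles alignment, dimensions, and allocators): assignment copies only the data pointer and bumps an atomic reference count; the buffer is freed when the last owner releases it.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>

// Hypothetical shallow-copy matrix with atomic reference counting.
struct MiniMat {
    float* data = nullptr;
    std::atomic<int>* refcount = nullptr;

    explicit MiniMat(std::size_t n)
    {
        data = static_cast<float*>(std::malloc(n * sizeof(float)));
        refcount = new std::atomic<int>(1);
    }
    MiniMat(const MiniMat& m) : data(m.data), refcount(m.refcount)
    {
        if (refcount) refcount->fetch_add(1); // zero-copy: just bump the count
    }
    MiniMat& operator=(const MiniMat& m)
    {
        if (this == &m) return *this;
        release();
        data = m.data;
        refcount = m.refcount;
        if (refcount) refcount->fetch_add(1); // share, don't copy
        return *this;
    }
    ~MiniMat() { release(); }

    void release()
    {
        if (refcount && refcount->fetch_sub(1) == 1) { // last owner frees
            std::free(data);
            delete refcount;
        }
        data = nullptr;
        refcount = nullptr;
    }
};
```

This is the same mechanism that makes the Split layer cheap: each branch holds a shallow copy of the input blob, so fan-out costs one counter increment per branch rather than a data copy.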
