Heuristic:Dotnet Machinelearning LDA Thread Reservation

From Leeroopedia



Domains: Optimization, Threading
Last Updated: 2026-02-09 11:00 GMT

Overview

Thread count heuristic for LDA training: default to `hardware_concurrency()` minus 2 worker threads (minimum 1) to reserve cores for system responsiveness, and serialize initialization because `InitializeBeforeTest` is not thread-safe.

Description

The LDA (Latent Dirichlet Allocation) native engine uses the caller-supplied thread count when it is positive. Otherwise it selects the number of worker threads from the available CPU cores, reserving 2 cores for the operating system and other processes to prevent system degradation during long training runs. Additionally, the LDA initialization phase (`InitializeBeforeTest`) is NOT thread-safe, so calls to it must be serialized.

Usage

Use this heuristic when configuring LDA topic model training or running LDA inference in multi-threaded scenarios. Apply the "reserve 2 cores" rule to any long-running CPU-bound training task to maintain system responsiveness. Be aware of the thread-safety limitation during initialization.

The Insight (Rule of Thumb)

  • Action: Default thread count = `max(1, hardware_concurrency() - 2)`.
  • Value: Reserve 2 CPU cores for system tasks.
  • Trade-off: Slightly reduced parallelism (2 of 16 cores, or 12.5%, on a 16-core machine) in exchange for the system remaining responsive during training. The relative cost grows on smaller machines: 2 of 4 cores is 50%.
  • Warning: LDA `InitializeBeforeTest()` is NOT thread-safe. Serialize all initialization calls before parallel inference.
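
The default in the Action bullet can be sketched as a small C++ helper. This is a minimal illustration, not the LdaNative source: `default_worker_threads` and `resolve_thread_count` are hypothetical names introduced here. The sketch does the subtraction in signed arithmetic because `std::thread::hardware_concurrency()` is allowed to return 0 when the core count cannot be determined.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <thread>

// Hypothetical helper (not part of LdaNative): choose a worker-thread count
// while keeping `reserved` cores free for the OS and other processes.
inline int default_worker_threads(unsigned int hardware_threads,
                                  unsigned int reserved = 2) {
    // Subtract in a wider signed type so that 0 or 1 reported cores cannot
    // wrap around to a huge unsigned value.
    const long long available = static_cast<long long>(hardware_threads) -
                                static_cast<long long>(reserved);
    // Keep the result inside the range of int before narrowing.
    const long long capped =
        std::min<long long>(available, std::numeric_limits<int>::max());
    const int workers = static_cast<int>(std::max<long long>(1, capped));
    assert(workers >= 1);  // postcondition: always at least one worker
    return workers;
}

// An explicit positive request wins; otherwise fall back to the
// core-based default (assumed policy, matching the rule of thumb above).
inline int resolve_thread_count(int requested) {
    if (requested > 0) {
        return requested;
    }
    return default_worker_threads(std::thread::hardware_concurrency());
}
```

With these definitions, `default_worker_threads(16)` yields 14, and inputs of 0, 1, or 2 all yield 1.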

Reasoning

Long-running training tasks that consume all CPU cores can cause system unresponsiveness (UI lag, dropped connections, slow I/O). Reserving 2 cores leaves the OS scheduler headroom for system processes. On a 16-core machine, using 14 cores still provides 87.5% of the available parallelism while keeping the system usable. The `max(1, ...)` guard ensures at least one worker thread on machines with two or fewer cores, where subtracting 2 would otherwise leave none.

The thread-safety limitation of `InitializeBeforeTest()` is a known design constraint in the native C++ implementation where global state is modified during initialization.
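
One way to serialize such initialization is a single process-wide lock around every call. The sketch below illustrates only the pattern, under stated assumptions: `FakeEngine`, `init_mutex`, `initialize_serialized`, and `initialize_in_parallel` are hypothetical names, and `FakeEngine` merely imitates an initializer that touches shared global state. It is not the real `LdaSingleBox`. In the ML.NET C# code a `lock` statement would play the role that `std::lock_guard` plays here.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an engine whose initialization modifies
// shared global state. NOT the real LdaSingleBox.
struct FakeEngine {
    static int& global_init_count() {
        static int count = 0;  // unsynchronized shared state
        return count;
    }
    bool ready = false;
    void InitializeBeforeTest() {
        ++global_init_count();  // would be a data race without the lock
        ready = true;
    }
};

// Process-wide lock that every initialization call must hold.
inline std::mutex& init_mutex() {
    static std::mutex m;
    return m;
}

// Serialized entry point: only one thread initializes at a time.
inline void initialize_serialized(FakeEngine& engine) {
    std::lock_guard<std::mutex> lock(init_mutex());
    engine.InitializeBeforeTest();
}

// Initializes `n` engines from `n` threads and returns how many
// initializations were recorded during this call.
inline int initialize_in_parallel(int n) {
    assert(n >= 0);
    const int before = FakeEngine::global_init_count();
    std::vector<FakeEngine> engines(static_cast<std::size_t>(n));
    std::vector<std::thread> workers;
    workers.reserve(engines.size());
    for (std::size_t i = 0; i < engines.size(); ++i) {
        workers.emplace_back([&engines, i] { initialize_serialized(engines[i]); });
    }
    for (auto& w : workers) {
        w.join();  // after join, all writes made by the worker are visible
    }
    return FakeEngine::global_init_count() - before;
}
```

Once every engine has been initialized under the lock, inference can proceed in parallel without holding it, provided the inference path itself does not touch the same global state.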

Code Evidence

Thread reservation from `src/Native/LdaNative/lda_engine.cpp:58-66`:

if (numThread > 0) {
    num_threads_ = numThread;
} else {
    unsigned int uNumCPU = std::thread::hardware_concurrency();
    num_threads_ = std::max(1, (int)(uNumCPU - 2));  // Reserve 2 cores for system
}

Thread-safety warning from `src/Microsoft.ML.Transforms/Text/LdaTransform.cs:468`:

// LdaSingleBox.InitializeBeforeTest() is NOT thread-safe.

Default thread count note from `src/Microsoft.ML.Transforms/Text/LdaTransform.cs:87`:

// REVIEW: Should change the default when multi-threading support is optimized.
