Heuristic:Dotnet Machinelearning LDA Thread Reservation
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Threading |
| Last Updated | 2026-02-09 11:00 GMT |
Overview
Thread count heuristic for LDA training: when the caller does not specify a thread count, use `max(1, hardware_concurrency() - 2)` to reserve 2 cores for system responsiveness, and serialize initialization because `InitializeBeforeTest` is not thread-safe.
Description
The LDA (Latent Dirichlet Allocation) native engine uses the caller's thread count when a positive value is supplied. Otherwise it selects the number of worker threads from the CPU core count reported by `std::thread::hardware_concurrency()`, reserving 2 cores for the operating system and other processes to prevent system degradation during long training runs. Additionally, the LDA initialization phase (`InitializeBeforeTest`) is NOT thread-safe and must be serialized.
Usage
Use this heuristic when configuring LDA topic model training or running LDA inference in multi-threaded scenarios. Apply the "reserve 2 cores" rule to any long-running CPU-bound training task to maintain system responsiveness. Be aware of the thread-safety limitation during initialization.
The Insight (Rule of Thumb)
- Action: Default thread count = `max(1, hardware_concurrency() - 2)`.
- Value: Reserve 2 CPU cores for system tasks.
- Trade-off: Slightly reduced parallelism (12.5% on a 16-core machine, i.e. 2 of 16 cores) in exchange for the system remaining responsive during training.
- Warning: LDA `InitializeBeforeTest()` is NOT thread-safe. Serialize all initialization calls before parallel inference.
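The thread-count rule above can be sketched as a small helper. This is a minimal illustration, not the engine's code: `DefaultThreadCount`, `ResolveThreadCount`, and `kReservedCores` are hypothetical names. The subtraction is done in a signed type so that a reported core count of 0 or 1 cannot wrap around.

```cpp
#include <algorithm>
#include <thread>

// Assumption: mirrors the "reserve 2 cores" rule described in this page.
constexpr int kReservedCores = 2;

// Hypothetical helper: worker count for a given number of hardware threads.
inline int DefaultThreadCount(unsigned int hardware_threads) {
    // Widen to a signed type before subtracting, so 0 or 1 reported cores
    // yields a negative value instead of an unsigned wrap-around.
    const long long available =
        static_cast<long long>(hardware_threads) - kReservedCores;
    return static_cast<int>(std::max<long long>(1, available));
}

// Hypothetical helper: an explicit positive request wins over the default.
inline int ResolveThreadCount(int requested) {
    if (requested > 0) {
        return requested;
    }
    // hardware_concurrency() may return 0 when the count cannot be determined.
    return DefaultThreadCount(std::thread::hardware_concurrency());
}
```

With these definitions, `DefaultThreadCount(16)` yields 14 and `DefaultThreadCount(2)` yields 1, while `ResolveThreadCount(8)` returns 8 regardless of the machine.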
Reasoning
Long-running training tasks that consume all CPU cores cause system unresponsiveness (UI lag, dropped connections, slow I/O). Reserving 2 cores leaves the OS scheduler headroom for system processes. On a 16-core machine, using 14 cores still keeps 87.5% of the cores busy while the system stays usable. The `max(1, ...)` guard ensures at least one worker thread on single- and dual-core systems, where subtracting 2 would otherwise give zero or a negative count; the same guard covers the case where `hardware_concurrency()` returns 0 because the core count cannot be determined.
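The cost of the rule depends on machine size. The sketch below computes the share of cores left idle for a given core count, assuming the same "cores minus 2, floor of 1" rule; `ReservedShare` is a hypothetical helper introduced only for this illustration.

```cpp
#include <algorithm>

// Hypothetical helper: fraction of cores NOT used by workers under the
// "cores - 2, but at least 1 worker" rule. Returns 0.0 for invalid input.
inline double ReservedShare(int cores) {
    if (cores <= 0) {
        return 0.0;
    }
    const int workers = std::max(1, cores - 2);
    return 1.0 - static_cast<double>(workers) / cores;
}
```

For 16 cores this gives 0.125, matching the figure above, but for 4 cores it gives 0.5, so on small machines an explicit thread count may be worth considering.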
The thread-safety limitation of `InitializeBeforeTest()` is a known design constraint in the native C++ implementation where global state is modified during initialization.
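One common way to serialize a non-thread-safe setup step is a mutex plus a "done" flag, so that the first caller performs the setup and later callers skip it. The sketch below illustrates that pattern only. `FakeEngine`, `GuardedEngine`, and `RunParallel` are hypothetical stand-ins, and nothing here is taken from the ML.NET sources.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an engine whose setup mutates shared state.
class FakeEngine {
public:
    // Not thread-safe on its own: unsynchronized read-modify-write.
    void InitializeBeforeTest() { ++init_calls_; }
    int init_calls() const { return init_calls_; }

private:
    int init_calls_ = 0;
};

// Wrapper that serializes initialization and runs it at most once.
class GuardedEngine {
public:
    void EnsureInitialized() {
        std::lock_guard<std::mutex> lock(init_mutex_);
        if (!initialized_) {
            engine_.InitializeBeforeTest();
            initialized_ = true;
        }
    }
    int init_calls() const { return engine_.init_calls(); }

private:
    FakeEngine engine_;
    std::mutex init_mutex_;
    bool initialized_ = false;
};

// Starts `workers` threads; each ensures setup is done before its own work.
// Returns how many times the underlying setup actually ran.
inline int RunParallel(GuardedEngine& engine, int workers) {
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i) {
        pool.emplace_back([&engine] {
            engine.EnsureInitialized();
            // Per-thread inference would go here.
        });
    }
    for (auto& t : pool) {
        t.join();
    }
    return engine.init_calls();
}
```

Calling `EnsureInitialized()` once on the main thread before starting any workers achieves the same result without contention at startup; the lock matters when the first use can come from any thread.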
Code Evidence
Thread reservation from `src/Native/LdaNative/lda_engine.cpp:58-66`:
```cpp
if (numThread > 0) {
    num_threads_ = numThread;
} else {
    unsigned int uNumCPU = std::thread::hardware_concurrency();
    num_threads_ = std::max(1, (int)(uNumCPU - 2)); // Reserve 2 cores for system
}
```
Thread-safety warning from `src/Microsoft.ML.Transforms/Text/LdaTransform.cs:468`:
```csharp
// LdaSingleBox.InitializeBeforeTest() is NOT thread-safe.
```
Default thread count note from `src/Microsoft.ML.Transforms/Text/LdaTransform.cs:87`:
```csharp
// REVIEW: Should change the default when multi-threading support is optimized.
```