Principle:Tencent Ncnn Hardware Capability Detection
| Knowledge Sources | |
|---|---|
| Domains | CPU_Architecture, Runtime_Optimization |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Runtime detection of CPU instruction set architecture extensions and hardware topology to select the most efficient compute kernel for the current platform.
Description
Hardware capability detection is the practice of querying the processor at runtime to determine which SIMD instruction set extensions, cache hierarchy parameters, and core topology (big.LITTLE, heterogeneous clusters) are available. This information drives kernel selection: the inference engine chooses between scalar, NEON, SVE, SSE, AVX2, AVX-512, RISC-V Vector, LoongArch LSX/LASX, or MIPS MSA code paths based on what the running CPU actually supports, rather than relying solely on compile-time flags.
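The kernel-selection step can be sketched in C as a function pointer that is chosen once at startup. This is a minimal illustration, assuming GCC or Clang on x86: `__builtin_cpu_supports` and `__attribute__((target("avx2")))` are compiler extensions. The element-wise int32 add and the names `add_scalar`, `add_avx2` and `select_add_kernel` are hypothetical and are not part of ncnn.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical kernel signature: element-wise add of two int32 arrays.
typedef void (*add_kernel_fn)(const int32_t* a, const int32_t* b, int32_t* out, size_t n);

// Portable scalar fallback; safe on every CPU.
static void add_scalar(const int32_t* a, const int32_t* b, int32_t* out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>

// Compiled with AVX2 enabled for this function only, so the rest of the binary
// still runs on older CPUs. Must only be called after the runtime check passes.
static __attribute__((target("avx2"))) void add_avx2(const int32_t* a, const int32_t* b, int32_t* out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) // 8 x int32 per 256-bit register
    {
        __m256i va = _mm256_loadu_si256((const __m256i*)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i*)(b + i));
        _mm256_storeu_si256((__m256i*)(out + i), _mm256_add_epi32(va, vb));
    }
    for (; i < n; i++) // scalar tail
        out[i] = a[i] + b[i];
}
#endif

// Pick the best kernel the running CPU supports; call once and keep the pointer.
static add_kernel_fn select_add_kernel(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return add_avx2;
#endif
    return add_scalar;
}
```

Both kernels produce identical results; only the instructions differ. Keeping the AVX2 variant behind a per-function target attribute is what lets a single binary carry several code paths and choose among them at runtime.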
Detection can be performed through OS- or hardware-provided interfaces (the auxiliary vector or /proc/cpuinfo on Linux, the CPUID instruction on x86, ID system registers on ARM where the OS permits reading them) or through a more portable signal-based probing technique: attempt to execute a specific instruction inside a guarded context (a SIGILL signal handler on POSIX systems, a structured exception handler on Windows), and if an illegal-instruction trap fires, conclude that the extension is absent. Because it only requires the OS to report illegal instructions, the signal-based approach works even on platforms that lack OS-level feature reporting, and it can be used on all major operating systems.
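On x86, the CPUID route needs more than one feature bit: the OS must also have enabled saving of the wider register state, which XGETBV reports through the XCR0 register. The sketch below checks the AVX2 + FMA + F16C combination that this page lists for `cpu_support_x86_avx2`. It assumes GCC or Clang (`<cpuid.h>` and inline assembly), and `x86_has_avx2_fma_f16c` is a hypothetical name, not ncnn's implementation.

```c
#include <stdbool.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

// Read XCR0 with XGETBV; only valid when CPUID reports OSXSAVE.
static unsigned long long read_xcr0(void)
{
    unsigned int lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((unsigned long long)hi << 32) | lo;
}

// True when the CPU implements AVX2, FMA and F16C *and* the OS saves YMM state.
static bool x86_has_avx2_fma_f16c(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    const bool fma = ecx & (1u << 12);     // CPUID.1:ECX bit 12
    const bool osxsave = ecx & (1u << 27); // CPUID.1:ECX bit 27
    const bool avx = ecx & (1u << 28);     // CPUID.1:ECX bit 28
    const bool f16c = ecx & (1u << 29);    // CPUID.1:ECX bit 29
    if (!(osxsave && avx && fma && f16c))
        return false;
    // XCR0 bit 1 (XMM state) and bit 2 (YMM state) must both be enabled by the OS.
    if ((read_xcr0() & 0x6) != 0x6)
        return false;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return ebx & (1u << 5); // CPUID.(EAX=7,ECX=0):EBX bit 5 = AVX2
}
#else
static bool x86_has_avx2_fma_f16c(void) { return false; }
#endif
```

Skipping the XCR0 check would report AVX2 as available on a CPU whose OS never enables YMM state, and the first AVX2 instruction would then fault.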
Beyond ISA detection, the system also queries core counts, cache sizes (L2/L3), and big/little cluster membership so that thread affinity and workload partitioning can be tuned at runtime.
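One common way to find big/little membership on Linux and Android is to compare each core's maximum frequency, as published by the cpufreq sysfs files. The sketch below reads `cpuinfo_max_freq` and applies a midpoint threshold. That threshold, along with the helpers `read_max_freq_khz` and `classify_big_little`, is an assumption made for illustration and is not ncnn's exact rule.

```c
#include <stdio.h>

// Linux-specific: read a CPU's maximum frequency (kHz) from cpufreq sysfs.
// Returns -1 when the file is missing (no cpufreq driver, some VMs/containers).
static long read_max_freq_khz(int cpu)
{
    char path[128];
    snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%d/cpufreq/cpuinfo_max_freq", cpu);
    FILE* fp = fopen(path, "r");
    if (!fp)
        return -1;
    long khz = -1;
    if (fscanf(fp, "%ld", &khz) != 1)
        khz = -1;
    fclose(fp);
    return khz;
}

// Hypothetical heuristic: a core is "big" if its maximum frequency is at least the
// midpoint between the slowest and fastest cores; identical frequencies make every
// core big. Writes 1/0 into is_big[i] and returns the number of big cores.
static int classify_big_little(const long* max_freq_khz, int n, int* is_big)
{
    if (n <= 0)
        return 0;
    long lo = max_freq_khz[0], hi = max_freq_khz[0];
    for (int i = 1; i < n; i++)
    {
        if (max_freq_khz[i] < lo) lo = max_freq_khz[i];
        if (max_freq_khz[i] > hi) hi = max_freq_khz[i];
    }
    const long mid = lo + (hi - lo) / 2;
    int big = 0;
    for (int i = 0; i < n; i++)
    {
        is_big[i] = max_freq_khz[i] >= mid;
        big += is_big[i];
    }
    return big;
}
```

Take an 8-core phone with four 1.8 GHz, three 2.4 GHz and one 3.0 GHz core. The threshold is 2.4 GHz, so the result is four little and four big cores. If any core reports -1 because cpufreq is missing, a caller should fall back to treating all cores as big.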
Usage
Apply this principle at framework initialization time, before any inference computation begins. The detected capabilities are cached and used throughout the session to dispatch to optimized kernels. It is also the foundation for benchmarking tools that report what hardware features are active during performance measurement.
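The "detect once, cache, reuse" pattern can be sketched with POSIX `pthread_once`, which guarantees that detection runs exactly once even when several threads ask for capabilities at the same time. The `cpu_caps` struct, `get_cpu_caps`, and the single `avx2` field are hypothetical. A real engine would store every flag and topology value it needs. Link with `-pthread` on toolchains that require it.

```c
#include <pthread.h>
#include <stdbool.h>

// Hypothetical capability record, filled in once per process.
typedef struct
{
    bool avx2;
    int detect_runs; // number of times detection actually executed
} cpu_caps;

static cpu_caps g_caps;
static pthread_once_t g_caps_once = PTHREAD_ONCE_INIT;

// Stand-in for the full detection step; here it only queries AVX2.
static void detect_caps(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    g_caps.avx2 = __builtin_cpu_supports("avx2");
#else
    g_caps.avx2 = false;
#endif
    g_caps.detect_runs++;
}

// Cheap accessor for later kernel dispatch; detection runs on first use only.
static const cpu_caps* get_cpu_caps(void)
{
    pthread_once(&g_caps_once, detect_caps);
    return &g_caps;
}
```

Every later call returns the same record without repeating detection, which keeps per-inference dispatch down to a pointer read.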
Theoretical Basis
Signal-based ISA probing (ruapu technique):
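```c
// Run a candidate instruction under a temporary SIGILL handler; a trap means "unsupported".
#include <setjmp.h>
#include <signal.h>
typedef void (*ruapu_some_inst)(void);
static volatile sig_atomic_t g_ruapu_sigill_caught = 0;
static sigjmp_buf g_ruapu_jmpbuf;
static void ruapu_catch_sigill(int signo)
{
    g_ruapu_sigill_caught = 1;
    siglongjmp(g_ruapu_jmpbuf, -1); // jump out of the faulting instruction
}
static int ruapu_detect_isa(ruapu_some_inst some_inst)
{
    struct sigaction sa = {0}, old_sa;
    sa.sa_handler = ruapu_catch_sigill;
    sigaction(SIGILL, &sa, &old_sa); // install the temporary handler
    g_ruapu_sigill_caught = 0;
    if (sigsetjmp(g_ruapu_jmpbuf, 1) == 0)
        some_inst(); // execute candidate instruction
    sigaction(SIGILL, &old_sa, NULL); // restore the previous handler
    return g_ruapu_sigill_caught ? 0 : 1;
}
```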
Feature query API pattern:
```c
// Each ISA extension has a dedicated boolean query (nonzero = supported)
int cpu_support_arm_neon();    // ARMv7 NEON / AArch64 ASIMD
int cpu_support_arm_asimdhp(); // AArch64 ASIMD half-precision arithmetic
int cpu_support_arm_asimddp(); // AArch64 ASIMD dot product
int cpu_support_arm_sve();     // AArch64 SVE
int cpu_support_x86_avx2();    // x86 AVX2 + FMA + F16C
int cpu_support_x86_avx512();  // x86 AVX-512 (F, CD, BW, DQ, VL)
int cpu_support_riscv_v();     // RISC-V Vector extension
```
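On Linux/AArch64 the kernel already reports these features as bits of the `AT_HWCAP` auxiliary-vector entry, so boolean queries of this shape can be backed by `getauxval`. The `my_cpu_support_*` names are hypothetical and were chosen to avoid clashing with ncnn's own functions. This sketch does not claim to be how ncnn implements them, and on every other platform it simply reports 0.

```c
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h> // getauxval, AT_HWCAP and (via bits/hwcap.h) the HWCAP_* bits

// The kernel advertises AArch64 features as bits of the AT_HWCAP auxv entry.
static int hwcap_has(unsigned long bit)
{
    return (getauxval(AT_HWCAP) & bit) ? 1 : 0;
}

static int my_cpu_support_arm_neon(void) { return hwcap_has(HWCAP_ASIMD); }
static int my_cpu_support_arm_asimdhp(void) { return hwcap_has(HWCAP_ASIMDHP); }
static int my_cpu_support_arm_asimddp(void) { return hwcap_has(HWCAP_ASIMDDP); }
static int my_cpu_support_arm_sve(void) { return hwcap_has(HWCAP_SVE); }
#else
// Not Linux/AArch64: this sketch reports every ARM feature as absent.
static int my_cpu_support_arm_neon(void) { return 0; }
static int my_cpu_support_arm_asimdhp(void) { return 0; }
static int my_cpu_support_arm_asimddp(void) { return 0; }
static int my_cpu_support_arm_sve(void) { return 0; }
#endif
```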
Topology queries:
```c
int get_cpu_count();                  // total logical cores
int get_big_cpu_count();              // big-cluster core count
int get_little_cpu_count();           // little-cluster core count
int get_cpu_level2_cache_size();      // L2 cache size in bytes
int get_cpu_level3_cache_size();      // L3 cache size in bytes
int set_cpu_powersave(int powersave); // 0 = all cores (default), 1 = little cluster only, 2 = big cluster only
```
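On Linux, restricting work to the big or little cluster comes down to setting a CPU affinity mask for each worker thread. Below is a minimal sketch, assuming Linux and glibc: `sched_setaffinity` and the `CPU_SET` macros are GNU extensions declared in `<sched.h>`. `bind_current_thread` is a hypothetical helper, and its `selected` array would come from a big/little classification step.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // needed for cpu_set_t / sched_setaffinity on glibc
#endif
#include <sched.h>

// Hypothetical helper: restrict the calling thread to the CPUs flagged in
// `selected` (e.g. the big or little cluster found during detection).
// Returns 0 on success, -1 on failure or when no CPU is selected.
static int bind_current_thread(const int* selected, int ncpu)
{
#if defined(__linux__)
    cpu_set_t set;
    CPU_ZERO(&set);
    int count = 0;
    for (int i = 0; i < ncpu && i < CPU_SETSIZE; i++)
    {
        if (selected[i])
        {
            CPU_SET(i, &set);
            count++;
        }
    }
    if (count == 0)
        return -1;
    return sched_setaffinity(0, sizeof(set), &set); // pid 0 = calling thread
#else
    (void)selected;
    (void)ncpu;
    return -1; // other platforms: not implemented in this sketch
#endif
}
```

Each worker thread in a pool calls this helper for itself, because the Linux affinity call applies to a single thread rather than to the whole process.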