Principle:Tencent Ncnn Hardware Capability Detection

From Leeroopedia


Knowledge Sources
Domains CPU_Architecture, Runtime_Optimization
Last Updated 2026-02-09 19:00 GMT

Overview

Runtime detection of CPU instruction set architecture extensions and hardware topology to select the most efficient compute kernel for the current platform.

Description

Hardware capability detection is the practice of querying the processor at runtime to determine which SIMD instruction set extensions, cache hierarchy parameters, and core topology (big.LITTLE, heterogeneous clusters) are available. This information drives kernel selection: the inference engine chooses between scalar, NEON, SVE, SSE, AVX2, AVX-512, RISC-V Vector, LoongArch LSX/LASX, or MIPS MSA code paths based on what the running CPU actually supports, rather than relying solely on compile-time flags.

Detection can be performed through hardware and OS-provided interfaces (the CPUID instruction on x86, /proc/cpuinfo on Linux, ID system registers on ARM where the OS exposes them) or through signal-based probing: attempt to execute a specific instruction inside a guarded context (a signal handler on POSIX systems or a structured exception handler on Windows), and if an illegal-instruction trap fires, conclude the extension is absent. The probing approach needs no OS-level feature reporting; it only requires that the platform deliver illegal-instruction traps to user code, which makes it portable across the major operating systems.
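
As a rough illustration of the OS-interface route, the sketch below checks whether a feature name appears in a Linux /proc/cpuinfo "flags" (x86) or "Features" (ARM) line. The helper name cpuinfo_line_has_flag is hypothetical and is not part of any library; reading the file itself is left out so the matching logic stays visible.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not a library function): returns true if `flag`
// appears as a whole whitespace-separated token in a /proc/cpuinfo
// "flags" or "Features" line.
bool cpuinfo_line_has_flag(const std::string& line, const std::string& flag)
{
    assert(!flag.empty());

    // Skip the key before the colon ("flags\t\t:" or "Features\t:"), if any.
    const std::string::size_type colon = line.find(':');
    std::istringstream tokens(colon == std::string::npos ? line : line.substr(colon + 1));

    std::string token;
    while (tokens >> token)
    {
        if (token == flag)
            return true; // exact token match, so "avx" does not match "avx2"
    }
    return false;
}
```

Matching whole tokens rather than substrings matters here, because many feature names are prefixes of others (avx and avx2, asimd and asimdhp).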

Beyond ISA detection, the system also queries physical core counts, cache sizes (L2/L3), and big/little cluster membership so that thread affinity and workload partitioning can be tuned at runtime.

Usage

Apply this principle at framework initialization time, before any inference computation begins. The detected capabilities are cached and used throughout the session to dispatch to optimized kernels. It is also the foundation for benchmarking tools that report what hardware features are active during performance measurement.
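
One possible way to get the "detect once, reuse for the whole session" behaviour is a function-local static, sketched below. CpuCaps, detect_caps_once, and cpu_caps are hypothetical names, and the detected values are hard-coded assumptions standing in for a real query.

```cpp
#include <cassert>

// Hypothetical capability record; the fields are illustrative only.
struct CpuCaps
{
    bool has_simd;
    int core_count;
};

static int g_detect_calls = 0; // counts how often detection actually runs

// Stand-in for the real (possibly expensive) detection routine.
static CpuCaps detect_caps_once()
{
    ++g_detect_calls;
    CpuCaps caps = {true, 8}; // assumed values, not queried from hardware
    return caps;
}

// Function-local static: initialised on first use, exactly once, and
// thread-safe since C++11. Later calls return the cached record.
const CpuCaps& cpu_caps()
{
    static const CpuCaps caps = detect_caps_once();
    assert(caps.core_count > 0); // sanity check on the cached result
    return caps;
}
```

Calling cpu_caps() from any number of places runs detect_caps_once() a single time, so kernels can consult the record on hot paths without repeating the probe.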

Theoretical Basis

Signal-based ISA probing (ruapu technique):

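// Abridged POSIX sketch: attempt to execute a candidate instruction in a guarded context.
// If SIGILL is raised, the handler jumps back and the instruction is treated as unsupported.
#include <setjmp.h>
#include <signal.h>
typedef void (*ruapu_some_inst)(void);
static volatile sig_atomic_t g_ruapu_sigill_caught = 0;
static sigjmp_buf g_ruapu_jmpbuf;
static void ruapu_catch_sigill(int signo) { (void)signo; g_ruapu_sigill_caught = 1; siglongjmp(g_ruapu_jmpbuf, 1); }

static int ruapu_detect_isa(ruapu_some_inst some_inst)
{
    struct sigaction sa = {0}, old_sa;
    sa.sa_handler = ruapu_catch_sigill;
    sigaction(SIGILL, &sa, &old_sa);       // install the SIGILL guard
    g_ruapu_sigill_caught = 0;
    if (sigsetjmp(g_ruapu_jmpbuf, 1) == 0)
    {
        some_inst();  // execute candidate instruction
    }
    sigaction(SIGILL, &old_sa, NULL);      // restore the previous handler
    return g_ruapu_sigill_caught ? 0 : 1;  // 1 = supported, 0 = trapped
}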

Feature query API pattern:

// Each ISA extension has a dedicated boolean query
int cpu_support_arm_neon();      // ARM NEON / AArch64 ASIMD
int cpu_support_arm_asimdhp();   // AArch64 half-precision
int cpu_support_arm_asimddp();   // AArch64 dot-product
int cpu_support_arm_sve();       // AArch64 SVE
int cpu_support_x86_avx2();      // x86 AVX2 + FMA + F16C
int cpu_support_x86_avx512();    // x86 AVX-512
int cpu_support_riscv_v();       // RISC-V Vector extension
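
A minimal sketch of how such boolean queries might drive kernel selection: test the widest extension first and fall through to narrower ones, so the scalar path is always reachable. The fake_support_* functions are stand-ins with assumed return values, and KernelKind, pick_kernel, and kernel_name are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Stand-ins for the boolean queries; a real build would call the
// framework's functions instead. The return values are assumptions.
static int fake_support_x86_avx2() { return 1; }
static int fake_support_x86_avx512() { return 0; }

enum KernelKind { KERNEL_SCALAR = 0, KERNEL_AVX2 = 1, KERNEL_AVX512 = 2 };

// Check the widest extension first; scalar is the unconditional fallback.
KernelKind pick_kernel(int has_avx2, int has_avx512)
{
    if (has_avx512) return KERNEL_AVX512;
    if (has_avx2) return KERNEL_AVX2;
    return KERNEL_SCALAR;
}

// Selection for the (simulated) current CPU.
KernelKind pick_kernel_for_this_cpu()
{
    return pick_kernel(fake_support_x86_avx2(), fake_support_x86_avx512());
}

// Human-readable name, e.g. for a benchmark report.
const char* kernel_name(KernelKind kind)
{
    static const char* const names[] = {"scalar", "avx2", "avx512"};
    assert(kind >= KERNEL_SCALAR && kind <= KERNEL_AVX512);
    return names[kind];
}
```

Keeping the decision in a pure function of the query results makes the priority order easy to test without access to the corresponding hardware.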

Topology queries:

get_cpu_count()                  -> total logical cores
get_big_cpu_count()              -> big cluster core count
get_little_cpu_count()           -> little cluster core count
get_cpu_level2_cache_size()      -> L2 cache size in bytes
get_cpu_level3_cache_size()      -> L3 cache size in bytes
set_cpu_powersave(mode)          -> restrict threads to all cores (0), the little cluster (1), or the big cluster (2)
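
To show how these values could feed runtime tuning, the sketch below derives a thread count and a tile size from a topology snapshot. The Topology struct and both heuristics are assumptions made for illustration; they are not the policy of any particular framework.

```cpp
#include <cassert>

// Hypothetical snapshot of the topology queries listed above.
struct Topology
{
    int cpu_count;      // total logical cores
    int big_cpu_count;  // cores in the big cluster
    int l2_cache_bytes; // L2 cache size in bytes (0 if unknown)
};

// Use only the big cluster when the CPU is heterogeneous,
// otherwise use every core.
int choose_thread_count(const Topology& t)
{
    assert(t.cpu_count > 0);
    if (t.big_cpu_count > 0 && t.big_cpu_count < t.cpu_count)
        return t.big_cpu_count;
    return t.cpu_count;
}

// Largest square float tile side (a multiple of 8) such that three tiles
// (two inputs, one output) fit in half of the L2 cache. Falls back to 8
// when the cache size is unknown or very small.
int choose_tile_size(const Topology& t)
{
    const int budget_floats = t.l2_cache_bytes / 2 / static_cast<int>(sizeof(float));
    int tile = 8;
    while ((tile + 8) * (tile + 8) * 3 <= budget_floats)
        tile += 8;
    return tile;
}
```

For a 4+4 big.LITTLE part with 512 KiB of L2, this picks 4 threads and a tile side of 144, since 3 × 144² = 62208 floats fits the 65536-float budget while 3 × 152² does not.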
