Implementation:Tencent Ncnn Cpu Feature Detection
| Knowledge Sources | |
|---|---|
| Domains | CPU Architecture, Hardware Detection, Thread Management |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Cross-platform CPU feature detection, topology discovery, cache sizing, and thread affinity management for selecting optimized SIMD code paths at runtime.
Description
The CPU feature detection module is one of the largest core files in ncnn (3276 lines), responsible for probing hardware capabilities across every supported architecture. It provides three major categories of functionality:
ISA Feature Detection: The module detects instruction set extensions at runtime using platform-specific mechanisms:
- On x86/x86_64, it uses `cpuid` intrinsics (`__get_cpuid`/`__cpuid`) to detect SSE through AVX-512 features
- On ARM Linux/Android, it reads `getauxval(AT_HWCAP)` and parses `/proc/cpuinfo` for NEON, ASIMD half-precision, SVE, BF16, I8MM, and other extensions
- On Apple platforms (macOS/iOS), it uses `sysctlbyname` queries and CPU family identification constants (covering A10 through A18 Pro and M1 through M4)
- On RISC-V, it uses the ruapu probing library and the `hwprobe` syscall for the V extension, Zvfh, and T-Head vendor extensions
- On LoongArch and MIPS, it detects LSX/LASX and MSA, respectively
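To make the x86 and AArch64 Linux paths concrete, here is a minimal sketch, assuming GCC or Clang (for `<cpuid.h>` and inline `xgetbv`) and glibc's `getauxval`. `IsaFlags` and `detect_isa` are hypothetical names, not ncnn functions, and only a handful of features are covered. On x86 the sketch also checks XCR0, because a CPUID feature bit alone does not guarantee that the OS saves the wider AVX registers. The AArch64 bit positions follow the Linux arm64 `hwcap.h` UAPI header.

```cpp
// Hypothetical ISA probe (not ncnn's implementation): each flag is 1 if usable, else 0.
#include <cstdint>
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#define SKETCH_X86 1
#include <cpuid.h>
#endif
#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>
#endif

struct IsaFlags {
    int x86_avx = 0, x86_fma = 0, x86_avx2 = 0;
    int arm_asimd = 0, arm_asimdhp = 0, arm_asimddp = 0, arm_sve = 0, arm_i8mm = 0, arm_bf16 = 0;
};

#ifdef SKETCH_X86
// XCR0 reports which register states the OS saves on context switch.
static uint64_t read_xcr0() {
    uint32_t eax, edx;
    __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return ((uint64_t)edx << 32) | eax;
}
#endif

IsaFlags detect_isa() {
    IsaFlags f;
#ifdef SKETCH_X86
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        bool osxsave = ecx & (1u << 27);
        bool avx_bit = ecx & (1u << 28);
        // xgetbv is only legal when OSXSAVE is set (short-circuit guards it).
        bool ymm_saved = osxsave && ((read_xcr0() & 0x6) == 0x6); // XMM + YMM state
        f.x86_avx = avx_bit && ymm_saved;
        f.x86_fma = f.x86_avx && (ecx & (1u << 12)); // leaf 1 ECX bit 12
        if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
            f.x86_avx2 = f.x86_avx && (ebx & (1u << 5)); // leaf 7 EBX bit 5
    }
#endif
#if defined(__aarch64__) && defined(__linux__)
    unsigned long hwcap = getauxval(AT_HWCAP);
    unsigned long hwcap2 = getauxval(AT_HWCAP2);
    f.arm_asimd = (hwcap >> 1) & 1;    // HWCAP_ASIMD
    f.arm_asimdhp = (hwcap >> 10) & 1; // HWCAP_ASIMDHP
    f.arm_asimddp = (hwcap >> 20) & 1; // HWCAP_ASIMDDP
    f.arm_sve = (hwcap >> 22) & 1;     // HWCAP_SVE
    f.arm_i8mm = (hwcap2 >> 13) & 1;   // HWCAP2_I8MM
    f.arm_bf16 = (hwcap2 >> 14) & 1;   // HWCAP2_BF16
#endif
    return f;
}
```

On an AArch64 Linux host the x86 fields simply stay 0, and vice versa.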
CPU Topology Discovery: The module determines the physical layout of CPU cores:
- Total, big, and little core counts (for ARM big.LITTLE heterogeneous architectures)
- Physical versus logical core counts
- L2 and L3 cache sizes (read from sysfs on Linux, `sysctlbyname` on Apple)
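The big/little split and the sysfs cache lookup can be sketched as follows. This is a hypothetical illustration: it assumes per-core maximum frequencies are already known (on Linux they are typically exposed in `/sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq`, in kHz) and uses a simple midpoint threshold as the heuristic, which is not necessarily the rule ncnn applies. `classify_little_cores`, `parse_cache_size`, and `read_cache_size_sysfs` are invented names.

```cpp
// Hypothetical topology helpers (not ncnn's code).
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Assumed heuristic: a core is "little" if its max frequency is below the
// midpoint of [min, max]. Returns one flag per core (1 = little).
std::vector<int> classify_little_cores(const std::vector<long>& max_freq_khz) {
    std::vector<int> little(max_freq_khz.size(), 0);
    if (max_freq_khz.empty()) return little;
    auto mm = std::minmax_element(max_freq_khz.begin(), max_freq_khz.end());
    long lo = *mm.first, hi = *mm.second;
    if (lo == hi) return little; // homogeneous CPU: no little cluster
    long mid = (lo + hi) / 2;
    for (size_t i = 0; i < max_freq_khz.size(); i++)
        little[i] = max_freq_khz[i] < mid ? 1 : 0;
    return little;
}

// Parse a sysfs cache size string such as "512K" or "8M" into bytes.
long parse_cache_size(const std::string& s) {
    char* end = nullptr;
    long v = std::strtol(s.c_str(), &end, 10);
    if (end == s.c_str()) return 0;
    if (*end == 'K') v *= 1024;
    else if (*end == 'M') v *= 1024L * 1024L;
    return v;
}

// Size in bytes of the first cpu0 cache at `level`, or 0 if sysfs is unavailable.
// Assumed layout: /sys/devices/system/cpu/cpu0/cache/indexN/{level,size}.
long read_cache_size_sysfs(int level) {
    for (int i = 0; i < 8; i++) {
        std::string dir = "/sys/devices/system/cpu/cpu0/cache/index" + std::to_string(i) + "/";
        FILE* fl = std::fopen((dir + "level").c_str(), "r");
        if (!fl) break; // no more cache entries
        int lv = 0;
        int ok = std::fscanf(fl, "%d", &lv);
        std::fclose(fl);
        if (ok != 1 || lv != level) continue;
        FILE* fs = std::fopen((dir + "size").c_str(), "r");
        if (!fs) continue;
        char buf[32] = {0};
        ok = std::fscanf(fs, "%31s", buf);
        std::fclose(fs);
        if (ok == 1) return parse_cache_size(buf);
    }
    return 0;
}
```

Only `cpu0` is read here. On heterogeneous SoCs the clusters can have different L2 sizes, so a fuller implementation may need to inspect more than one core.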
Thread Affinity Management: The module controls which CPU cores threads execute on:
- The `CpuSet` class provides a cross-platform CPU affinity mask (wrapping `cpu_set_t` on Linux, `ULONG_PTR` on Windows, and a Mach thread policy on macOS)
- `set_cpu_powersave()` binds threads to the little cluster (for power efficiency) or the big cluster (for performance)
- `set_cpu_thread_affinity()` applies explicit affinity masks
- OpenMP wrapper functions manage thread count and KMP blocktime settings
- A startup initializer disables `KMP_AFFINITY` to prevent crashes on Android
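A Linux-only sketch of an affinity mask in the spirit of `CpuSet`, plus the mapping from the powersave modes (0 = all cores, 1 = little only, 2 = big only) to a mask. `SimpleCpuSet` and `mask_for_powersave` are hypothetical names; they assume glibc's `CPU_*` macros and `pthread_setaffinity_np`, and do not reproduce ncnn's implementation.

```cpp
// Hypothetical Linux affinity mask wrapper (not ncnn::CpuSet itself).
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // CPU_* macros and pthread_setaffinity_np on glibc
#endif
#include <pthread.h>
#include <sched.h>
#include <vector>

class SimpleCpuSet {
public:
    SimpleCpuSet() { CPU_ZERO(&set_); }
    void enable(int cpu) { CPU_SET(cpu, &set_); }
    void disable(int cpu) { CPU_CLR(cpu, &set_); }
    void disable_all() { CPU_ZERO(&set_); }
    bool is_enabled(int cpu) const { return CPU_ISSET(cpu, &set_); }
    int num_enabled() const { return CPU_COUNT(&set_); }
    // Binds only the calling thread; returns 0 on success.
    int apply_to_current_thread() const {
        return pthread_setaffinity_np(pthread_self(), sizeof(set_), &set_);
    }

private:
    cpu_set_t set_;
};

// Build the mask for a powersave mode. `little` holds one flag per core
// (1 = little core), e.g. produced by a frequency-based classification.
SimpleCpuSet mask_for_powersave(int powersave, const std::vector<int>& little) {
    SimpleCpuSet mask;
    for (int i = 0; i < (int)little.size(); i++) {
        bool want = powersave == 0
                    || (powersave == 1 && little[i])
                    || (powersave == 2 && !little[i]);
        if (want) mask.enable(i);
    }
    return mask;
}
```

Affinity is a per-thread property, so binding only the calling thread is not enough for a worker pool: each OpenMP or pool thread has to apply the mask itself, or be bound from outside by its thread ID.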
Usage
Use this module whenever you need to query the CPU capabilities of the host platform at runtime, set thread affinity for optimal performance on heterogeneous processors, or configure the number of inference threads. The layer factory system in ncnn relies on this module to select the best SIMD-optimized kernel implementation.
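The dispatch pattern described above (probe the CPU once, then route calls to the best available kernel) can be sketched with a function pointer. This is an illustrative example, not ncnn's layer factory: the ReLU kernels and `select_relu` are hypothetical, and the sketch calls the GCC/Clang builtin `__builtin_cpu_supports` instead of `ncnn::cpu_support_x86_avx2()` so that it compiles without ncnn.

```cpp
// Hypothetical runtime kernel dispatch (not ncnn's layer registry).
#include <cstddef>

using ReluFn = void (*)(float* data, std::size_t n);

// Portable fallback kernel.
static void relu_scalar(float* data, std::size_t n) {
    for (std::size_t i = 0; i < n; i++)
        if (data[i] < 0.f) data[i] = 0.f;
}

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
// Only this function is compiled for AVX2; the rest stays baseline x86.
__attribute__((target("avx2")))
static void relu_avx2(float* data, std::size_t n) {
    const __m256 zero = _mm256_setzero_ps();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(data + i, _mm256_max_ps(_mm256_loadu_ps(data + i), zero));
    relu_scalar(data + i, n - i); // remaining tail elements
}
#endif

// Pick the kernel once based on what the running CPU supports.
ReluFn select_relu() {
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    if (__builtin_cpu_supports("avx2"))
        return relu_avx2;
#endif
    return relu_scalar;
}
```

Compiling the AVX2 kernel through a target attribute keeps the binary runnable on older CPUs, which is the point of selecting code paths at runtime rather than at compile time.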
Code Reference
Source Location
- Repository: Tencent_Ncnn
- Header: src/cpu.h (177 lines)
- Implementation: src/cpu.cpp (3276 lines)
Signature
```cpp
namespace ncnn {

// Excerpt: selected declarations from src/cpu.h

class NCNN_EXPORT CpuSet
{
public:
    CpuSet();
    void enable(int cpu);
    void disable(int cpu);
    void disable_all();
    bool is_enabled(int cpu) const;
    int num_enabled() const;
};

// ARM ISA feature detection
NCNN_EXPORT int cpu_support_arm_neon();
NCNN_EXPORT int cpu_support_arm_asimdhp();
NCNN_EXPORT int cpu_support_arm_asimddp();
NCNN_EXPORT int cpu_support_arm_bf16();
NCNN_EXPORT int cpu_support_arm_i8mm();
NCNN_EXPORT int cpu_support_arm_sve();
NCNN_EXPORT int cpu_support_arm_sve2();

// x86 ISA feature detection
NCNN_EXPORT int cpu_support_x86_avx();
NCNN_EXPORT int cpu_support_x86_fma();
NCNN_EXPORT int cpu_support_x86_avx2();
NCNN_EXPORT int cpu_support_x86_avx512();
NCNN_EXPORT int cpu_support_x86_avx512_vnni();
NCNN_EXPORT int cpu_support_x86_avx512_bf16();
NCNN_EXPORT int cpu_support_x86_avx512_fp16();

// RISC-V ISA feature detection
NCNN_EXPORT int cpu_support_riscv_v();
NCNN_EXPORT int cpu_support_riscv_zvfh();
NCNN_EXPORT int cpu_riscv_vlenb();

// CPU topology
NCNN_EXPORT int get_cpu_count();
NCNN_EXPORT int get_little_cpu_count();
NCNN_EXPORT int get_big_cpu_count();
NCNN_EXPORT int get_physical_cpu_count();
NCNN_EXPORT int get_cpu_level2_cache_size();
NCNN_EXPORT int get_cpu_level3_cache_size();

// Thread affinity and powersave
NCNN_EXPORT int get_cpu_powersave();
NCNN_EXPORT int set_cpu_powersave(int powersave);
NCNN_EXPORT const CpuSet& get_cpu_thread_affinity_mask(int powersave);
NCNN_EXPORT int set_cpu_thread_affinity(const CpuSet& thread_affinity_mask);

// OpenMP wrappers
NCNN_EXPORT int get_omp_num_threads();
NCNN_EXPORT void set_omp_num_threads(int num_threads);

// Flush denormals (x86)
NCNN_EXPORT int get_flush_denormals();
NCNN_EXPORT int set_flush_denormals(int flush_denormals);

} // namespace ncnn
```
Import
```cpp
#include "ncnn/cpu.h"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| powersave | int | No | Powersave mode: 0 = all cores (default), 1 = little cores only, 2 = big cores only |
| thread_affinity_mask | const CpuSet& | No | Explicit CPU affinity mask to bind threads to specific cores |
| num_threads | int | No | Number of OpenMP threads for parallel inference |
| flush_denormals | int | No | Denormal flush mode: 0 = off, 1 = DAZ, 2 = FTZ, 3 = DAZ + FTZ |
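On x86, the four `flush_denormals` modes map naturally onto two bits of the SSE control register MXCSR: DAZ (bit 6, denormal inputs treated as zero) and FTZ (bit 15, denormal results flushed to zero). The sketch below is a hypothetical stand-in for `set_flush_denormals()`/`get_flush_denormals()`, assuming an SSE-capable x86 target; the `_sketch` functions are invented names and ncnn's own implementation may differ.

```cpp
// Hypothetical MXCSR-based denormal control (not ncnn's code).
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_MXCSR 1
#endif

const unsigned int kDazBit = 1u << 6;  // denormals-are-zero (inputs)
const unsigned int kFtzBit = 1u << 15; // flush-to-zero (outputs)

// Mode: 0 = off, 1 = DAZ, 2 = FTZ, 3 = DAZ + FTZ.
// Returns 0 on success, -1 for an invalid mode or when MXCSR is unavailable.
int set_flush_denormals_sketch(int mode) {
#ifdef HAVE_MXCSR
    if (mode < 0 || mode > 3) return -1;
    unsigned int csr = _mm_getcsr() & ~(kDazBit | kFtzBit);
    if (mode & 1) csr |= kDazBit;
    if (mode & 2) csr |= kFtzBit;
    _mm_setcsr(csr);
    return 0;
#else
    (void)mode;
    return -1;
#endif
}

// Read the current mode back from MXCSR (0 when MXCSR is unavailable).
int get_flush_denormals_sketch() {
#ifdef HAVE_MXCSR
    unsigned int csr = _mm_getcsr();
    return ((csr & kDazBit) ? 1 : 0) | ((csr & kFtzBit) ? 2 : 0);
#else
    return 0;
#endif
}
```

MXCSR is per-thread state, so the setting affects only the calling thread; worker threads that run inference need the same call.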
Outputs
| Name | Type | Description |
|---|---|---|
| feature_supported | int | 1 if the ISA feature is supported, 0 otherwise |
| cpu_count | int | Number of CPU cores (total, big, little, or physical) |
| cache_size | int | L2 or L3 cache size in bytes |
| return code | int | 0 on success for setter functions |
Usage Examples
Querying CPU Features
```cpp
#include "ncnn/cpu.h"
#include <stdio.h>

void print_cpu_info()
{
    // Total, big-cluster, and little-cluster core counts
    printf("CPU count: %d (big: %d, little: %d)\n",
           ncnn::get_cpu_count(),
           ncnn::get_big_cpu_count(),
           ncnn::get_little_cpu_count());

    // Cache sizes are reported in bytes
    printf("L2 cache: %d KB\n", ncnn::get_cpu_level2_cache_size() / 1024);

    // Queries for ISAs of other architectures return 0 on this host
    if (ncnn::cpu_support_x86_avx2())
        printf("AVX2 supported\n");
    if (ncnn::cpu_support_x86_avx512())
        printf("AVX-512 supported\n");
    if (ncnn::cpu_support_arm_neon())
        printf("ARM NEON supported\n");
    if (ncnn::cpu_support_arm_sve())
        printf("ARM SVE supported\n");
}
```
Configuring Thread Affinity for big.LITTLE
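```cpp
#include "ncnn/cpu.h"

void setup_for_performance()
{
    // Use only big cores for maximum performance.
    // Switching powersave is expensive and not thread-safe: do it once at startup.
    ncnn::set_cpu_powersave(2);

    // Lower-level equivalent: apply the big-core mask explicitly
    // (redundant after set_cpu_powersave(2); shown for illustration)
    const ncnn::CpuSet& big_mask = ncnn::get_cpu_thread_affinity_mask(2);
    ncnn::set_cpu_thread_affinity(big_mask);

    // Match the thread count to the big core count
    ncnn::set_omp_num_threads(ncnn::get_big_cpu_count());
}

void setup_for_power_saving()
{
    // Use only little cores for power efficiency
    ncnn::set_cpu_powersave(1);

    // The little count may be 0 on CPUs without a little cluster; never request 0 threads
    int little = ncnn::get_little_cpu_count();
    ncnn::set_omp_num_threads(little > 0 ? little : ncnn::get_cpu_count());
}
```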