Implementation:Tencent Ncnn Cpu Feature Detection

Knowledge Sources	Tencent_Ncnn
Domains	CPU Architecture, Hardware Detection, Thread Management
Last Updated	2026-02-09 19:00 GMT

Overview

Cross-platform CPU feature detection, topology discovery, cache sizing, and thread affinity management for selecting optimized SIMD code paths at runtime.

Description

The CPU feature detection module is one of the largest core files in ncnn (3276 lines), responsible for probing hardware capabilities across every supported architecture. It provides three major categories of functionality:

ISA Feature Detection: The module detects instruction set extensions at runtime using platform-specific mechanisms:

On x86/x86_64, it uses cpuid intrinsics (__get_cpuid / __cpuid) to detect SSE through AVX-512 features
On ARM Linux/Android, it reads getauxval(AT_HWCAP) and parses /proc/cpuinfo for NEON, ASIMD half-precision, SVE, BF16, I8MM, and other extensions
On Apple platforms (macOS/iOS), it uses sysctlbyname queries and CPU family identification constants (covering A10 through A18 Pro and M1 through M4)
On RISC-V, it uses the ruapu probing library and hwprobe syscall for V extension, Zvfh, and T-Head vendor extensions
On LoongArch and MIPS, it detects LSX/LASX and MSA respectively
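
As a rough illustration of the x86 path described above, the following sketch probes a few features with the cpuid helpers from GCC/Clang's `<cpuid.h>`. The `X86Features` struct and the function names are hypothetical and are not taken from ncnn; the sketch only assumes the documented CPUID bit positions and the XCR0 check that AVX requires. On other architectures or compilers it compiles to a stub that reports nothing.

```cpp
#include <cassert>

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#else
#define SKETCH_HAVE_CPUID 0
#endif

// Hypothetical result type for this sketch; not part of ncnn.
struct X86Features
{
    int sse2;
    int avx;
    int avx2;
};

#if SKETCH_HAVE_CPUID
// Read XCR0 to learn which register states the OS saves on context switch.
// Must only be called when CPUID reports OSXSAVE.
static unsigned long long read_xcr0()
{
    unsigned int lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((unsigned long long)hi << 32) | lo;
}
#endif

static X86Features probe_x86_features()
{
    X86Features f = {0, 0, 0};
#if SKETCH_HAVE_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return f;

    // Leaf 1, EDX[26] reports SSE2.
    f.sse2 = (edx >> 26) & 1;

    // AVX needs the CPU bit (ECX[28]), OSXSAVE (ECX[27]),
    // and XMM + YMM state enabled in XCR0 (bits 1 and 2).
    const bool osxsave = ((ecx >> 27) & 1) != 0;
    const bool cpu_avx = ((ecx >> 28) & 1) != 0;
    const bool os_ymm = osxsave && ((read_xcr0() & 0x6) == 0x6);
    f.avx = (cpu_avx && os_ymm) ? 1 : 0;

    // AVX2 is reported in leaf 7, subleaf 0, EBX[5].
    if (f.avx && __get_cpuid_max(0, nullptr) >= 7)
    {
        __cpuid_count(7, 0, eax, ebx, ecx, edx);
        f.avx2 = (ebx >> 5) & 1;
    }
#endif
    // Invariant of this sketch: AVX2 is only reported when AVX is usable.
    assert(!f.avx2 || f.avx);
    return f;
}
```

Checking the operating system's saved register state matters because a CPU can advertise AVX while the OS has not enabled YMM state, in which case executing AVX instructions would fault.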

CPU Topology Discovery: The module determines the physical layout of CPU cores:

Total, big, and little core counts (for ARM big.LITTLE heterogeneous architectures)
Physical versus logical core counts
L2 and L3 cache sizes (read from sysfs on Linux, sysctlbyname on Apple)
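
The sysfs lookup mentioned above can be sketched as follows. This is a minimal illustration, not ncnn's parsing code: the helper names are hypothetical, and it assumes the common Linux layout in which `/sys/devices/system/cpu/cpu0/cache/indexN/size` holds a string such as `512K`. The index-to-level mapping varies between machines, so a robust reader would also consult the sibling `level` file.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Convert a sysfs cache size string such as "512K" or "8M" to bytes.
// Returns 0 when the string cannot be parsed.
static long parse_cache_size(const char* text)
{
    if (!text)
        return 0;

    char* end = nullptr;
    long value = strtol(text, &end, 10);
    if (end == text || value < 0)
        return 0;

    switch (*end)
    {
    case 'K':
    case 'k':
        return value * 1024;
    case 'M':
    case 'm':
        return value * 1024 * 1024;
    default:
        return value; // no suffix: treat the number as bytes
    }
}

// Read the size of one cache entry of cpu0 from sysfs (Linux only).
// Returns 0 when the file is missing, for example on other platforms.
static long read_sysfs_cache_size(int index)
{
    assert(index >= 0);

    char path[128];
    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu0/cache/index%d/size", index);

    FILE* fp = fopen(path, "r");
    if (!fp)
        return 0;

    char buf[32] = {0};
    long bytes = 0;
    if (fgets(buf, sizeof(buf), fp))
        bytes = parse_cache_size(buf);

    fclose(fp);
    return bytes;
}
```

Returning 0 for an unreadable file lets callers treat "unknown" uniformly and fall back to a default tile or block size.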

Thread Affinity Management: The module controls which CPU cores threads execute on:

The CpuSet class provides a cross-platform CPU affinity mask (wrapping cpu_set_t on Linux, ULONG_PTR on Windows, Mach thread policy on macOS)
set_cpu_powersave() binds threads to little or big clusters for power efficiency
set_cpu_thread_affinity() applies explicit affinity masks
OpenMP wrapper functions manage thread count and KMP blocktime settings
A startup initializer disables KMP_AFFINITY to prevent crashes on Android
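
To make the affinity mask idea concrete, here is a minimal stand-in that mirrors the public CpuSet methods listed under Signature using a single 64-bit mask. `SimpleCpuSet` and `make_range_mask` are hypothetical names for this sketch; the real class wraps platform types and is not limited to 64 CPUs in the way this one is.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ncnn::CpuSet, limited to 64 logical CPUs.
class SimpleCpuSet
{
public:
    SimpleCpuSet() : mask_(0) {}

    void enable(int cpu)
    {
        if (valid(cpu)) mask_ |= bit(cpu);
    }
    void disable(int cpu)
    {
        if (valid(cpu)) mask_ &= ~bit(cpu);
    }
    void disable_all()
    {
        mask_ = 0;
    }
    bool is_enabled(int cpu) const
    {
        return valid(cpu) && (mask_ & bit(cpu)) != 0;
    }
    int num_enabled() const
    {
        // Clear the lowest set bit on each iteration and count the steps.
        int n = 0;
        for (uint64_t m = mask_; m != 0; m &= m - 1)
            n++;
        return n;
    }

private:
    static bool valid(int cpu)
    {
        return cpu >= 0 && cpu < 64;
    }
    static uint64_t bit(int cpu)
    {
        return uint64_t(1) << cpu;
    }
    uint64_t mask_;
};

// Build a mask covering CPUs [first, last], for example one core cluster.
static SimpleCpuSet make_range_mask(int first, int last)
{
    assert(first <= last);
    SimpleCpuSet s;
    for (int i = first; i <= last; i++)
        s.enable(i);
    return s;
}
```

On an eight-core part whose faster cluster happens to be CPUs 4 to 7, `make_range_mask(4, 7)` would describe that cluster; which indices belong to which cluster is device specific.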

Usage

Use this module whenever you need to query the CPU capabilities of the host platform at runtime, set thread affinity for optimal performance on heterogeneous processors, or configure the number of inference threads. The layer factory system in ncnn relies on this module to select the best SIMD-optimized kernel implementation.
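
The kernel selection idea can be sketched as a small dispatch table. Everything here is hypothetical and simplified: `CpuCaps`, `Kernel`, and the `add_*` functions are illustrative names, the "SIMD" variants are placeholders that reuse the scalar loop, and ncnn's actual layer factory is organized differently. In real code the flags would be filled from queries such as `ncnn::cpu_support_x86_avx2()`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical capability flags for this sketch.
struct CpuCaps
{
    bool avx512;
    bool avx2;
};

typedef void (*AddFn)(const float* a, const float* b, float* out, std::size_t n);

struct Kernel
{
    std::string name;
    AddFn fn;
};

static void add_scalar(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

// Placeholders standing in for SIMD-specialized builds of the same kernel.
static void add_avx2(const float* a, const float* b, float* out, std::size_t n)
{
    add_scalar(a, b, out, n);
}
static void add_avx512(const float* a, const float* b, float* out, std::size_t n)
{
    add_scalar(a, b, out, n);
}

// Prefer the widest instruction set the CPU reports, else fall back to scalar.
static Kernel select_add_kernel(const CpuCaps& caps)
{
    if (caps.avx512) return Kernel{"avx512", add_avx512};
    if (caps.avx2) return Kernel{"avx2", add_avx2};
    return Kernel{"scalar", add_scalar};
}

// Select once, then run the chosen implementation.
static void dispatch_add(const CpuCaps& caps, const float* a, const float* b,
                         float* out, std::size_t n)
{
    Kernel k = select_add_kernel(caps);
    assert(k.fn != nullptr);
    k.fn(a, b, out, n);
}
```

Selecting at runtime rather than at compile time lets one binary run on older CPUs while still using wider vectors where they exist.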

Code Reference

Source Location

Repository: Tencent_Ncnn
Header: src/cpu.h (177 lines)
Implementation: src/cpu.cpp (3276 lines)

Signature

namespace ncnn {

class NCNN_EXPORT CpuSet
{
public:
    CpuSet();
    void enable(int cpu);
    void disable(int cpu);
    void disable_all();
    bool is_enabled(int cpu) const;
    int num_enabled() const;
};

// ARM ISA feature detection
NCNN_EXPORT int cpu_support_arm_neon();
NCNN_EXPORT int cpu_support_arm_asimdhp();
NCNN_EXPORT int cpu_support_arm_asimddp();
NCNN_EXPORT int cpu_support_arm_bf16();
NCNN_EXPORT int cpu_support_arm_i8mm();
NCNN_EXPORT int cpu_support_arm_sve();
NCNN_EXPORT int cpu_support_arm_sve2();

// x86 ISA feature detection
NCNN_EXPORT int cpu_support_x86_avx();
NCNN_EXPORT int cpu_support_x86_fma();
NCNN_EXPORT int cpu_support_x86_avx2();
NCNN_EXPORT int cpu_support_x86_avx512();
NCNN_EXPORT int cpu_support_x86_avx512_vnni();
NCNN_EXPORT int cpu_support_x86_avx512_bf16();
NCNN_EXPORT int cpu_support_x86_avx512_fp16();

// RISC-V ISA feature detection
NCNN_EXPORT int cpu_support_riscv_v();
NCNN_EXPORT int cpu_support_riscv_zvfh();
NCNN_EXPORT int cpu_riscv_vlenb();

// CPU topology
NCNN_EXPORT int get_cpu_count();
NCNN_EXPORT int get_little_cpu_count();
NCNN_EXPORT int get_big_cpu_count();
NCNN_EXPORT int get_physical_cpu_count();
NCNN_EXPORT int get_cpu_level2_cache_size();
NCNN_EXPORT int get_cpu_level3_cache_size();

// Thread affinity and powersave
NCNN_EXPORT int get_cpu_powersave();
NCNN_EXPORT int set_cpu_powersave(int powersave);
NCNN_EXPORT const CpuSet& get_cpu_thread_affinity_mask(int powersave);
NCNN_EXPORT int set_cpu_thread_affinity(const CpuSet& thread_affinity_mask);

// OpenMP wrappers
NCNN_EXPORT int get_omp_num_threads();
NCNN_EXPORT void set_omp_num_threads(int num_threads);

// Flush denormals (x86)
NCNN_EXPORT int get_flush_denormals();
NCNN_EXPORT int set_flush_denormals(int flush_denormals);

} // namespace ncnn

Import

#include "ncnn/cpu.h"

I/O Contract

Inputs

Name	Type	Required	Description
powersave	`int`	No	Powersave mode: 0 = all cores (default), 1 = little cores only, 2 = big cores only
thread_affinity_mask	`const CpuSet&`	No	Explicit CPU affinity mask to bind threads to specific cores
num_threads	`int`	No	Number of OpenMP threads for parallel inference
flush_denormals	`int`	No	Denormal flush mode: 0=off, 1=DAZ, 2=FTZ, 3=DAZ+FTZ
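
The flush_denormals encoding in the table can be read as a two-bit field, with bit 0 selecting DAZ and bit 1 selecting FTZ. The sketch below shows how such a mode could map onto the x86 MXCSR register, where DAZ is bit 6 and FTZ is bit 15. The helper names are hypothetical and this is not ncnn's implementation; it also assumes a CPU that supports the DAZ bit, which very old SSE-only processors do not.

```cpp
#include <cassert>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

static const unsigned int MXCSR_DAZ = 0x0040; // bit 6: denormals-are-zero
static const unsigned int MXCSR_FTZ = 0x8000; // bit 15: flush-to-zero

// Pure helper: compute the MXCSR value for a mode in [0, 3].
// Mode bit 0 selects DAZ and mode bit 1 selects FTZ, matching the table.
static unsigned int mxcsr_with_mode(unsigned int csr, int mode)
{
    assert(mode >= 0 && mode <= 3);
    csr &= ~(MXCSR_DAZ | MXCSR_FTZ);
    if (mode & 1) csr |= MXCSR_DAZ;
    if (mode & 2) csr |= MXCSR_FTZ;
    return csr;
}

// Apply the mode to the calling thread's MXCSR.
// Returns 0 on success, -1 for an invalid mode or a non-SSE build.
static int apply_flush_denormals(int mode)
{
    if (mode < 0 || mode > 3)
        return -1;
#if defined(__SSE__)
    _mm_setcsr(mxcsr_with_mode(_mm_getcsr(), mode));
    return 0;
#else
    return -1;
#endif
}
```

MXCSR is per-thread state, so a setting applied on one thread does not automatically carry over to worker threads created elsewhere.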

Outputs

Name	Type	Description
feature_supported	`int`	1 if the ISA feature is supported, 0 otherwise (cpu_riscv_vlenb() reports the vector register length in bytes rather than a 0/1 flag)
cpu_count	`int`	Number of CPU cores (total, big, little, or physical)
cache_size	`int`	Cache size in bytes for L2 or L3
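return code	`int`	0 on success for the setters that return `int` (set_cpu_powersave, set_cpu_thread_affinity, set_flush_denormals); set_omp_num_threads returns `void`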

Usage Examples

Querying CPU Features

#include "ncnn/cpu.h"
#include <stdio.h>

void print_cpu_info()
{
    printf("CPU count: %d (big: %d, little: %d)\n",
           ncnn::get_cpu_count(),
           ncnn::get_big_cpu_count(),
           ncnn::get_little_cpu_count());

    printf("L2 cache: %d KB\n", ncnn::get_cpu_level2_cache_size() / 1024);

    if (ncnn::cpu_support_x86_avx2())
        printf("AVX2 supported\n");
    if (ncnn::cpu_support_x86_avx512())
        printf("AVX-512 supported\n");
    if (ncnn::cpu_support_arm_neon())
        printf("ARM NEON supported\n");
    if (ncnn::cpu_support_arm_sve())
        printf("ARM SVE supported\n");
}

Configuring Thread Affinity for big.LITTLE

#include "ncnn/cpu.h"

void setup_for_performance()
{
    // Use only big cores for maximum performance
    ncnn::set_cpu_powersave(2);

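    // Alternatively, fetch the big-core mask and apply it explicitly
    // (shown for illustration; one of the two approaches is normally enough)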
    const ncnn::CpuSet& big_mask = ncnn::get_cpu_thread_affinity_mask(2);
    ncnn::set_cpu_thread_affinity(big_mask);

    // Set thread count to match big core count
    ncnn::set_omp_num_threads(ncnn::get_big_cpu_count());
}

void setup_for_power_saving()
{
    // Use only little cores for power efficiency
    ncnn::set_cpu_powersave(1);
    ncnn::set_omp_num_threads(ncnn::get_little_cpu_count());
}
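
When deriving a thread count from a cluster size as in the examples above, it can be worth guarding against an empty cluster. The helper below is a hypothetical sketch: it assumes a big or little count may be reported as 0 on processors without distinct clusters, which this page does not confirm, and it falls back to the total core count in that case.

```cpp
#include <cassert>

// Hypothetical helper: pick a thread count for a powersave mode
// (0 = all cores, 1 = little cores only, 2 = big cores only).
// Never returns less than 1.
static int threads_for_powersave(int powersave, int total, int big, int little)
{
    assert(powersave >= 0 && powersave <= 2);

    int n = total;
    if (powersave == 1) n = little;
    if (powersave == 2) n = big;

    // Requested cluster is empty: fall back to all cores.
    if (n < 1) n = total;

    return n < 1 ? 1 : n;
}
```

The result could then be passed to `ncnn::set_omp_num_threads()` in place of the raw `get_big_cpu_count()` or `get_little_cpu_count()` value.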
