Implementation:Vllm project Vllm CPU Utils

Knowledge Sources	vllm
Domains	NUMA, CPU_Inference, Thread_Management
Last Updated	2026-02-08 00:00 GMT

Overview

Implements CPU thread affinity binding and NUMA-aware memory management for optimizing multi-threaded inference performance on multi-socket CPU systems.

Description

This file provides the init_cpu_threads_env function, which configures thread-to-core affinity and NUMA memory policies for the vLLM CPU backend. It parses a CPU ID string to determine which cores to bind to, migrates existing memory pages to the NUMA nodes that own those cores, and sets the memory allocation policy: MEMBIND when the cores sit on a single node, INTERLEAVE when they span several. The file also contains the ScratchPadManager class, which manages aligned temporary memory allocations used by CPU kernels.
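The parsing and policy-selection steps described above can be illustrated without libnuma. The sketch below is not the code in csrc/cpu/utils.cpp: parse_cpu_list, choose_policy, MemPolicy, and the core_to_node lookup table are hypothetical names introduced here, and the real implementation delegates parsing to numa_parse_cpustring_all and node lookup to libnuma.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the CPU-string parsing step: expands a list such
// as "0-3,8-11" into sorted, de-duplicated core IDs.
std::vector<int> parse_cpu_list(const std::string& cpu_ids) {
  std::set<int> cores;
  std::size_t pos = 0;
  while (pos <= cpu_ids.size()) {
    std::size_t comma = cpu_ids.find(',', pos);
    if (comma == std::string::npos) comma = cpu_ids.size();
    const std::string token = cpu_ids.substr(pos, comma - pos);
    if (token.empty()) throw std::invalid_argument("empty CPU list entry");

    // A token is either a single ID ("5") or an inclusive range ("0-3").
    const std::size_t dash = token.find('-');
    const int first = std::stoi(token.substr(0, dash));
    const int last =
        (dash == std::string::npos) ? first : std::stoi(token.substr(dash + 1));
    if (first < 0 || last < first) {
      throw std::invalid_argument("bad CPU range: " + token);
    }
    for (int core = first; core <= last; ++core) cores.insert(core);
    pos = comma + 1;
  }
  return std::vector<int>(cores.begin(), cores.end());
}

// Illustrative policy labels; the real code applies the policy through libnuma.
enum class MemPolicy { Membind, Interleave };

// core_to_node is an assumed lookup table: index = core ID, value = NUMA node.
// Bind to one node when all cores share it, interleave otherwise.
MemPolicy choose_policy(const std::vector<int>& cores,
                        const std::vector<int>& core_to_node) {
  std::set<int> nodes;
  for (int core : cores) {
    nodes.insert(core_to_node.at(static_cast<std::size_t>(core)));
  }
  return nodes.size() <= 1 ? MemPolicy::Membind : MemPolicy::Interleave;
}
```

On a machine where cores 0-7 belong to node 0 and cores 8-11 to node 1 (an assumed topology), "0-3" would select Membind and "0-3,8-11" would select Interleave. The sketch is more lenient than a production parser: std::stoi ignores trailing non-digit characters.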

Usage

The init_cpu_threads_env function is called during CPU backend initialization from Python to configure optimal thread placement. When NUMA is disabled at build time (VLLM_NUMA_DISABLED), the function returns a warning string without performing any binding. The ScratchPadManager provides a singleton scratch buffer for temporary kernel computations.
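For readers unfamiliar with the affinity call this page imports from <sched.h>, here is a minimal Linux-only sketch of pinning the calling thread to one core. The helper names pin_current_thread_to_core and first_allowed_core are hypothetical and do not appear in the vLLM source; how init_cpu_threads_env distributes cores across OMP threads is not shown.

```cpp
#include <sched.h>  // sched_setaffinity, sched_getaffinity, cpu_set_t (Linux/glibc)

// Hypothetical helper: restricts the calling thread to a single core.
// Returns true on success, false if the core is out of range or the
// kernel rejects the request (for example, the core is not permitted).
bool pin_current_thread_to_core(int core) {
  if (core < 0 || core >= CPU_SETSIZE) return false;
  cpu_set_t mask;
  CPU_ZERO(&mask);
  CPU_SET(core, &mask);
  // A pid of 0 targets the calling thread.
  return sched_setaffinity(0, sizeof(mask), &mask) == 0;
}

// Hypothetical helper: returns the lowest core the calling thread is
// currently allowed to run on, or -1 if the affinity mask cannot be read.
int first_allowed_core() {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return -1;
  for (int core = 0; core < CPU_SETSIZE; ++core) {
    if (CPU_ISSET(core, &mask)) return core;
  }
  return -1;
}
```

In an OpenMP program, each thread would presumably call a helper like this from inside a parallel region so that every worker ends up on its own core; that pattern is an assumption about typical usage rather than a description of the vLLM code.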

Code Reference

Source Location

Repository: vllm
File: csrc/cpu/utils.cpp
Lines: 1-188

Signature

std::string init_cpu_threads_env(const std::string& cpu_ids);

namespace cpu_utils {
class ScratchPadManager {
public:
    ScratchPadManager();
    void realloc(size_t new_size);
    static ScratchPadManager* get_scratchpad_manager();
private:
    size_t size_;
    void* ptr_;
};
}  // namespace cpu_utils

Import

#include "cpu/utils.hpp"
#include <numa.h>       // when NUMA is enabled
#include <sched.h>      // for sched_setaffinity

I/O Contract

Inputs

Name	Type	Required	Description
cpu_ids	std::string	Yes	Comma-separated CPU core IDs or ranges (e.g., "0-3,8-11") parsed by numa_parse_cpustring_all

Outputs

Name	Type	Description
return	std::string	Diagnostic string describing OMP thread-to-core binding (process ID, thread ID, core mappings)
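To make the shape of the returned diagnostic concrete, the following sketch assembles a string in the layout shown in the usage example on this page (a header line with the process ID, then one tab-indented line per thread). The function name format_binding_report and its parameter types are hypothetical; the real function gathers the thread and core IDs itself.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: formats (thread ID, core ID) pairs into a report that
// mirrors the example output documented on this page.
std::string format_binding_report(
    long pid, const std::vector<std::pair<long, int>>& thread_to_core) {
  std::ostringstream out;
  out << "OMP threads binding of Process " << pid << ":\n";
  for (const auto& entry : thread_to_core) {
    out << "\tOMP tid: " << entry.first << ", core " << entry.second << "\n";
  }
  return out.str();
}
```

Callers should treat the string as human-readable logging output; nothing on this page suggests it is meant to be parsed.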

Usage Examples

// Bind OMP threads to cores 0-7 (assumed here to sit on NUMA node 0)
std::string single_node_report = init_cpu_threads_env("0-7");
// Output: "OMP threads binding of Process 12345:\n\tOMP tid: 12346, core 0\n..."

// Multi-NUMA node binding with interleave policy
std::string multi_node_report = init_cpu_threads_env("0-3,16-19");
// Memory allocation will be interleaved across NUMA nodes containing those cores

// ScratchPad usage in CPU kernels
auto* manager = cpu_utils::ScratchPadManager::get_scratchpad_manager();
manager->realloc(1024 * 1024);  // Ensure at least 1 MiB of scratch space
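The scratch-pad pattern (a process-wide, aligned buffer that only ever grows) can be sketched as follows. This is an illustration under stated assumptions, not the ScratchPadManager implementation: the class name ScratchBuffer, the 64-byte alignment, and the methods reserve, data, size, and instance are all hypothetical. It relies on std::aligned_alloc from C++17, which is available with glibc but not on every platform.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical grow-only aligned buffer, similar in spirit to a scratch pad.
class ScratchBuffer {
 public:
  static constexpr std::size_t kAlignment = 64;  // assumed cache-line size

  ScratchBuffer() = default;
  ScratchBuffer(const ScratchBuffer&) = delete;
  ScratchBuffer& operator=(const ScratchBuffer&) = delete;
  ~ScratchBuffer() { std::free(ptr_); }

  // Ensures capacity of at least new_size bytes. Never shrinks, and does not
  // preserve old contents when it grows (scratch data is assumed transient).
  void reserve(std::size_t new_size) {
    // std::aligned_alloc expects a size that is a multiple of the alignment.
    const std::size_t rounded =
        (new_size + kAlignment - 1) / kAlignment * kAlignment;
    if (rounded <= size_) return;
    void* fresh = std::aligned_alloc(kAlignment, rounded);
    if (fresh == nullptr) throw std::bad_alloc();
    std::free(ptr_);
    ptr_ = fresh;
    size_ = rounded;
  }

  void* data() const { return ptr_; }
  std::size_t size() const { return size_; }

  // Function-local static: one shared instance, initialized on first use.
  static ScratchBuffer& instance() {
    static ScratchBuffer buffer;
    return buffer;
  }

 private:
  std::size_t size_ = 0;
  void* ptr_ = nullptr;
};
```

Growing only on demand means repeated kernel calls with the same or smaller workspace requirement reuse one allocation. A shared buffer like this is not safe for concurrent use by independent callers unless they coordinate access.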

Related Pages

Environment:Vllm_project_Vllm_CPU_Runtime
