Implementation:Vllm project Vllm CPU Types Scalar
| Knowledge Sources | |
|---|---|
| Domains | CPU_Inference, Portability |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Provides scalar (non-SIMD) fallback implementations of vector types for platforms without hardware SIMD support, maintaining API compatibility with the vectorized backends.
Description
This header implements the same vector type interfaces (FP16Vec8, FP16Vec16, BF16Vec8, BF16Vec16, BF16Vec32, FP32Vec4, FP32Vec8, FP32Vec16) as the SIMD-optimized headers, but backs them with plain C arrays and scalar arithmetic instead of hardware vector registers. Each struct stores its elements in a fixed-size array type (e.g., f16x8_t, f32x16_t) and implements load, save, and arithmetic operations as element-wise scalar loops that are unrolled at compile time. Conversions between float32 and float16/bfloat16 are done in software, using the routines from float_convert.hpp.
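The pattern described above (a fixed-size array wrapped in a struct, with element-wise operations expanded at compile time) can be sketched in a few lines. This is a minimal illustration, not the header's actual code: DemoFP32Vec8, f32x8_demo_t, and the unroll helper are hypothetical names, and the real header may implement its unrolling differently. The sketch assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for a fixed-size array type such as f32x8_t.
struct f32x8_demo_t {
  float val[8];
};

// Calls f(0), f(1), ..., f(N-1). The fold expression is expanded by the
// compiler, so no runtime loop counter is needed.
template <typename F, std::size_t... I>
void unroll_impl(F&& f, std::index_sequence<I...>) {
  (f(I), ...);
}

template <std::size_t N, typename F>
void unroll(F&& f) {
  unroll_impl(f, std::make_index_sequence<N>{});
}

// Hypothetical scalar-emulated vector of 8 floats.
struct DemoFP32Vec8 {
  static constexpr int VEC_ELEM_NUM = 8;
  f32x8_demo_t reg;

  // Broadcast one scalar into every lane.
  explicit DemoFP32Vec8(float v) {
    unroll<VEC_ELEM_NUM>([&](std::size_t i) { reg.val[i] = v; });
  }

  // Load VEC_ELEM_NUM floats from memory.
  explicit DemoFP32Vec8(const void* ptr) {
    assert(ptr != nullptr);
    const float* src = static_cast<const float*>(ptr);
    unroll<VEC_ELEM_NUM>([&](std::size_t i) { reg.val[i] = src[i]; });
  }

  // Element-wise multiply, one scalar operation per lane.
  DemoFP32Vec8 operator*(const DemoFP32Vec8& b) const {
    DemoFP32Vec8 out(0.0f);
    unroll<VEC_ELEM_NUM>(
        [&](std::size_t i) { out.reg.val[i] = reg.val[i] * b.reg.val[i]; });
    return out;
  }

  // Store all lanes back to memory.
  void save(void* ptr) const {
    float* dst = static_cast<float*>(ptr);
    unroll<VEC_ELEM_NUM>([&](std::size_t i) { dst[i] = reg.val[i]; });
  }
};
```

Because the struct exposes the same constructor, operator, and save shapes as a SIMD-backed type would, calling code does not need to know which backend it was compiled against.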
Usage
This header is selected at compile time when no supported SIMD instruction set is available on the target platform. It lets vLLM compile and run on CPU architectures that have no dedicated SIMD backend, at lower performance than the SIMD-optimized paths.
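Compile-time selection of this kind is usually done with preprocessor checks on compiler-defined architecture macros, with the scalar path in the final #else branch. The sketch below shows the general shape only. The macros tested (__AVX512F__, __AVX2__, __ARM_NEON) are standard compiler-defined macros, but the branch order, the DEMO_ names, and the backend labels are assumptions made for illustration and are not taken from vLLM's dispatch header.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical dispatch sketch. A real dispatch header would #include the
// matching backend header in each branch instead of defining a label.
#if defined(__AVX512F__) || defined(__AVX2__)
  #define DEMO_BACKEND_NAME "x86-simd"
  #define DEMO_USES_SCALAR 0
#elif defined(__ARM_NEON)
  #define DEMO_BACKEND_NAME "arm-neon"
  #define DEMO_USES_SCALAR 0
#else
  // No recognized SIMD instruction set: fall back to the scalar types.
  #define DEMO_BACKEND_NAME "scalar"
  #define DEMO_USES_SCALAR 1
#endif

// Report which branch the preprocessor selected for this build.
inline const char* demo_backend_name() { return DEMO_BACKEND_NAME; }
inline bool demo_uses_scalar_fallback() { return DEMO_USES_SCALAR != 0; }
```

The decision is made once per build, so there is no runtime dispatch cost. The trade-off is that a binary built for the scalar path stays scalar even on hardware that has SIMD units.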
Code Reference
Source Location
- Repository: vllm
- File: csrc/cpu/cpu_types_scalar.hpp
- Lines: 1-465
Signature
namespace vec_op {
struct FP16Vec8 : public Vec<FP16Vec8> {
constexpr static int VEC_ELEM_NUM = 8;
f16x8_t reg;
explicit FP16Vec8(const void* ptr);
explicit FP16Vec8(const FP32Vec8&);
void save(void* ptr) const;
};
struct BF16Vec16 : public Vec<BF16Vec16> {
constexpr static int VEC_ELEM_NUM = 16;
f16x16_t reg;
explicit BF16Vec16(const void* ptr);
explicit BF16Vec16(const FP32Vec16&);
void save(void* ptr) const;
void save(void* ptr, const int elem_num) const;
};
struct FP32Vec16 : public Vec<FP32Vec16> {
constexpr static int VEC_ELEM_NUM = 16;
f32x16_t reg;
explicit FP32Vec16(const void* ptr);
explicit FP32Vec16(float v);
FP32Vec16 operator*(const FP32Vec16&) const;
FP32Vec16 operator+(const FP32Vec16&) const;
FP32Vec16 operator-(const FP32Vec16&) const;
};
} // namespace vec_op
Import
#include "cpu/cpu_types_scalar.hpp"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ptr | const void* | Yes | Pointer to source data for vector load operations |
| v | float | No | Scalar value to broadcast across all vector elements |
| elem_num | int | No | Number of elements for partial save operations |
Outputs
| Name | Type | Description |
|---|---|---|
| Vector struct | FP32Vec16, BF16Vec16, etc. | Scalar-emulated vector containing computed elements in a fixed-size array |
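The elem_num input is easiest to understand from a partial save, which is useful when the tail of a buffer holds fewer elements than a full vector. The sketch below assumes that save(ptr, elem_num) writes only the first elem_num elements and leaves the rest of the destination untouched. That semantics is an assumption based on the parameter description, and DemoU16Vec16 is a hypothetical type rather than the header's BF16Vec16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 16-lane vector of 16-bit storage values (e.g. bfloat16 bits).
struct DemoU16Vec16 {
  static constexpr int VEC_ELEM_NUM = 16;
  std::uint16_t val[VEC_ELEM_NUM];

  // Load all 16 elements from memory.
  explicit DemoU16Vec16(const void* ptr) { std::memcpy(val, ptr, sizeof(val)); }

  // Full save: write all 16 elements.
  void save(void* ptr) const { std::memcpy(ptr, val, sizeof(val)); }

  // Partial save (assumed semantics): write only the first elem_num elements.
  void save(void* ptr, const int elem_num) const {
    assert(elem_num >= 0 && elem_num <= VEC_ELEM_NUM);
    std::memcpy(ptr, val,
                static_cast<std::size_t>(elem_num) * sizeof(std::uint16_t));
  }
};
```

A caller processing a row whose length is not a multiple of 16 would use the full save for complete chunks and the two-argument save for the remainder.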
Usage Examples
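// Assumes input_ptr points to at least 16 contiguous floats and
// output_ptr has room for 16 bfloat16 values.
// Load 16 floats from memory using the scalar fallback
vec_op::FP32Vec16 vec(input_ptr);
// Broadcast a scale factor to all 16 lanes (0.5f is an illustrative value)
vec_op::FP32Vec16 scale_vec(0.5f);
// Scalar element-wise multiply
vec_op::FP32Vec16 result = vec * scale_vec;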
// Convert FP32 to BF16 (scalar conversion) and save
vec_op::BF16Vec16 bf16_result(result);
bf16_result.save(output_ptr);
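The FP32-to-BF16 step in the example relies on a software conversion, since a scalar build has no hardware conversion instruction to call. The following is a minimal sketch of one common way to do it, using round-to-nearest-even on the upper 16 bits of the float. The function names are hypothetical, and the rounding mode is an assumption: the routines in float_convert.hpp may round or handle NaN differently.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: float32 -> bfloat16 bits, round-to-nearest-even.
inline std::uint16_t demo_fp32_to_bf16(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
    // NaN: keep the sign and exponent, force a quiet-NaN mantissa bit.
    return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
  }
  // Add a rounding bias so that exact ties round to an even result.
  bits += 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical helper: bfloat16 bits -> float32 (exact, no rounding needed).
inline float demo_bf16_to_fp32(std::uint16_t h) {
  const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Convert n floats to bfloat16, one element at a time.
inline void demo_convert_fp32_to_bf16(const float* src, std::uint16_t* dst,
                                      int n) {
  for (int i = 0; i < n; ++i) {
    dst[i] = demo_fp32_to_bf16(src[i]);
  }
}
```

Values that fit in bfloat16's 8 significant bits, such as 1.0f or -2.0f, survive the round trip unchanged. Other values lose their low 16 mantissa bits to rounding.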