Implementation:Vllm project Vllm CPU Types Scalar

Knowledge Sources	vllm
Domains	CPU_Inference, Portability
Last Updated	2026-02-08 00:00 GMT

Overview

Provides scalar (non-SIMD) fallback implementations of vector types for platforms without hardware SIMD support, maintaining API compatibility with the vectorized backends.

Description

This header implements the same vector type interfaces (FP16Vec8, FP16Vec16, BF16Vec8, BF16Vec16, BF16Vec32, FP32Vec4, FP32Vec8, FP32Vec16) as the SIMD-optimized headers, but uses plain C arrays and scalar loop-based arithmetic instead of hardware vector registers. Each struct stores its elements in a fixed-size array type (e.g., f16x8_t, f32x16_t) and provides load, save, and arithmetic operations as element-wise scalar loops that are unrolled at compile time. Conversions between float32 and float16/bfloat16 are performed in software, using the routines from float_convert.hpp.
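The pattern described above can be illustrated with a minimal sketch. It is not the header's actual code: the `sketch` namespace, the `unroll_loop` helper, and the layout of `f32x8_t` are assumptions made for illustration, and only the general idea (a fixed-size array plus an element-wise loop expanded at compile time) is taken from the description.

```cpp
#include <cstddef>
#include <utility>

namespace sketch {

// Assumed stand-in for the header's fixed-size array types (e.g. f32x8_t).
struct f32x8_t {
  float val[8];
};

// Expands to f(0), f(1), ..., f(N-1) at compile time via a fold expression.
template <typename F, std::size_t... Is>
constexpr void unroll_loop_impl(F& f, std::index_sequence<Is...>) {
  (f(Is), ...);
}

template <std::size_t N, typename F>
constexpr void unroll_loop(F f) {
  unroll_loop_impl(f, std::make_index_sequence<N>{});
}

// Scalar-emulated 8-lane float vector (hypothetical, for illustration only).
struct FP32Vec8 {
  static constexpr int VEC_ELEM_NUM = 8;
  f32x8_t reg;

  // Broadcast one scalar into every lane.
  explicit FP32Vec8(float v) {
    unroll_loop<VEC_ELEM_NUM>([&](std::size_t i) { reg.val[i] = v; });
  }

  // Load VEC_ELEM_NUM floats from memory.
  explicit FP32Vec8(const float* ptr) {
    unroll_loop<VEC_ELEM_NUM>([&](std::size_t i) { reg.val[i] = ptr[i]; });
  }

  // Element-wise multiply, one scalar operation per lane.
  FP32Vec8 operator*(const FP32Vec8& b) const {
    FP32Vec8 out(0.0f);
    unroll_loop<VEC_ELEM_NUM>(
        [&](std::size_t i) { out.reg.val[i] = reg.val[i] * b.reg.val[i]; });
    return out;
  }

  // Store all lanes back to memory.
  void save(float* ptr) const {
    unroll_loop<VEC_ELEM_NUM>([&](std::size_t i) { ptr[i] = reg.val[i]; });
  }
};

}  // namespace sketch
```

Because the loop bounds are compile-time constants, an optimizing compiler is free to auto-vectorize these loops where the target allows it, but the source code itself makes no use of intrinsics.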

Usage

This header is selected at compile time when no supported SIMD instruction set is available on the target platform. It ensures that vLLM can compile and run on any CPU architecture, albeit with reduced performance compared to SIMD-optimized paths.
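Compile-time selection of this kind is usually driven by compiler-predefined macros. The sketch below shows the general shape only. The `Backend` enum, `kBackend`, and `backend_name` are hypothetical names, and the exact conditions vLLM checks are not stated on this page, so the macro tests here are an assumption.

```cpp
#include <string_view>

namespace sketch {

// Hypothetical backend tags; the real headers are selected via #include.
enum class Backend { X86Simd, ArmNeon, Scalar };

// __AVX2__, __AVX512F__ and __ARM_NEON are predefined by GCC/Clang when the
// corresponding instruction set is enabled for the target.
#if defined(__AVX2__) || defined(__AVX512F__)
inline constexpr Backend kBackend = Backend::X86Simd;
#elif defined(__ARM_NEON)
inline constexpr Backend kBackend = Backend::ArmNeon;
#else
// No supported SIMD instruction set detected: fall back to the scalar path.
inline constexpr Backend kBackend = Backend::Scalar;
#endif

constexpr std::string_view backend_name(Backend b) {
  switch (b) {
    case Backend::X86Simd: return "x86-simd";
    case Backend::ArmNeon: return "arm-neon";
    case Backend::Scalar:  return "scalar";
  }
  return "unknown";
}

}  // namespace sketch
```

Since every backend exposes the same type names, calling code does not need to know which branch was taken.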

Code Reference

Source Location

Repository: vllm
File: csrc/cpu/cpu_types_scalar.hpp
Lines: 1-465

Signature

namespace vec_op {

struct FP16Vec8 : public Vec<FP16Vec8> {
    constexpr static int VEC_ELEM_NUM = 8;
    f16x8_t reg;
    explicit FP16Vec8(const void* ptr);
    explicit FP16Vec8(const FP32Vec8&);
    void save(void* ptr) const;
};

struct BF16Vec16 : public Vec<BF16Vec16> {
    constexpr static int VEC_ELEM_NUM = 16;
    f16x16_t reg;
    explicit BF16Vec16(const void* ptr);
    explicit BF16Vec16(const FP32Vec16&);
    void save(void* ptr) const;
    void save(void* ptr, const int elem_num) const;
};

struct FP32Vec16 : public Vec<FP32Vec16> {
    constexpr static int VEC_ELEM_NUM = 16;
    f32x16_t reg;
    explicit FP32Vec16(const void* ptr);
    explicit FP32Vec16(float v);
    FP32Vec16 operator*(const FP32Vec16&) const;
    FP32Vec16 operator+(const FP32Vec16&) const;
    FP32Vec16 operator-(const FP32Vec16&) const;
};

} // namespace vec_op

Import

#include "cpu/cpu_types_scalar.hpp"

I/O Contract

Inputs

Name	Type	Required	Description
ptr	const void*	Yes	Pointer to source data for vector load operations
v	float	No	Scalar value to broadcast across all vector elements
elem_num	int	No	Number of elements for partial save operations
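To make the role of `elem_num` concrete, here is a minimal sketch of a full and a partial save on a 16-lane, 16-bit vector. It mirrors the `BF16Vec16` signature listed above, but the body is an assumption: the `sketch` namespace, the `f16x16_t` layout, and in particular the clamping of `elem_num` are illustrative choices, not confirmed behavior of the header.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace sketch {

// Assumed layout: 16 raw 16-bit lanes.
struct f16x16_t {
  std::uint16_t val[16];
};

struct BF16Vec16 {
  static constexpr int VEC_ELEM_NUM = 16;
  f16x16_t reg;

  // Load 16 bf16 values (raw 16-bit patterns) from memory.
  explicit BF16Vec16(const void* ptr) { std::memcpy(&reg, ptr, sizeof(reg)); }

  // Full save: writes all 16 lanes.
  void save(void* ptr) const { std::memcpy(ptr, &reg, sizeof(reg)); }

  // Partial save: writes only the first elem_num lanes, e.g. for a tail that
  // is shorter than the vector width. The clamp is a defensive assumption.
  void save(void* ptr, const int elem_num) const {
    int n = elem_num;
    if (n < 0) n = 0;
    if (n > VEC_ELEM_NUM) n = VEC_ELEM_NUM;
    std::memcpy(ptr, &reg, static_cast<std::size_t>(n) * sizeof(std::uint16_t));
  }
};

}  // namespace sketch
```

A partial save leaves the destination bytes beyond `elem_num` elements untouched, which is what makes it safe to use at the end of a buffer.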

Outputs

Name	Type	Description
Vector struct	FP32Vec16, BF16Vec16, etc.	Scalar-emulated vector containing computed elements in a fixed-size array

Usage Examples

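// Assumes input_ptr points to at least 16 floats and output_ptr to a
// buffer with room for 16 bfloat16 values.

// Load 16 floats from memory using scalar fallback
vec_op::FP32Vec16 vec(input_ptr);

// Broadcast a scale factor to all 16 elements
vec_op::FP32Vec16 scale_vec(0.5f);

// Scalar element-wise multiply
vec_op::FP32Vec16 result = vec * scale_vec;

// Convert FP32 to BF16 (scalar conversion) and save
vec_op::BF16Vec16 bf16_result(result);
bf16_result.save(output_ptr);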
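The FP32-to-BF16 step in the example is a per-element software conversion. A common way to do it is sketched below, using round-to-nearest-even on the upper 16 bits of the float. Whether float_convert.hpp uses this rounding mode or plain truncation is not stated here, and the function names `fp32_to_bf16` and `bf16_to_fp32` are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

namespace sketch {

// Convert one float to a bfloat16 bit pattern (round-to-nearest-even).
inline std::uint16_t fp32_to_bf16(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));  // bit cast without aliasing issues

  // NaN: keep the sign and exponent, force a quiet-NaN mantissa bit.
  if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
    return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
  }

  // Add 0x7FFF plus the lowest kept bit so that exact ties round to even.
  const std::uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<std::uint16_t>((bits + rounding_bias) >> 16);
}

// Convert a bfloat16 bit pattern back to float; this direction is exact.
inline float bf16_to_fp32(std::uint16_t h) {
  const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

}  // namespace sketch
```

bfloat16 keeps the full 8-bit exponent of float32 and only 7 mantissa bits, so the conversion preserves range and loses precision.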

Related Pages

Environment:Vllm_project_Vllm_CPU_Runtime
