Implementation:Vllm project Vllm CPU Types Scalar

From Leeroopedia


Knowledge Sources
Domains: CPU_Inference, Portability
Last Updated: 2026-02-08 00:00 GMT

Overview

Provides scalar (non-SIMD) fallback implementations of vector types for platforms without hardware SIMD support, maintaining API compatibility with the vectorized backends.

Description

This header implements the same vector type interfaces as the SIMD-optimized headers (FP16Vec8, FP16Vec16, BF16Vec8, BF16Vec16, BF16Vec32, FP32Vec4, FP32Vec8, FP32Vec16), but backs them with plain C arrays and scalar loops instead of hardware registers. Each struct stores its elements in a fixed-size array type (e.g., f16x8_t, f32x16_t) and implements load, save, and arithmetic operations as element-wise scalar loops that are unrolled at compile time. Conversions between float32 and float16/bfloat16 are done in software, using the routines from float_convert.hpp.
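The pattern described above can be illustrated with a minimal, self-contained sketch. It is not the header's actual code: the names f32x8_sketch_t, unroll, and FP32Vec8Sketch are hypothetical stand-ins, and the real header may implement unrolling and loads differently. The sketch only shows how a fixed-size array plus an unrolled scalar loop can mimic a vector register.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical stand-in for the header's fixed-size storage types.
struct f32x8_sketch_t {
  float val[8];
};

// Compile-time unrolled loop: expands to f(0), f(1), ..., f(N-1)
// through a C++17 fold expression over an index sequence.
template <typename F, std::size_t... I>
void unroll_impl(F& f, std::index_sequence<I...>) {
  (f(I), ...);
}

template <std::size_t N, typename F>
void unroll(F f) {
  unroll_impl(f, std::make_index_sequence<N>{});
}

// Hypothetical scalar-emulated vector of 8 floats.
struct FP32Vec8Sketch {
  static constexpr int VEC_ELEM_NUM = 8;
  f32x8_sketch_t reg;

  // Load VEC_ELEM_NUM floats from memory.
  explicit FP32Vec8Sketch(const void* ptr) {
    std::memcpy(&reg, ptr, sizeof(reg));
  }

  // Broadcast one scalar to every element.
  explicit FP32Vec8Sketch(float v) {
    unroll<VEC_ELEM_NUM>([&](std::size_t i) { reg.val[i] = v; });
  }

  // Element-wise multiply, one scalar operation per lane.
  FP32Vec8Sketch operator*(const FP32Vec8Sketch& b) const {
    FP32Vec8Sketch out(0.0f);
    unroll<VEC_ELEM_NUM>(
        [&](std::size_t i) { out.reg.val[i] = reg.val[i] * b.reg.val[i]; });
    return out;
  }

  // Store all elements back to memory.
  void save(void* ptr) const { std::memcpy(ptr, &reg, sizeof(reg)); }
};
```

Because the struct exposes the same constructors, operators, and save method as a SIMD-backed type would, calling code does not need to know which backend it was compiled against.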

Usage

This header is selected at compile time when no supported SIMD instruction set is available on the target platform. It ensures that vLLM can compile and run on any CPU architecture, albeit with reduced performance compared to SIMD-optimized paths.
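Compile-time selection of this kind is usually done with preprocessor checks. The sketch below is an assumption about how such a dispatch could look, not the checks vLLM actually performs: the feature macros are the compilers' standard predefined ones, while VEC_BACKEND_NAME and selected_backend are hypothetical names introduced for illustration.

```cpp
#include <string>

// Hypothetical dispatch: pick a backend from predefined compiler macros.
// A real build would include a different header in each branch instead
// of defining a name; the final #else is where the scalar fallback goes.
#if defined(__AVX512F__)
  #define VEC_BACKEND_NAME "avx512"
#elif defined(__AVX2__)
  #define VEC_BACKEND_NAME "avx2"
#elif defined(__ARM_NEON)
  #define VEC_BACKEND_NAME "neon"
#else
  #define VEC_BACKEND_NAME "scalar"
#endif

// Reports which branch the preprocessor took for this translation unit.
inline std::string selected_backend() { return VEC_BACKEND_NAME; }
```

The key property is that the fallback sits in the final branch, so the build succeeds on any target even when none of the SIMD checks match.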

Code Reference

Source Location

Signature

namespace vec_op {

struct FP16Vec8 : public Vec<FP16Vec8> {
    constexpr static int VEC_ELEM_NUM = 8;
    f16x8_t reg;
    explicit FP16Vec8(const void* ptr);
    explicit FP16Vec8(const FP32Vec8&);
    void save(void* ptr) const;
};

struct BF16Vec16 : public Vec<BF16Vec16> {
    constexpr static int VEC_ELEM_NUM = 16;
    f16x16_t reg;
    explicit BF16Vec16(const void* ptr);
    explicit BF16Vec16(const FP32Vec16&);
    void save(void* ptr) const;
    void save(void* ptr, const int elem_num) const;
};

struct FP32Vec16 : public Vec<FP32Vec16> {
    constexpr static int VEC_ELEM_NUM = 16;
    f32x16_t reg;
    explicit FP32Vec16(const void* ptr);
    explicit FP32Vec16(float v);
    FP32Vec16 operator*(const FP32Vec16&) const;
    FP32Vec16 operator+(const FP32Vec16&) const;
    FP32Vec16 operator-(const FP32Vec16&) const;
};

} // namespace vec_op

Import

#include "cpu/cpu_types_scalar.hpp"

I/O Contract

Inputs

Name     | Type        | Required | Description
ptr      | const void* | Yes      | Pointer to source data for vector load operations
v        | float       | No       | Scalar value to broadcast across all vector elements
elem_num | int         | No       | Number of elements for partial save operations

Outputs

Name          | Type                       | Description
Vector struct | FP32Vec16, BF16Vec16, etc. | Scalar-emulated vector containing computed elements in a fixed-size array
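To make the elem_num input concrete, here is a minimal sketch of a full save next to a partial save. BF16Vec16Sketch is a hypothetical type, and the bounds assertion is an assumption added for illustration; the real header may not check the range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 16-element bfloat16 vector stored as raw 16-bit values.
struct BF16Vec16Sketch {
  static constexpr int VEC_ELEM_NUM = 16;
  std::uint16_t val[VEC_ELEM_NUM];

  // Load all 16 elements from memory.
  explicit BF16Vec16Sketch(const void* ptr) {
    std::memcpy(val, ptr, sizeof(val));
  }

  // Full save: writes all 16 elements.
  void save(void* ptr) const { std::memcpy(ptr, val, sizeof(val)); }

  // Partial save: writes only the first elem_num elements, leaving the
  // rest of the destination untouched (useful for a tail shorter than 16).
  void save(void* ptr, const int elem_num) const {
    assert(elem_num >= 0 && elem_num <= VEC_ELEM_NUM);
    std::memcpy(ptr, val,
                static_cast<std::size_t>(elem_num) * sizeof(std::uint16_t));
  }
};
```

A partial save is typically what a caller needs for the last chunk of a buffer whose length is not a multiple of the vector width.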

Usage Examples

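#include "cpu/cpu_types_scalar.hpp"

// Assumed to be provided by the caller:
//   input_ptr  - points to at least 16 floats
//   output_ptr - points to room for 16 bfloat16 values

// Load 16 floats from memory using scalar fallback
vec_op::FP32Vec16 vec(input_ptr);

// Broadcast a scale factor to all 16 elements (0.5f is an example value)
vec_op::FP32Vec16 scale_vec(0.5f);

// Scalar element-wise multiply
vec_op::FP32Vec16 result = vec * scale_vec;

// Convert FP32 to BF16 (scalar conversion) and save
vec_op::BF16Vec16 bf16_result(result);
bf16_result.save(output_ptr);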
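The FP32 to BF16 step above relies on a software conversion. A minimal sketch of one common approach, round-to-nearest-even on the upper 16 bits of the float, is shown below. The function names are hypothetical, and the routines in float_convert.hpp may round or handle NaN differently.

```cpp
#include <cstdint>
#include <cstring>

// Convert a float to bfloat16 bits using round-to-nearest-even.
// bfloat16 keeps the sign, the 8 exponent bits, and the top 7 mantissa bits.
inline std::uint16_t fp32_to_bf16_bits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  // NaN: truncate and force a mantissa bit so the result stays a NaN.
  if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
    return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
  }
  // Add 0x7FFF plus the lowest kept bit so that ties round to even.
  const std::uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<std::uint16_t>((bits + rounding_bias) >> 16);
}

// Convert bfloat16 bits back to float by restoring the low 16 zero bits.
inline float bf16_bits_to_fp32(std::uint16_t h) {
  const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}
```

Widening back to float is exact, while narrowing drops 16 mantissa bits, so values such as 1.0f and -2.0f round-trip unchanged but most floats do not.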
