Implementation:Vllm project Vllm CPU Types VSX
| Knowledge Sources | |
|---|---|
| Domains | CPU_Inference, SIMD, POWER |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Implements IBM POWER VSX (Vector Scalar eXtension) vector types for SIMD-accelerated inference on POWER9+ processors.
Description
This header provides vectorized data types (BF16Vec8, BF16Vec16, BF16Vec32, FP32Vec4, FP32Vec8, FP32Vec16, INT8Vec16, INT32Vec16) built on AltiVec/VSX intrinsics from altivec.h. Each type wraps 128-bit __vector float or __vector signed short registers. The wider 256-bit and 512-bit logical vectors are structs holding arrays of 128-bit registers (for example ss16x8x2_t and f32x4x4_t), and an operation on a wide vector is applied to each 128-bit register in turn. The types cover BFloat16 and Float32 data with arithmetic, load/store, type-conversion, and reduction operations written for the POWER ISA.
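The composition of 128-bit operations into a wider logical vector can be sketched portably as follows. This is a minimal model rather than the header's code: `Lane128`, `F32x4x4`, `splat`, and `mul` are hypothetical names, and a plain `float[4]` stands in for a `__vector float` register so the sketch compiles on any platform.

```cpp
#include <cstddef>

// Portable stand-in for one 128-bit register holding four floats.
// (Assumption: on POWER this role is played by `__vector float`.)
struct Lane128 {
  float v[4];
};

// Hypothetical 512-bit logical vector: four 128-bit lanes in a struct array.
struct F32x4x4 {
  Lane128 val[4];
};

// Broadcast one scalar into all 16 elements.
inline F32x4x4 splat(float x) {
  F32x4x4 r;
  for (std::size_t i = 0; i < 4; ++i)
    for (std::size_t j = 0; j < 4; ++j) r.val[i].v[j] = x;
  return r;
}

// Element-wise multiply: conceptually one 128-bit operation per lane,
// repeated for each of the four lanes.
inline F32x4x4 mul(const F32x4x4& a, const F32x4x4& b) {
  F32x4x4 r;
  for (std::size_t i = 0; i < 4; ++i)
    for (std::size_t j = 0; j < 4; ++j)
      r.val[i].v[j] = a.val[i].v[j] * b.val[i].v[j];
  return r;
}

// Horizontal reduction: sum every element of every lane into one scalar.
inline float reduce_sum(const F32x4x4& a) {
  float total = 0.0f;
  for (std::size_t i = 0; i < 4; ++i)
    for (std::size_t j = 0; j < 4; ++j) total += a.val[i].v[j];
  return total;
}
```

With real VSX registers the inner loops would collapse into a single intrinsic per lane, which is why the wide types cost roughly two or four 128-bit instructions per operation.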
Usage
This header is conditionally included when compiling vLLM on IBM POWER (ppc64le) platforms. It enables vectorized CPU kernel execution on enterprise POWER servers, providing the SIMD primitives used by attention, activation, and GEMM routines.
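Conditional inclusion of this kind is usually driven by compiler-predefined macros. The sketch below illustrates the idea only: the exact macro test that vLLM uses is not stated on this page, and `kBackend` and `uses_vsx` are hypothetical names. `__powerpc64__`, `__VSX__`, `__x86_64__`, and `__aarch64__` are macros predefined by GCC and Clang on the corresponding targets.

```cpp
// Compile-time backend selection (illustrative; not vLLM's actual dispatch code).
#if defined(__powerpc64__) && defined(__VSX__)
// POWER target with VSX enabled: the VSX header would be included here,
// e.g. #include "cpu/cpu_types_vsx.hpp"
constexpr const char* kBackend = "vsx";
#elif defined(__x86_64__)
constexpr const char* kBackend = "x86";
#elif defined(__aarch64__)
constexpr const char* kBackend = "arm";
#else
constexpr const char* kBackend = "generic";
#endif

// True only when the VSX branch above was selected
// (no other backend name starts with 'v').
constexpr bool uses_vsx() { return kBackend[0] == 'v'; }
```

Because the choice is made by the preprocessor, kernels written against the shared `vec_op` type names need no source changes between platforms.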
Code Reference
Source Location
- Repository: vllm
- File: csrc/cpu/cpu_types_vsx.hpp
- Lines: 1-788
Signature
namespace vec_op {

struct BF16Vec8 : public Vec<BF16Vec8> {
  constexpr static int VEC_ELEM_NUM = 8;
  __vector signed short reg;
  explicit BF16Vec8(const void* ptr);
  explicit BF16Vec8(const FP32Vec8&);
  void save(void* ptr) const;
};

struct BF16Vec16 : public Vec<BF16Vec16> {
  constexpr static int VEC_ELEM_NUM = 16;
  ss16x8x2_t reg;
  explicit BF16Vec16(const void* ptr);
  explicit BF16Vec16(const FP32Vec16&);
  void save(void* ptr) const;
  void save(void* ptr, const int elem_num) const;
};

struct FP32Vec16 : public Vec<FP32Vec16> {
  constexpr static int VEC_ELEM_NUM = 16;
  f32x4x4_t reg;
  explicit FP32Vec16(const void* ptr);
  explicit FP32Vec16(float v);
  FP32Vec16 operator*(const FP32Vec16&) const;
  FP32Vec16 operator+(const FP32Vec16&) const;
  FP32Vec16 operator-(const FP32Vec16&) const;
  float reduce_sum() const;
};

}  // namespace vec_op
Import
#include "cpu/cpu_types_vsx.hpp"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ptr | const void* | Yes | Pointer to source data for vector load via vec_xl intrinsic |
| v | float | No | Scalar value to broadcast via vec_splats into all vector lanes |
| elem_num | int | No | Number of elements for partial save operations via vec_xst_len |
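The load and partial-save contract in the table can be modelled in scalar code. This is a sketch under stated assumptions: `BF16x16Model` is a hypothetical stand-in for a 16-element BF16 vector, `std::memcpy` takes the place of the `vec_xl` and `vec_xst_len` intrinsics, and the assumed semantics of the partial save are that only the first `elem_num` elements are written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scalar model of a 16-element BF16 vector; each element is a raw 16-bit pattern.
struct BF16x16Model {
  static constexpr int kElems = 16;
  std::uint16_t lanes[kElems];

  // Full load: reads exactly 16 elements starting at ptr.
  explicit BF16x16Model(const void* ptr) {
    std::memcpy(lanes, ptr, sizeof(lanes));
  }

  // Full save: writes all 16 elements to ptr.
  void save(void* ptr) const { std::memcpy(ptr, lanes, sizeof(lanes)); }

  // Partial save: writes only the first elem_num elements and leaves the
  // destination bytes beyond them untouched.
  void save(void* ptr, const int elem_num) const {
    assert(elem_num >= 0 && elem_num <= kElems);
    std::memcpy(ptr, lanes,
                static_cast<std::size_t>(elem_num) * sizeof(std::uint16_t));
  }
};
```

A partial save like this is typically used for the tail of a row whose length is not a multiple of the vector width, so the kernel never writes past the end of the output buffer.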
Outputs
| Name | Type | Description |
|---|---|---|
| Vector struct | BF16Vec16, FP32Vec16, etc. | VSX register-backed vector containing SIMD-computed elements |
Usage Examples
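// input_ptr must point to at least 16 floats; output_ptr must have room
// for 16 BF16 (2-byte) values.
// Load 16 floats using VSX intrinsics
vec_op::FP32Vec16 vec(input_ptr);
// Broadcast a scalar scale factor into all 16 lanes
vec_op::FP32Vec16 scale_vec(0.5f);
// Perform SIMD multiply on POWER
vec_op::FP32Vec16 result = vec * scale_vec;
// Convert FP32 to BF16 and save
vec_op::BF16Vec16 bf16_result(result);
bf16_result.save(output_ptr);
// Reduce the 16 FP32 products to a scalar sum
float total = result.reduce_sum();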
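For checking results from the FP32-to-BF16 conversion step, a scalar reference can be useful. The sketch below assumes round-to-nearest-even with NaN preserved as a quiet NaN. This page does not state which rounding mode the VSX conversion uses, so that is an assumption, and `fp32_to_bf16` and `bf16_to_fp32` are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>

// FP32 -> BF16: keep the upper 16 bits of the IEEE-754 single-precision
// pattern, rounding to nearest with ties to even (assumed rounding mode).
inline std::uint16_t fp32_to_bf16(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  // NaN: exponent all ones and non-zero mantissa. Force a quiet-NaN bit so
  // truncation cannot turn the value into infinity.
  if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
    return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
  }
  const std::uint32_t lsb = (bits >> 16) & 1u;  // lowest bit that is kept
  bits += 0x7FFFu + lsb;                        // round half to even
  return static_cast<std::uint16_t>(bits >> 16);
}

// BF16 -> FP32: place the 16 bits in the upper half and zero-fill the rest.
inline float bf16_to_fp32(std::uint16_t h) {
  const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
  float x;
  std::memcpy(&x, &bits, sizeof(x));
  return x;
}
```

Comparing each saved BF16 element against `fp32_to_bf16` of the corresponding FP32 input gives an exact check if the assumed rounding mode matches, and a one-unit-in-the-last-place tolerance otherwise.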