Implementation: vLLM CPU Types VXE (s390x)
| Knowledge Sources | |
|---|---|
| Domains | CPU_Inference, SIMD, s390x |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Provides IBM z/Architecture VXE (Vector Extension) vector types for SIMD-accelerated inference on IBM Z mainframe CPUs.
Description
This header implements vectorized data types (`BF16Vec8`, `BF16Vec16`, `BF16Vec32`, `FP32Vec4`, `FP32Vec8`, `FP32Vec16`) using the s390x Vector Extension facility via `vecintrin.h` intrinsics. It defines convenience macros (`vec_neg`, `vec_add`, `vec_sub`, `vec_mul`, `vec_div`, `vec_sr`, `vec_sl`) for vector arithmetic and composes `__vector float` and `__vector signed short` hardware registers into wider logical vectors. The implementation supports BFloat16 and Float32 types with load/store, element-wise arithmetic, type conversions, and reduction operations targeting the z/Architecture instruction set.
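The BFloat16/Float32 conversions this header performs with VXE intrinsics follow the standard BF16 layout: a BF16 value is the upper 16 bits of an IEEE-754 binary32 value. A portable scalar sketch of that relationship (hypothetical helper names, not the header's actual API; the truncating variant shown here drops the low mantissa bits without rounding):

```cpp
#include <cstdint>
#include <cstring>

// Widen a BF16 bit pattern into a float by placing it in the high half
// of a binary32 word. Portable model of what the VXE conversion does.
static float bf16_to_fp32(uint16_t b) {
  uint32_t bits = static_cast<uint32_t>(b) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Narrow a float to BF16 by keeping only the top 16 bits (truncation,
// no round-to-nearest-even). Sketch only; the real kernel may round.
static uint16_t fp32_to_bf16_trunc(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return static_cast<uint16_t>(bits >> 16);
}
```

Because BF16 keeps the full 8-bit exponent of binary32, this round trip is exact for values whose mantissa fits in 7 bits (e.g. small powers of two).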
Usage
This header is conditionally included when compiling vLLM on IBM Z (s390x) platforms. It enables AI inference workloads on enterprise mainframe hardware by providing the SIMD vector primitives consumed by the CPU kernel implementations on that platform.
Code Reference
Source Location
- Repository: vllm
- File: csrc/cpu/cpu_types_vxe.hpp
- Lines: 1-954
Signature
```cpp
namespace vec_op {

#define vec_neg(a) (-(a))
#define vec_add(a, b) ((a) + (b))
#define vec_sub(a, b) ((a) - (b))
#define vec_mul(a, b) ((a) * (b))
#define vec_div(a, b) ((a) / (b))

struct BF16Vec8 : public Vec<BF16Vec8> {
  constexpr static int VEC_ELEM_NUM = 8;
  __vector signed short reg;
  explicit BF16Vec8(const void* ptr);
  explicit BF16Vec8(const FP32Vec8&);
  void save(void* ptr) const;
};

struct BF16Vec16 : public Vec<BF16Vec16> {
  constexpr static int VEC_ELEM_NUM = 16;
  ss16x8x2_t reg;
  explicit BF16Vec16(const void* ptr);
  explicit BF16Vec16(const FP32Vec16&);
  void save(void* ptr) const;
};

struct FP32Vec16 : public Vec<FP32Vec16> {
  constexpr static int VEC_ELEM_NUM = 16;
  f32x4x4_t reg;
  explicit FP32Vec16(const void* ptr);
  explicit FP32Vec16(float v);
  FP32Vec16 operator*(const FP32Vec16&) const;
  FP32Vec16 operator+(const FP32Vec16&) const;
  FP32Vec16 operator-(const FP32Vec16&) const;
};

}  // namespace vec_op
```
Import
```cpp
#include "cpu/cpu_types_vxe.hpp"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ptr | const void* | Yes | Pointer to source data for vector load via z/Architecture vector load intrinsics |
| v | float | No | Scalar value to broadcast via vec_splats across all vector lanes |
| elem_num | int | No | Number of elements for partial save operations |
Outputs
| Name | Type | Description |
|---|---|---|
| Vector struct | BF16Vec16, FP32Vec16, etc. | VXE register-backed vector containing SIMD-computed elements |
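The `elem_num` input above drives partial saves, used when a buffer's length is not a multiple of the vector width. A portable scalar sketch of that behavior (hypothetical function name; the real header does this with length-controlled z/Architecture vector stores):

```cpp
#include <cstddef>
#include <cstring>

// Store only the first `elem_num` lanes of a 16-lane logical vector,
// leaving the remainder of the destination untouched (ragged tails).
// Assumes 0 <= elem_num <= 16.
static void partial_save16(const float* lanes, float* dst, int elem_num) {
  std::memcpy(dst, lanes, sizeof(float) * static_cast<std::size_t>(elem_num));
}
```

Leaving the tail untouched matters when the destination row is shorter than the vector width: a full 16-lane store would write past the valid region.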
Usage Examples
```cpp
// Load 16 floats on s390x using VXE intrinsics
vec_op::FP32Vec16 vec(input_ptr);

// Perform a SIMD multiply on z/Architecture
vec_op::FP32Vec16 result = vec * scale_vec;

// Convert FP32 to BF16 and save
vec_op::BF16Vec16 bf16_result(result);
bf16_result.save(output_ptr);
```
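The Description also lists reduction operations. A portable scalar sketch of the horizontal sum a 16-lane FP32 vector ultimately computes (hypothetical name `reduce_sum16`, not the header's actual API):

```cpp
#include <array>

// Scalar model of a horizontal reduction over a 16-lane FP32 vector.
// On VXE hardware this would sum four 4-lane sub-registers and then the
// lanes within each register; arithmetically it is a sum of 16 floats.
static float reduce_sum16(const std::array<float, 16>& lanes) {
  float acc = 0.0f;
  for (float v : lanes) acc += v;
  return acc;
}
```

Reductions like this typically terminate dot products and softmax denominators in the CPU attention kernels that consume these vector types.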