Implementation: vLLM CPU Types VXE (s390x)
| Knowledge Sources | |
|---|---|
| Domains | CPU_Inference, SIMD, s390x |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Provides IBM z/Architecture VXE (Vector Extension) vector types for SIMD-accelerated inference on IBM Z mainframe CPUs.
Description
This header implements vectorized data types (`BF16Vec8`, `BF16Vec16`, `BF16Vec32`, `FP32Vec4`, `FP32Vec8`, `FP32Vec16`) using the s390x Vector Extension facility via `vecintrin.h` intrinsics. It defines convenience macros (`vec_neg`, `vec_add`, `vec_sub`, `vec_mul`, `vec_div`, `vec_sr`, `vec_sl`) for vector arithmetic and composes `__vector float` and `__vector signed short` hardware registers into wider logical vectors. The implementation supports BFloat16 and Float32 types with load/store, element-wise arithmetic, type conversions, and reduction operations targeting the z/Architecture instruction set.
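The BFloat16/Float32 conversions this header performs with VXE intrinsics follow the standard BF16 layout: a BF16 value is the upper 16 bits of an IEEE-754 binary32 value. A portable scalar sketch of that relationship (hypothetical helper names, not the header's actual API; the truncating variant shown here drops the low mantissa bits without rounding):

```cpp
#include <cstdint>
#include <cstring>

// Widen a BF16 bit pattern into a float by placing it in the high half
// of a binary32 word. Portable model of what the VXE conversion does.
static float bf16_to_fp32(uint16_t b) {
  uint32_t bits = static_cast<uint32_t>(b) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Narrow a float to BF16 by keeping only the top 16 bits (truncation,
// no round-to-nearest-even). Sketch only; the real kernel may round.
static uint16_t fp32_to_bf16_trunc(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return static_cast<uint16_t>(bits >> 16);
}
```

Because BF16 keeps the full 8-bit exponent of binary32, this round trip is exact for values whose mantissa fits in 7 bits (e.g. small powers of two).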
Usage
This header is conditionally included when compiling vLLM on IBM Z (s390x) platforms. It enables AI inference workloads on enterprise mainframe hardware by providing the SIMD vector primitives consumed by the CPU kernel implementations on that platform.
Code Reference
Source Location
- Repository: vllm
- File: csrc/cpu/cpu_types_vxe.hpp
- Lines: 1-954
Signature
```cpp
namespace vec_op {

#define vec_neg(a) (-(a))
#define vec_add(a, b) ((a) + (b))
#define vec_sub(a, b) ((a) - (b))
#define vec_mul(a, b) ((a) * (b))
#define vec_div(a, b) ((a) / (b))

struct BF16Vec8 : public Vec<BF16Vec8> {
  constexpr static int VEC_ELEM_NUM = 8;
  __vector signed short reg;
  explicit BF16Vec8(const void* ptr);
  explicit BF16Vec8(const FP32Vec8&);
  void save(void* ptr) const;
};

struct BF16Vec16 : public Vec<BF16Vec16> {
  constexpr static int VEC_ELEM_NUM = 16;
  ss16x8x2_t reg;
  explicit BF16Vec16(const void* ptr);
  explicit BF16Vec16(const FP32Vec16&);
  void save(void* ptr) const;
};

struct FP32Vec16 : public Vec<FP32Vec16> {
  constexpr static int VEC_ELEM_NUM = 16;
  f32x4x4_t reg;
  explicit FP32Vec16(const void* ptr);
  explicit FP32Vec16(float v);
  FP32Vec16 operator*(const FP32Vec16&) const;
  FP32Vec16 operator+(const FP32Vec16&) const;
  FP32Vec16 operator-(const FP32Vec16&) const;
};

}  // namespace vec_op
```
Import
```cpp
#include "cpu/cpu_types_vxe.hpp"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ptr | const void* | Yes | Pointer to source data for vector load via z/Architecture vector load intrinsics |
| v | float | No | Scalar value to broadcast via vec_splats across all vector lanes |
| elem_num | int | No | Number of elements for partial save operations |
Outputs
| Name | Type | Description |
|---|---|---|
| Vector struct | BF16Vec16, FP32Vec16, etc. | VXE register-backed vector containing SIMD-computed elements |
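The `elem_num` input above drives partial saves, used when a buffer's length is not a multiple of the vector width. A portable scalar sketch of that behavior (hypothetical function name; the real header does this with length-controlled z/Architecture vector stores):

```cpp
#include <cstddef>
#include <cstring>

// Store only the first `elem_num` lanes of a 16-lane logical vector,
// leaving the remainder of the destination untouched (ragged tails).
// Assumes 0 <= elem_num <= 16.
static void partial_save16(const float* lanes, float* dst, int elem_num) {
  std::memcpy(dst, lanes, sizeof(float) * static_cast<std::size_t>(elem_num));
}
```

Leaving the tail untouched matters when the destination row is shorter than the vector width: a full 16-lane store would write past the valid region.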
Usage Examples
```cpp
// Load 16 floats on s390x using VXE intrinsics
vec_op::FP32Vec16 vec(input_ptr);

// Perform a SIMD multiply on z/Architecture
vec_op::FP32Vec16 result = vec * scale_vec;

// Convert FP32 to BF16 and save
vec_op::BF16Vec16 bf16_result(result);
bf16_result.save(output_ptr);
```
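The Description also lists reduction operations. A portable scalar sketch of the horizontal sum a 16-lane FP32 vector ultimately computes (hypothetical name `reduce_sum16`, not the header's actual API):

```cpp
#include <array>

// Scalar model of a horizontal reduction over a 16-lane FP32 vector.
// On VXE hardware this would sum four 4-lane sub-registers and then the
// lanes within each register; arithmetically it is a sum of 16 floats.
static float reduce_sum16(const std::array<float, 16>& lanes) {
  float acc = 0.0f;
  for (float v : lanes) acc += v;
  return acc;
}
```

Reductions like this typically terminate dot products and softmax denominators in the CPU attention kernels that consume these vector types.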