
Implementation: vLLM CPU Types VSX

From Leeroopedia


Knowledge Sources
Domains: CPU_Inference, SIMD, POWER
Last Updated: 2026-02-08 00:00 GMT

Overview

Implements IBM POWER VSX (Vector-Scalar Extension) vector types for SIMD-accelerated CPU inference on POWER9 and later processors.

Description

This header provides vectorized data types (BF16Vec8, BF16Vec16, BF16Vec32, FP32Vec4, FP32Vec8, FP32Vec16, INT8Vec16, INT32Vec16) using Altivec/VSX intrinsics from altivec.h. Float and BFloat16 vectors are built on __vector float and __vector signed short registers, with 128-bit operations composed into wider 256-bit and 512-bit logical vectors via struct arrays. The implementation supports BFloat16 and Float32 data with full arithmetic, load/store, type-conversion, and reduction operations tailored to the POWER ISA.
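The struct-array composition can be sketched portably: below, std::array stands in for the 128-bit __vector float register, and the hypothetical helper load16 models the FP32Vec16 pointer constructor (the real header loads each sub-register with vec_xl; only the f32x4x4_t name comes from the signature below, the rest is illustration).

```cpp
#include <array>
#include <cstring>

// Portable stand-in for a 128-bit VSX register (__vector float = 4 floats).
using f32x4_t = std::array<float, 4>;

// Four 128-bit registers composed into one 512-bit logical vector,
// mirroring the f32x4x4_t member of FP32Vec16.
struct f32x4x4_t {
  f32x4_t val[4];
};

// Sketch of FP32Vec16(const void*): 16 contiguous floats split across
// the four sub-registers (the real header uses the vec_xl intrinsic).
f32x4x4_t load16(const void* ptr) {
  f32x4x4_t out;
  std::memcpy(out.val, ptr, 16 * sizeof(float));
  return out;
}
```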

Usage

This header is conditionally included when compiling vLLM on IBM POWER (ppc64le) platforms. It enables vectorized CPU kernel execution on enterprise POWER servers, providing the SIMD primitives used by attention, activation, and GEMM routines.
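The conditional inclusion typically sits behind a compile-time ISA guard; a sketch of such a dispatch (the exact macro vLLM keys on is an assumption here; __POWER9_VECTOR__ is what GCC predefines under -mcpu=power9):

```cpp
// Sketch of include-time ISA dispatch (macro choice is an assumption).
#if defined(__POWER9_VECTOR__)
  #include "cpu/cpu_types_vsx.hpp"  // POWER: Altivec/VSX vector types
#elif defined(__x86_64__)
  // x86 builds would include the AVX/SSE counterpart instead.
#endif
```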

Code Reference

Source Location

Signature

namespace vec_op {

struct BF16Vec8 : public Vec<BF16Vec8> {
    constexpr static int VEC_ELEM_NUM = 8;
    __vector signed short reg;
    explicit BF16Vec8(const void* ptr);
    explicit BF16Vec8(const FP32Vec8&);
    void save(void* ptr) const;
};

struct BF16Vec16 : public Vec<BF16Vec16> {
    constexpr static int VEC_ELEM_NUM = 16;
    ss16x8x2_t reg;
    explicit BF16Vec16(const void* ptr);
    explicit BF16Vec16(const FP32Vec16&);
    void save(void* ptr) const;
    void save(void* ptr, const int elem_num) const;
};

struct FP32Vec16 : public Vec<FP32Vec16> {
    constexpr static int VEC_ELEM_NUM = 16;
    f32x4x4_t reg;
    explicit FP32Vec16(const void* ptr);
    explicit FP32Vec16(float v);
    FP32Vec16 operator*(const FP32Vec16&) const;
    FP32Vec16 operator+(const FP32Vec16&) const;
    FP32Vec16 operator-(const FP32Vec16&) const;
    float reduce_sum() const;
};

} // namespace vec_op
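The reduction at the end of the signature can be modeled portably; the scalar sketch below sums all 16 lanes across the four sub-registers (std::array stands in for __vector float, and this free reduce_sum function is an illustration of the member's semantics, not the actual VSX reduction sequence).

```cpp
#include <array>
#include <numeric>

using f32x4_t = std::array<float, 4>;  // models a 128-bit __vector float
struct f32x4x4_t { f32x4_t val[4]; };  // models FP32Vec16's register group

// Scalar sketch of FP32Vec16::reduce_sum(): horizontal sum of all 16 lanes.
float reduce_sum(const f32x4x4_t& reg) {
  float total = 0.0f;
  for (const f32x4_t& quad : reg.val)
    total = std::accumulate(quad.begin(), quad.end(), total);
  return total;
}
```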

Import

#include "cpu/cpu_types_vsx.hpp"

I/O Contract

Inputs

Name      Type         Required  Description
ptr       const void*  Yes       Pointer to source data for vector loads via the vec_xl intrinsic
v         float        No        Scalar value broadcast into all vector lanes via vec_splats
elem_num  int          No        Element count for partial save operations via vec_xst_len
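The elem_num parameter drives a length-limited store; a portable sketch of the partial save (save_partial is a hypothetical stand-in for BF16Vec16/FP32Vec16 save(ptr, elem_num); the real path uses vec_xst_len, which takes a byte length, and memcpy models it here):

```cpp
#include <cstring>

// Sketch of save(ptr, elem_num): store only the first elem_num floats
// of a 16-lane vector (the real header uses the vec_xst_len intrinsic).
void save_partial(const float lanes[16], void* ptr, int elem_num) {
  std::memcpy(ptr, lanes, static_cast<std::size_t>(elem_num) * sizeof(float));
}
```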

Outputs

Name           Type                        Description
Vector struct  BF16Vec16, FP32Vec16, etc.  VSX register-backed vector containing SIMD-computed elements

Usage Examples

// Load 16 floats using VSX intrinsics
vec_op::FP32Vec16 vec(input_ptr);

// Perform SIMD multiply on POWER
vec_op::FP32Vec16 result = vec * scale_vec;

// Convert FP32 to BF16 and save
vec_op::BF16Vec16 bf16_result(result);
bf16_result.save(output_ptr);

// Reduce to scalar sum
float total = result.reduce_sum();
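The FP32-to-BF16 conversion in the example above can be modeled one lane at a time; the sketch below keeps the high 16 bits of the IEEE-754 binary32 encoding with round-to-nearest-even (fp32_to_bf16 is a hypothetical scalar model; the rounding behavior of the actual VSX conversion path is an assumption).

```cpp
#include <cstdint>
#include <cstring>

// Scalar model of one lane of BF16Vec16(const FP32Vec16&):
// truncate a binary32 value to its top 16 bits, rounding to
// nearest even (the rounding mode is an assumption).
uint16_t fp32_to_bf16(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  uint32_t lsb = (bits >> 16) & 1u;  // tie-breaking toward even
  bits += 0x7FFFu + lsb;
  return static_cast<uint16_t>(bits >> 16);
}
```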
