Implementation:InternLM Lmdeploy Core Array
| Knowledge Sources | |
|---|---|
| Domains | GPU_Kernels, Data_Structures |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Fixed-size array container for CUDA host and device code, providing an STL-like interface with specializations for the sub-byte types uint4_t and fp4_e2m1_t.
Description
The Array<T, N> template struct is a lightweight, fixed-size array designed for use in both host and device code within the TurboMind kernel library. It mirrors the interface of std::array: iterators, element access (operator[], front(), back()), and data pointer access. Two partial specializations handle sub-byte types, Array<uint4_t, N> and Array<fp4_e2m1_t, N>. Each packs 8 elements per underlying storage unit and uses SubBytePtr instead of a raw pointer for pointer access. Static assertions verify that the sub-byte specializations are sized correctly.
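The packing described above can be illustrated with a minimal host-only sketch. It assumes that eight 4-bit values share one 32-bit word and that element 0 sits in the lowest nibble. Both the layout and the names `PackedU4`, `get`, and `set` are hypothetical and are not taken from array.h, where access goes through SubBytePtr instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a packed 4-bit array; NOT the TurboMind type.
template<int N>
struct PackedU4 {
    static_assert(N > 0 && N % 8 == 0, "N must be a positive multiple of 8");

    std::uint32_t words[N / 8] = {};  // eight 4-bit elements per 32-bit word

    // Read element i (assumed layout: element 0 in the lowest nibble).
    constexpr unsigned get(int i) const noexcept
    {
        return (words[i / 8] >> (4 * (i % 8))) & 0xFu;
    }

    // Overwrite element i with the low 4 bits of v.
    constexpr void set(int i, unsigned v) noexcept
    {
        const int           shift = 4 * (i % 8);
        const std::uint32_t mask  = std::uint32_t{0xF} << shift;
        words[i / 8] = (words[i / 8] & ~mask) | ((std::uint32_t{v} & 0xFu) << shift);
    }
};

// 16 elements of 4 bits each occupy 8 bytes.
static_assert(sizeof(PackedU4<16>) == 8, "unexpected packed size");

// Round-trip check: write a nibble and read it back.
inline void round_trip_example()
{
    PackedU4<8> a;
    a.set(3, 0xB);
    assert(a.get(3) == 0xB);
}
```

The `static_assert` on `N % 8` plays the same role as the sizing checks mentioned above: an element count that does not fill whole storage words is rejected at compile time.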
Usage
Use Array<T, N> as the fundamental register-resident data container in CUDA kernels. It serves as the building block for vectorized loads, stores, MMA fragment storage, and element-wise arithmetic throughout the TurboMind core kernel library.
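As an illustration of the element-wise arithmetic use case, here is a small host-only sketch. `Frag` is a hypothetical stand-in with the same storage shape as Array<T, N>, and `fma_inplace` is an illustrative helper, not a TurboMind function. Real kernel code would use Array itself and mark the functions as host/device.

```cpp
#include <cassert>

// Hypothetical stand-in with the same storage shape as Array<T, N>.
template<typename T, int N>
struct Frag {
    T a[N];

    constexpr T&       operator[](int i) noexcept { return a[i]; }
    constexpr const T& operator[](int i) const noexcept { return a[i]; }
};

// Element-wise multiply-add: acc[i] += x[i] * y[i].
// The trip count N is a compile-time constant, so a compiler can fully
// unroll the loop and keep every element in a register.
template<typename T, int N>
constexpr void fma_inplace(Frag<T, N>& acc, const Frag<T, N>& x, const Frag<T, N>& y) noexcept
{
    for (int i = 0; i < N; ++i) {
        acc[i] += x[i] * y[i];
    }
}
```

Keeping N in the type is the design point here: every index can be resolved at compile time once the loop is unrolled.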
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/kernels/core/array.h
Signature
template<typename T, int N>
struct Array {
    T __a[N];
    TM_HOST_DEVICE constexpr reference operator[](size_type i) noexcept;
    TM_HOST_DEVICE constexpr pointer data() noexcept;
    TM_HOST_DEVICE constexpr iterator begin() noexcept;
    TM_HOST_DEVICE constexpr iterator end() noexcept;
    TM_HOST_DEVICE static constexpr std::integral_constant<int, N> size() noexcept;
};
// Sub-byte specializations
template<int N> struct Array<uint4_t, N>;
template<int N> struct Array<fp4_e2m1_t, N>;
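To show how an interface of this shape behaves, the following host-only sketch reimplements a reduced version of it. `MiniArray` and `sum` are hypothetical names, the member typedefs are assumptions (the signature above uses `reference`, `pointer`, `iterator`, and `size_type` without showing their definitions), and the TM_HOST_DEVICE macro is omitted so the code compiles with a plain C++ compiler.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical reduced copy of the interface; NOT the TurboMind definition.
template<typename T, int N>
struct MiniArray {
    using value_type = T;
    using size_type  = int;  // assumed; not shown in the signature
    using reference  = T&;
    using pointer    = T*;
    using iterator   = T*;

    T a[N];

    constexpr reference operator[](size_type i) noexcept { return a[i]; }
    constexpr pointer   data() noexcept { return a; }
    constexpr iterator  begin() noexcept { return a; }
    constexpr iterator  end() noexcept { return a + N; }

    // Returning integral_constant keeps N available as a type-level constant.
    static constexpr std::integral_constant<int, N> size() noexcept { return {}; }
};

// begin()/end() are enough for a range-based for loop.
template<typename T, int N>
constexpr T sum(MiniArray<T, N>& arr) noexcept
{
    T total{};
    for (T& v : arr) {
        total += v;
    }
    return total;
}

// The element count can be read from the return type without an object.
static_assert(decltype(MiniArray<float, 4>::size())::value == 4, "size mismatch");
```

Because `size()` returns `std::integral_constant<int, N>` rather than a plain `int`, its result converts implicitly to `int` and can also be used where a constant expression is required, such as a template argument or `static_assert`.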
Import
#include "src/turbomind/kernels/core/array.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| T | typename | Yes | Element type (e.g., float, half, uint4_t, fp4_e2m1_t) |
| N | int | Yes | Number of elements (must be > 0; must be a multiple of 8 for sub-byte types) |
Outputs
| Name | Type | Description |
|---|---|---|
| Array<T,N> | struct | Fixed-size array with N elements of type T; in device code it is held in registers when all indices are resolved at compile time |
Usage Examples
// Declare and access a register array of 4 floats
Array<float, 4> frag;
frag[0] = 1.0f;
frag[1] = 2.0f;
// Sub-byte 4-bit quantized weights (16 x 4-bit packed into 8 bytes)
Array<uint4_t, 16> quant_weights;
auto ptr = quant_weights.data(); // returns SubBytePtr<uint4_t>