Implementation:Rapidsai Cuml SimpleDenseMat
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Linear_Algebra |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
A lightweight GPU dense matrix and vector library used internally by the cuML Quasi-Newton (QN) solver for generalized linear models, providing GEMM, element-wise operations, and norm computations.
Description
dense.hpp defines a family of GPU matrix and vector types used by the Quasi-Newton optimization solver in cuML. These types wrap raw device pointers with dimension and storage-order metadata, providing a clean API for linear algebra operations without the overhead of full matrix library abstractions.
The header defines the following types and utilities:
SimpleDenseMat<T> -- A non-owning dense matrix view supporting both column-major and row-major storage orders. Key methods include:
- gemm -- Static method implementing general matrix multiplication via cuBLAS, with automatic handling of mixed storage orders by transposing as needed.
- gemmb / assign_gemm -- Instance methods for GEMM with this as one of the operands.
- ax -- Scalar-matrix multiply (this = a*x).
- axpy -- Scaled addition (this = a*x + y).
- assign_unary / assign_binary / assign_ternary -- Element-wise operations with custom lambdas.
- fill -- Fill matrix with a constant value.
- copy_async -- Asynchronous device-to-device copy.
SimpleVec<T> -- A vector type extending SimpleDenseMat<T>, stored as a single-column matrix (n = 1), providing assign_gemv for matrix-vector multiplication.
SimpleVecOwning<T> / SimpleMatOwning<T> -- Owning variants that manage their own device memory via rmm::device_uvector.
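The split between non-owning views and owning variants can be illustrated on the host. The following is a minimal sketch, not the library's code: HostMatView and HostVecOwning are hypothetical names, std::vector stands in for rmm::device_uvector, and plain loops stand in for the GPU kernels.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side analogue of a non-owning dense view:
// it stores a pointer plus dimensions and never frees the memory.
template <typename T>
struct HostMatView {
  T* data;
  int m, n, len;

  HostMatView(T* data_, int m_, int n_) : data(data_), m(m_), n(n_), len(m_ * n_) {}

  // this = val for every element
  void fill(T val)
  {
    for (int i = 0; i < len; ++i) data[i] = val;
  }

  // this = a * x + y, element-wise
  void axpy(T a, const HostMatView<T>& x, const HostMatView<T>& y)
  {
    assert(len == x.len && len == y.len);
    for (int i = 0; i < len; ++i) data[i] = a * x.data[i] + y.data[i];
  }

  // this = f(x), element-wise, with a caller-supplied lambda
  template <typename F>
  void assign_unary(const HostMatView<T>& x, F f)
  {
    assert(len == x.len);
    for (int i = 0; i < len; ++i) data[i] = f(x.data[i]);
  }
};

// Hypothetical owning variant: the buffer lives inside the object and the
// inherited view points at it. Copying is disabled so the view cannot
// outlive or alias a different buffer.
template <typename T>
struct HostVecOwning : HostMatView<T> {
  std::vector<T> buf;

  explicit HostVecOwning(int n) : HostMatView<T>(nullptr, n, 1), buf(n)
  {
    this->data = buf.data();
  }
  HostVecOwning(const HostVecOwning&)            = delete;
  HostVecOwning& operator=(const HostVecOwning&) = delete;
};
```

Because the owning type derives from the view, it can be passed anywhere a view is expected, which is the property the QN solver relies on when mixing both kinds of objects.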
Free functions:
- col_ref -- Create a vector view referencing a single column of a column-major matrix.
- col_slice -- Create a matrix view referencing a contiguous range of columns.
- dot, squaredNorm, nrm1, nrm2, nrmMax -- Vector reduction operations using raft primitives.
- operator<< -- Stream output operators for debugging.
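Column views are cheap in column-major storage because each column is contiguous: column j of an m x n matrix starts at offset j * m. The sketch below shows that pointer arithmetic on host memory. ColMajorView, column_view, and column_range_view are hypothetical names, and the half-open range [from, to) is an assumption of this sketch rather than a statement about col_slice.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view of a column-major matrix on the host.
struct ColMajorView {
  float* data;
  int m;  // rows
  int n;  // columns
};

// View of a single column j as an m x 1 matrix. No data is copied.
inline ColMajorView column_view(const ColMajorView& A, int j)
{
  assert(j >= 0 && j < A.n);
  return {A.data + static_cast<std::ptrdiff_t>(j) * A.m, A.m, 1};
}

// View of columns [from, to) as an m x (to - from) matrix. No data is copied.
inline ColMajorView column_range_view(const ColMajorView& A, int from, int to)
{
  assert(from >= 0 && from <= to && to <= A.n);
  return {A.data + static_cast<std::ptrdiff_t>(from) * A.m, A.m, to - from};
}
```

The same trick does not work for row-major storage, where a column is strided rather than contiguous, which is why the text above restricts col_ref to column-major matrices.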
The GEMM implementation handles mixed storage orders by recursively converting row-major matrices to equivalent transposed column-major representations.
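The storage-order reduction rests on one identity: the memory of a row-major m x n matrix is exactly the memory of a column-major n x m matrix holding its transpose. The following host-side sketch illustrates the recursive idea under that assumption. HostMat, HostOrder, and host_gemm are hypothetical names, and a triple loop stands in for the cuBLAS call; it is not the header's implementation.

```cpp
#include <cassert>

enum class HostOrder { ColMajor, RowMajor };

// Hypothetical non-owning host matrix view.
struct HostMat {
  float* data;
  int m, n;
  HostOrder ord;
};

// Element (i, j) of op(X) for a column-major X, where op is identity or transpose.
inline float at_cm(const HostMat& X, bool trans, int i, int j)
{
  return trans ? X.data[j + i * X.m] : X.data[i + j * X.m];
}

// C = alpha * op(A) * op(B) + beta * C for any mix of storage orders.
void host_gemm(float alpha, const HostMat& A, bool transA,
               const HostMat& B, bool transB, float beta, HostMat& C)
{
  if (A.ord == HostOrder::RowMajor) {
    // Reinterpret A as its column-major transpose and flip the flag.
    HostMat Acm{A.data, A.n, A.m, HostOrder::ColMajor};
    host_gemm(alpha, Acm, !transA, B, transB, beta, C);
    return;
  }
  if (B.ord == HostOrder::RowMajor) {
    HostMat Bcm{B.data, B.n, B.m, HostOrder::ColMajor};
    host_gemm(alpha, A, transA, Bcm, !transB, beta, C);
    return;
  }
  if (C.ord == HostOrder::RowMajor) {
    // Compute C^T = alpha * op(B)^T * op(A)^T + beta * C^T instead.
    HostMat Ccm{C.data, C.n, C.m, HostOrder::ColMajor};
    host_gemm(alpha, B, !transB, A, !transA, beta, Ccm);
    return;
  }

  // Base case: every operand is column-major.
  const int k = transA ? A.m : A.n;
  assert(k == (transB ? B.n : B.m));
  assert(C.m == (transA ? A.n : A.m));
  assert(C.n == (transB ? B.m : B.n));
  for (int j = 0; j < C.n; ++j) {
    for (int i = 0; i < C.m; ++i) {
      float acc = 0.0f;
      for (int p = 0; p < k; ++p) acc += at_cm(A, transA, i, p) * at_cm(B, transB, p, j);
      C.data[i + j * C.m] = alpha * acc + beta * C.data[i + j * C.m];
    }
  }
}
```

Each recursive step removes one row-major operand without moving any data, so at most three levels of recursion are needed before the all-column-major base case is reached.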
Usage
These types are used internally by the QN solver (cpp/src/glm/qn/) for gradient computations, Hessian-vector products, and line search operations in logistic regression, linear regression, and other GLM models.
Code Reference
Source Location
- Repository: Rapidsai_Cuml
- File: cpp/src/glm/qn/simple_mat/dense.hpp
Signature
namespace ML {
enum STORAGE_ORDER { COL_MAJOR = 0, ROW_MAJOR = 1 };
template <typename T>
struct SimpleDenseMat : SimpleMat<T> {
int len;
T* data;
STORAGE_ORDER ord;
SimpleDenseMat(T* data, int m, int n, STORAGE_ORDER order = COL_MAJOR);
void reset(T* data_, int m_, int n_);
static void gemm(const raft::handle_t& handle,
const T alpha, const SimpleDenseMat<T>& A, const bool transA,
const SimpleDenseMat<T>& B, const bool transB,
const T beta, SimpleDenseMat<T>& C, cudaStream_t stream);
void ax(const T a, const SimpleDenseMat<T>& x, cudaStream_t stream);
void axpy(const T a, const SimpleDenseMat<T>& x, const SimpleDenseMat<T>& y, cudaStream_t stream);
void fill(const T val, cudaStream_t stream);
void copy_async(const SimpleDenseMat<T>& other, cudaStream_t stream);
};
template <typename T>
struct SimpleVec : SimpleDenseMat<T> {
SimpleVec(T* data, const int n);
void assign_gemv(const raft::handle_t& handle, const T alpha,
const SimpleDenseMat<T>& A, bool transA,
const SimpleVec<T>& x, const T beta, cudaStream_t stream);
};
template <typename T>
T dot(const SimpleVec<T>& u, const SimpleVec<T>& v, T* tmp_dev, cudaStream_t stream);
template <typename T>
T nrm2(const SimpleVec<T>& u, T* tmp_dev, cudaStream_t stream);
} // namespace ML
Import
#include "dense.hpp"
// or from another directory:
#include <glm/qn/simple_mat/dense.hpp>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | T* | Yes | Device pointer to the matrix data |
| m | int | Yes | Number of rows |
| n | int | Yes | Number of columns |
| order | STORAGE_ORDER | No | Storage order: COL_MAJOR (default) or ROW_MAJOR |
| handle | raft::handle_t | Yes (for GEMM) | RAFT handle providing cuBLAS context |
| stream | cudaStream_t | Yes | CUDA stream for asynchronous operations |
Outputs
| Name | Type | Description |
|---|---|---|
| Result matrix/vector | SimpleDenseMat<T> or SimpleVec<T> | Modified in-place with operation results |
| Scalar reductions | T | Dot products, norms returned as host scalars |
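For reference, the scalar reductions correspond to the standard vector quantities. The host-side sketch below spells out those definitions, assuming nrm1 is the sum of absolute values, nrm2 the Euclidean norm, and nrmMax the largest absolute value. The host_ prefixed functions are hypothetical and operate on std::vector; the real functions take device data plus a device scratch pointer and a stream.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Sum of u[i] * v[i].
inline float host_dot(const std::vector<float>& u, const std::vector<float>& v)
{
  assert(u.size() == v.size());
  return std::inner_product(u.begin(), u.end(), v.begin(), 0.0f);
}

// Squared Euclidean norm: dot(u, u).
inline float host_squared_norm(const std::vector<float>& u) { return host_dot(u, u); }

// L1 norm: sum of absolute values.
inline float host_nrm1(const std::vector<float>& u)
{
  return std::accumulate(u.begin(), u.end(), 0.0f,
                         [](float acc, float x) { return acc + std::abs(x); });
}

// L2 norm: square root of the squared norm.
inline float host_nrm2(const std::vector<float>& u) { return std::sqrt(host_squared_norm(u)); }

// Max norm: largest absolute value (0 for an empty vector).
inline float host_nrm_max(const std::vector<float>& u)
{
  return std::accumulate(u.begin(), u.end(), 0.0f,
                         [](float acc, float x) { return std::max(acc, std::abs(x)); });
}
```

Returning the result as a host scalar implies the device result has to be copied back before the function returns, so these calls are natural synchronization points in the solver loop.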
Usage Examples
// Create matrix views over existing device memory
SimpleDenseMat<float> A(d_A, m, k, COL_MAJOR);
SimpleDenseMat<float> B(d_B, k, n, COL_MAJOR);
SimpleDenseMat<float> C(d_C, m, n, COL_MAJOR);
// C = 1.0 * A * B + 0.0 * C
SimpleDenseMat<float>::gemm(handle, 1.0f, A, false, B, false, 0.0f, C, stream);
// Vector operations
SimpleVec<float> u(d_u, n);
SimpleVec<float> v(d_v, n);
float result = dot(u, v, d_tmp, stream);
float norm = nrm2(u, d_tmp, stream);
// Owning vector with automatic memory management
SimpleVecOwning<float> owned_vec(1024, stream);
owned_vec.fill(0.0f, stream);