Implementation:Rapidsai Cuml SimpleDenseMat

Knowledge Sources	Rapidsai_Cuml
Domains	Machine_Learning, Linear_Algebra
Last Updated	2026-02-08 12:00 GMT

Overview

A lightweight GPU dense matrix and vector library used internally by the cuML Quasi-Newton (QN) solver for generalized linear models, providing GEMM, element-wise operations, and norm computations.

Description

dense.hpp defines a family of GPU matrix and vector types used by the Quasi-Newton optimization solver in cuML. These types wrap raw device pointers with dimension and storage-order metadata, exposing the linear algebra operations the solver needs without pulling in a general-purpose matrix library.

The header defines the following types and utilities:

SimpleDenseMat<T> -- A non-owning dense matrix view supporting both column-major and row-major storage orders. Key methods include:

gemm -- Static method implementing general matrix multiplication via cuBLAS, with automatic handling of mixed storage orders by transposing as needed.
gemmb / assign_gemm -- Instance methods for GEMM with this as one of the operands.
ax -- Scalar-matrix multiply (this = a*x).
axpy -- Scaled addition (this = a*x + y).
assign_unary / assign_binary / assign_ternary -- Element-wise operations with custom lambdas.
fill -- Fill matrix with a constant value.
copy_async -- Asynchronous device-to-device copy.
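The element-wise methods listed above can be pictured with a host-only sketch. HostMat below is a hypothetical stand-in written for this page, not the cuML type: it loops on the CPU where the real SimpleDenseMat launches GPU work on a CUDA stream, and building ax and axpy on top of the lambda-based helpers is an assumption made for illustration.

```cpp
#include <cassert>

// Hypothetical host-side stand-in for a non-owning dense matrix view.
// The real type wraps a device pointer and takes a cudaStream_t per call.
template <typename T>
struct HostMat {
  T* data;
  int m, n, len;
  HostMat(T* data_, int m_, int n_) : data(data_), m(m_), n(n_), len(m_ * n_) {}

  // this[i] = f(x[i]) for every element
  template <typename F>
  void assign_unary(const HostMat<T>& x, F f) {
    assert(len == x.len);
    for (int i = 0; i < len; ++i) data[i] = f(x.data[i]);
  }

  // this[i] = f(x[i], y[i]) for every element
  template <typename F>
  void assign_binary(const HostMat<T>& x, const HostMat<T>& y, F f) {
    assert(len == x.len && len == y.len);
    for (int i = 0; i < len; ++i) data[i] = f(x.data[i], y.data[i]);
  }

  // this = a * x
  void ax(T a, const HostMat<T>& x) {
    assign_unary(x, [a](T xi) { return a * xi; });
  }

  // this = a * x + y
  void axpy(T a, const HostMat<T>& x, const HostMat<T>& y) {
    assign_binary(x, y, [a](T xi, T yi) { return a * xi + yi; });
  }

  // this[i] = val for every element
  void fill(T val) {
    for (int i = 0; i < len; ++i) data[i] = val;
  }
};
```

Because each output element depends only on the matching input elements, the output view may alias one of the inputs in this sketch, for example r.axpy(a, x, r).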

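SimpleVec<T> -- A vector type extending SimpleDenseMat<T> as a single-column matrix (n = 1), providing assign_gemv for matrix-vector multiplication. The constructor's n argument is the vector length, which becomes the row count of the underlying matrix.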

SimpleVecOwning<T> / SimpleMatOwning<T> -- Owning variants that manage their own device memory via rmm::device_uvector.

Free functions:

col_ref -- Create a vector view referencing a single column of a column-major matrix.
col_slice -- Create a matrix view referencing a contiguous range of columns.
dot, squaredNorm, nrm1, nrm2, nrmMax -- Vector reduction operations using raft primitives.
operator<< -- Stream output operators for debugging.
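For col_ref and col_slice, the underlying idea is that in column-major storage each column is a contiguous run of m elements, so a column or a range of columns can be viewed without copying. The sketch below uses a hypothetical ColMajorView struct and *_sketch function names; the return-by-value style and the half-open range [c_from, c_to) are assumptions for illustration and may differ from the real signatures.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view of a column-major matrix.
template <typename T>
struct ColMajorView {
  T* data;
  int m;  // rows
  int n;  // columns
};

// View of column c: an m x 1 matrix starting c * m elements into the buffer.
template <typename T>
ColMajorView<T> col_ref_sketch(const ColMajorView<T>& mat, int c) {
  assert(c >= 0 && c < mat.n);
  return ColMajorView<T>{mat.data + static_cast<std::ptrdiff_t>(c) * mat.m, mat.m, 1};
}

// View of columns [c_from, c_to): an m x (c_to - c_from) matrix.
// The half-open range convention is an assumption of this sketch.
template <typename T>
ColMajorView<T> col_slice_sketch(const ColMajorView<T>& mat, int c_from, int c_to) {
  assert(c_from >= 0 && c_from < c_to && c_to <= mat.n);
  return ColMajorView<T>{mat.data + static_cast<std::ptrdiff_t>(c_from) * mat.m,
                         mat.m,
                         c_to - c_from};
}
```

Both helpers only adjust the pointer and the dimensions, so writes through the returned view land in the parent matrix's buffer. The same trick does not work for a row-major matrix, where a column's elements are strided rather than contiguous.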

The GEMM implementation handles mixed storage orders by recursively converting row-major matrices to equivalent transposed column-major representations.
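The storage-order recursion rests on one identity: a row-major m x n buffer is byte-for-byte the same as a column-major n x m buffer holding the transpose. A host-only sketch of that idea follows. The Layout, Mat, and gemm_sketch names are hypothetical, and the naive triple loop merely stands in for the cuBLAS call in the all-column-major base case.

```cpp
#include <cassert>

enum class Layout { ColMajor, RowMajor };

// Hypothetical non-owning matrix view with a runtime storage order.
template <typename T>
struct Mat {
  T* data;
  int m, n;
  Layout ord;
};

// Element (i, j) of op(A) for a column-major A, where op is a transpose if trans is set.
template <typename T>
T at_cm(const Mat<T>& A, bool trans, int i, int j) {
  return trans ? A.data[j + i * A.m] : A.data[i + j * A.m];
}

// C = alpha * op(A) * op(B) + beta * C for any mix of storage orders.
template <typename T>
void gemm_sketch(T alpha, const Mat<T>& A, bool transA,
                 const Mat<T>& B, bool transB, T beta, Mat<T>& C) {
  if (A.ord == Layout::RowMajor) {
    // Reinterpret A as column-major A^T and flip its transpose flag.
    Mat<T> Acm{A.data, A.n, A.m, Layout::ColMajor};
    gemm_sketch(alpha, Acm, !transA, B, transB, beta, C);
    return;
  }
  if (B.ord == Layout::RowMajor) {
    Mat<T> Bcm{B.data, B.n, B.m, Layout::ColMajor};
    gemm_sketch(alpha, A, transA, Bcm, !transB, beta, C);
    return;
  }
  if (C.ord == Layout::RowMajor) {
    // Compute C^T = op(B)^T * op(A)^T into the column-major view of C's buffer.
    Mat<T> Ccm{C.data, C.n, C.m, Layout::ColMajor};
    gemm_sketch(alpha, B, !transB, A, !transA, beta, Ccm);
    return;
  }
  // Base case: everything is column-major.
  const int M = C.m, N = C.n;
  const int K = transA ? A.m : A.n;
  assert((transA ? A.n : A.m) == M);
  assert((transB ? B.m : B.n) == N);
  assert((transB ? B.n : B.m) == K);
  for (int j = 0; j < N; ++j) {
    for (int i = 0; i < M; ++i) {
      T acc = T(0);
      for (int k = 0; k < K; ++k) acc += at_cm(A, transA, i, k) * at_cm(B, transB, k, j);
      C.data[i + j * M] = alpha * acc + beta * C.data[i + j * M];
    }
  }
}
```

No data is moved at any step: each recursive call only swaps the dimensions and toggles a transpose flag, so at most three reinterpretations happen before the base case is reached.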

Usage

These types are used internally by the QN solver (cpp/src/glm/qn/) for gradient computations, Hessian-vector products, and line search operations in logistic regression, linear regression, and other GLMs.

Code Reference

Source Location

Repository: Rapidsai_Cuml
File: cpp/src/glm/qn/simple_mat/dense.hpp

Signature

namespace ML {

enum STORAGE_ORDER { COL_MAJOR = 0, ROW_MAJOR = 1 };

template <typename T>
struct SimpleDenseMat : SimpleMat<T> {
  int len;
  T* data;
  STORAGE_ORDER ord;

  SimpleDenseMat(T* data, int m, int n, STORAGE_ORDER order = COL_MAJOR);
  void reset(T* data_, int m_, int n_);

  static void gemm(const raft::handle_t& handle,
                    const T alpha, const SimpleDenseMat<T>& A, const bool transA,
                    const SimpleDenseMat<T>& B, const bool transB,
                    const T beta, SimpleDenseMat<T>& C, cudaStream_t stream);

  void ax(const T a, const SimpleDenseMat<T>& x, cudaStream_t stream);
  void axpy(const T a, const SimpleDenseMat<T>& x, const SimpleDenseMat<T>& y, cudaStream_t stream);
  void fill(const T val, cudaStream_t stream);
  void copy_async(const SimpleDenseMat<T>& other, cudaStream_t stream);
};

template <typename T>
struct SimpleVec : SimpleDenseMat<T> {
  SimpleVec(T* data, const int n);
  void assign_gemv(const raft::handle_t& handle, const T alpha,
                   const SimpleDenseMat<T>& A, bool transA,
                   const SimpleVec<T>& x, const T beta, cudaStream_t stream);
};

template <typename T>
T dot(const SimpleVec<T>& u, const SimpleVec<T>& v, T* tmp_dev, cudaStream_t stream);

template <typename T>
T nrm2(const SimpleVec<T>& u, T* tmp_dev, cudaStream_t stream);

} // namespace ML

Import

#include "dense.hpp"
// or from another directory:
#include <glm/qn/simple_mat/dense.hpp>

I/O Contract

Inputs

Name	Type	Required	Description
data	T*	Yes	Device pointer to the matrix data
m	int	Yes	Number of rows
n	int	Yes	Number of columns
order	STORAGE_ORDER	No	Storage order: COL_MAJOR (default) or ROW_MAJOR
handle	raft::handle_t	Yes (for GEMM and GEMV)	RAFT handle providing cuBLAS context
stream	cudaStream_t	Yes	CUDA stream for asynchronous operations

Outputs

Name	Type	Description
Result matrix/vector	SimpleDenseMat<T> or SimpleVec<T>	Modified in-place with operation results
Scalar reductions	T	Dot products, norms returned as host scalars
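To pin down what the scalar reductions compute, here is a host-only reference sketch. The *_ref names are hypothetical and operate on std::vector. The real functions take SimpleVec<T> views of device memory plus a device scratch pointer and a stream, and presumably reduce on the GPU before copying the result to the host.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// dot(u, v) = sum_i u[i] * v[i]
template <typename T>
T dot_ref(const std::vector<T>& u, const std::vector<T>& v) {
  assert(u.size() == v.size());
  T acc = T(0);
  for (std::size_t i = 0; i < u.size(); ++i) acc += u[i] * v[i];
  return acc;
}

// squaredNorm(u) = dot(u, u)
template <typename T>
T squared_norm_ref(const std::vector<T>& u) {
  return dot_ref(u, u);
}

// nrm2(u) = sqrt(sum_i u[i]^2), the Euclidean norm
template <typename T>
T nrm2_ref(const std::vector<T>& u) {
  return std::sqrt(squared_norm_ref(u));
}

// nrm1(u) = sum_i |u[i]|
template <typename T>
T nrm1_ref(const std::vector<T>& u) {
  T acc = T(0);
  for (const T& x : u) acc += std::abs(x);
  return acc;
}

// nrmMax(u) = max_i |u[i]|
template <typename T>
T nrm_max_ref(const std::vector<T>& u) {
  T best = T(0);
  for (const T& x : u) best = std::max(best, std::abs(x));
  return best;
}
```

A reference like this can be handy in unit tests: copy a device vector back to the host, evaluate the matching *_ref function, and compare against the GPU result within a floating-point tolerance.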

Usage Examples

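// Assumed context (not shown): `using namespace ML;`, a raft::handle_t `handle`,
// a cudaStream_t `stream`, int dimensions m, n, k, device buffers d_A, d_B, d_C,
// d_u, d_v, and a device scratch buffer d_tmp holding at least one float.

// Create matrix views over existing device memory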
SimpleDenseMat<float> A(d_A, m, k, COL_MAJOR);
SimpleDenseMat<float> B(d_B, k, n, COL_MAJOR);
SimpleDenseMat<float> C(d_C, m, n, COL_MAJOR);

// C = 1.0 * A * B + 0.0 * C
SimpleDenseMat<float>::gemm(handle, 1.0f, A, false, B, false, 0.0f, C, stream);

// Vector operations
SimpleVec<float> u(d_u, n);
SimpleVec<float> v(d_v, n);
float result = dot(u, v, d_tmp, stream);
float norm = nrm2(u, d_tmp, stream);

// Owning vector with automatic memory management
SimpleVecOwning<float> owned_vec(1024, stream);
owned_vec.fill(0.0f, stream);

Related Pages

Environment:Rapidsai_Cuml_CUDA_GPU
