Principle:Haifengl Smile Matrix Arithmetic

From Leeroopedia


Overview

Matrix Arithmetic encompasses the fundamental algebraic operations on matrices: multiplication (GEMM), matrix-vector products (GEMV), addition, subtraction, scaling, transposition, and rank-1 updates. These operations form the computational backbone of all higher-level linear algebra algorithms, including matrix decompositions, linear system solving, and iterative eigensolvers.

In Smile, matrix arithmetic is accelerated by delegating to native BLAS (Basic Linear Algebra Subprograms) routines via the Java Foreign Function & Memory API. BLAS organizes operations into three levels based on computational complexity and data access patterns, each offering progressively better cache utilization.

Theoretical Basis

BLAS Levels

Level | Operations | Complexity | Data Movement
Level 1 | Vector-vector: DOT, AXPY, SCAL | O(n) | Memory-bound
Level 2 | Matrix-vector: GEMV, symmetric MV | O(n²) | Memory-bound
Level 3 | Matrix-matrix: GEMM, symmetric MM | O(n³) | Compute-bound

General Matrix Multiply (GEMM)

GEMM is the single most important kernel in scientific computing and machine learning. It computes:

C = α·op(A)·op(B) + β·C

where op(X) is either X (no transpose) or Xᵀ (transpose), and α, β are scalar multipliers.

Key properties:

  • op(A) is m×k, op(B) is k×n, and C is m×n (dimensions after applying the transpositions)
  • When α = 1 and β = 0, this reduces to C = AB
  • When β = 1, the product accumulates into C: C += α·op(A)·op(B)
  • Computational complexity: O(mnk) in the general case, which is O(n³) for square matrices
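To make the update concrete, here is a minimal reference GEMM over row-major `double[][]` arrays. This is an illustrative sketch only (class and method names are hypothetical, not Smile's API); the native BLAS routines Smile delegates to add blocking, vectorization, and threading on top of this same arithmetic:

```java
// Reference GEMM: C = alpha * op(A) * op(B) + beta * C.
// transA/transB select op() for each operand; C must already be m x n.
public class Gemm {
    static double[][] gemm(boolean transA, boolean transB,
                           double alpha, double[][] A, double[][] B,
                           double beta, double[][] C) {
        int m = C.length, n = C[0].length;
        int k = transA ? A.length : A[0].length;   // inner dimension
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int p = 0; p < k; p++) {
                    double a = transA ? A[p][i] : A[i][p];
                    double b = transB ? B[j][p] : B[p][j];
                    sum += a * b;
                }
                C[i][j] = alpha * sum + beta * C[i][j];
            }
        }
        return C;
    }

    public static void main(String[] args) {
        double[][] A = {{1, 2}, {3, 4}};
        double[][] B = {{5, 6}, {7, 8}};
        double[][] C = {{0, 0}, {0, 0}};
        gemm(false, false, 1.0, A, B, 0.0, C);     // C = AB
        System.out.println(C[0][0] + " " + C[0][1]); // 19.0 22.0
        System.out.println(C[1][0] + " " + C[1][1]); // 43.0 50.0
    }
}
```

Note how β = 0 overwrites C while β = 1 accumulates into it, matching the two cases listed above.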

General Matrix-Vector Multiply (GEMV)

GEMV computes the product of a matrix and a vector:

y = α·op(A)·x + β·y

This is the fundamental operation in iterative methods (conjugate gradient, Lanczos, Arnoldi), where only the action of a matrix on a vector is needed.

Specialized variants:

  • SYMV: y = α·A·x + β·y where A is symmetric -- only half the matrix is accessed
  • TRMV: y = A·x where A is triangular -- fewer operations are needed
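The matrix-vector kernel can likewise be sketched as a double loop (again a hypothetical reference implementation, not Smile's native-backed one):

```java
// Reference GEMV: y = alpha * op(A) * x + beta * y, with op() chosen by trans.
public class Gemv {
    static double[] gemv(boolean trans, double alpha, double[][] A,
                         double[] x, double beta, double[] y) {
        int m = A.length, n = A[0].length;
        int rows = trans ? n : m;       // length of the result
        int cols = trans ? m : n;       // length of x
        for (int i = 0; i < rows; i++) {
            double sum = 0.0;
            for (int j = 0; j < cols; j++) {
                sum += (trans ? A[j][i] : A[i][j]) * x[j];
            }
            y[i] = alpha * sum + beta * y[i];
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] A = {{1, 2}, {3, 4}};
        double[] y = gemv(false, 1.0, A, new double[]{1, 1}, 0.0, new double[2]);
        System.out.println(y[0] + " " + y[1]);   // 3.0 7.0  (y = A x)
    }
}
```

Iterative methods such as Lanczos or Arnoldi only ever call this kernel (and its transposed form), which is why they never need the matrix in factored form.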

Matrix Addition and Subtraction

Element-wise operations that combine matrices of identical dimensions:

C = α·A + β·B

Smile implements this via the BLAS Level 1 AXPY operation (y = α·x + y) applied to the linearized matrix arrays. For the common case A += B, this is equivalent to axpy(1.0, B, A).
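The trick of treating an m×n matrix as one length-m·n vector can be shown directly (a sketch under the assumption of contiguous column-major storage; the class name is illustrative):

```java
// Matrix addition as Level-1 AXPY over the flattened storage:
// A += alpha * B, treating both m*n matrices as length-(m*n) vectors.
public class MatrixAxpy {
    static void axpy(double alpha, double[] x, double[] y) {
        for (int i = 0; i < x.length; i++) y[i] += alpha * x[i];
    }

    public static void main(String[] args) {
        // 2x2 matrices stored column-major as flat arrays.
        double[] A = {1, 3, 2, 4};   // [[1,2],[3,4]]
        double[] B = {5, 7, 6, 8};   // [[5,6],[7,8]]
        axpy(1.0, B, A);             // A += B
        System.out.println(java.util.Arrays.toString(A)); // [6.0, 10.0, 8.0, 12.0]
    }
}
```

Because element-wise addition ignores the 2-D shape entirely, a single vector kernel suffices regardless of the matrix dimensions.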

Transposition

The transpose Aᵀ of an m×n matrix A is the n×m matrix B where bᵢⱼ = aⱼᵢ.

Key identities:

  • (AB)ᵀ = BᵀAᵀ
  • (Aᵀ)ᵀ = A
  • (A + B)ᵀ = Aᵀ + Bᵀ

In Smile, transpose() creates a new matrix with copied data (not a transposed view), ensuring column-major layout for the result.
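A copy-based transpose that preserves column-major layout can be sketched as follows (an illustrative stand-alone version, not Smile's transpose() itself):

```java
// Out-of-place transpose of an m x n column-major matrix into an
// n x m column-major result (a copy, not a transposed view).
public class Transpose {
    static double[] transpose(double[] a, int m, int n) {
        double[] b = new double[n * m];
        for (int j = 0; j < n; j++)
            for (int i = 0; i < m; i++)
                b[j + i * n] = a[i + j * m];   // b[j][i] = a[i][j]
        return b;
    }

    public static void main(String[] args) {
        // 2x3 matrix [[1,2,3],[4,5,6]] in column-major order.
        double[] a = {1, 4, 2, 5, 3, 6};
        double[] b = transpose(a, 2, 3);       // 3x2: [[1,4],[2,5],[3,6]]
        System.out.println(java.util.Arrays.toString(b)); // [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    }
}
```

The copy costs O(mn) time and memory, but the result can then be fed to BLAS routines that expect contiguous column-major storage without any stride tricks.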

Rank-1 Update (GER)

The rank-1 update operation:

A = A + α·x·yᵀ

where x ∈ ℝᵐ and y ∈ ℝⁿ. This is the outer product of x and y, scaled by α and added to A. It is fundamental to algorithms such as the Sherman-Morrison formula and online covariance matrix updates.
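The outer-product update is a simple double loop (a reference sketch with an illustrative class name; the BLAS GER routine performs the same arithmetic over column-major storage):

```java
// Rank-1 update (GER): A += alpha * x * y^T for a dense row-major matrix,
// where x has length m and y has length n.
public class Ger {
    static void ger(double alpha, double[] x, double[] y, double[][] A) {
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < y.length; j++)
                A[i][j] += alpha * x[i] * y[j];
    }

    public static void main(String[] args) {
        double[][] A = {{0, 0}, {0, 0}};
        double[] x = {1, 2};
        double[] y = {3, 4};
        ger(1.0, x, y, A);           // A = x y^T = [[3,4],[6,8]]
        System.out.println(A[0][0] + " " + A[1][1]); // 3.0 8.0
    }
}
```

Each update touches all mn entries once, so a rank-1 refresh of a covariance matrix costs O(mn) rather than the O(mnk) of recomputing a full product.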

Symmetric Matrix Optimization

When a matrix is flagged as symmetric (via UPLO), Smile automatically dispatches to optimized BLAS routines:

General Routine | Symmetric Variant | Speedup Factor
cblas_dgemm | cblas_dsymm | ~2x (reads half the data)
cblas_dgemv | cblas_dsymv | ~2x

Mathematical Formulations

Common Operations in Smile

Method | Mathematical Operation | BLAS Routine
A.mm(B) | C = AB | GEMM with α = 1, β = 0
A.tm(B) | C = AᵀB | GEMM with transA
A.mt(B) | C = ABᵀ | GEMM with transB
A.ata() | C = AᵀA | GEMM (result is symmetric)
A.aat() | C = AAᵀ | GEMM (result is symmetric)
A.mv(x) | y = Ax | GEMV
A.tv(x) | y = Aᵀx | GEMV with transpose
A.add(B) | A += B | AXPY with α = 1
A.sub(B) | A -= B | AXPY with α = -1
A.scale(alpha) | A = αA | SCAL
A.ger(alpha, x, y) | A += αxyᵀ | GER

Performance Characteristics

GEMM Performance Model

For an m×k by k×n multiplication:

  • Floating-point operations: 2mnk
  • Memory accesses: O(mn + mk + kn) elements
  • Arithmetic intensity: 2mnk / ((mn + mk + kn)·s) FLOPs per byte, where s is the element size in bytes

For large square matrices, GEMM achieves near-peak floating-point throughput: the arithmetic intensity grows linearly with matrix size, and blocking strategies keep each working set cache-resident while the computation proceeds.
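Plugging square dimensions into the intensity formula above makes the growth visible (a small calculator sketch; the class name is illustrative, and s = 8 bytes assumes double precision):

```java
// Arithmetic intensity of GEMM: 2mnk FLOPs over (mn + mk + kn) elements moved.
// For square n x n matrices this is 2n^3 / (3n^2 * s) = n / (1.5 * s) FLOPs/byte,
// i.e., it grows linearly with n.
public class Intensity {
    static double intensity(long m, long n, long k, int bytesPerElem) {
        double flops = 2.0 * m * n * k;
        double bytes = (double) (m * n + m * k + k * n) * bytesPerElem;
        return flops / bytes;
    }

    public static void main(String[] args) {
        for (int n : new int[]{128, 1024, 8192}) {
            System.out.printf("n=%d: %.1f FLOPs/byte%n",
                              n, intensity(n, n, n, 8));
        }
    }
}
```

Once the intensity exceeds the machine balance (peak FLOPs divided by memory bandwidth), GEMM stops being memory-bound, which is exactly why Level 3 kernels dominate high-performance code.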

Relationship to the Pipeline

Matrix arithmetic is the second stage of the Matrix_Decomposition_Pipeline, sitting between construction and decomposition:

Construction --> Arithmetic --> Decomposition --> Solving --> Result Extraction

Decomposition algorithms (LU, QR, SVD, EVD) internally use GEMM, GEMV, and other arithmetic primitives. ARPACK's Implicitly Restarted Arnoldi Method relies exclusively on matrix-vector products (mv/tv).

Related

Knowledge Sources

Domains

Linear_Algebra, Numerical_Computing, High_Performance_Computing
