Implementation:Google deepmind Mujoco Engine Util Sparse AVX
| Knowledge Sources | |
|---|---|
| Domains | Physics Simulation, SIMD Optimization, Sparse Linear Algebra |
| Last Updated | 2026-02-15 04:00 GMT |
Overview
Header-only AVX (Advanced Vector Extensions) SIMD implementations of performance-critical sparse operations for MuJoCo, providing 4-wide double-precision vectorized computation.
Description
This header provides AVX-optimized implementations of sparse linear algebra operations. They are compiled only when mjUSEPLATFORMSIMD is defined, the compiler defines the __AVX__ macro (for example, when building with -mavx), and mjtNum is double precision. The key functions, each of which processes elements in chunks of 4 and handles the remainder with a scalar tail loop, are:
- mju_dotSparse_avx: sparse dot product that uses 256-bit AVX registers to process 4 doubles in parallel, gathering the indexed vec2 elements manually and reducing horizontally by extracting the upper 128-bit half and adding it to the lower half.
- mju_dotSparseX3_avx: batched sparse dot product for 3 vectors at once; the gathered vec2 values are reused across all three dot products, which is what makes supernodes (consecutive rows with the same sparsity pattern) cheaper to process.
- mju_mulMatVecSparse_avx: sparse matrix-vector multiplication with supernode support, dispatching rows in blocks of 3 via mju_dotSparseX3_avx.
- mju_addToSclScl_avx: computes res = res*scl1 + vec*scl2 with vectorized multiplies and adds.
- mju_compare_avx: integer vector comparison using 128-bit SSE2 operations.
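The gather, multiply, and horizontal-reduction steps of mju_dotSparse_avx can be illustrated with a minimal sketch. `dot_sparse_sketch` is a hypothetical name, not MuJoCo's code, and `mjtNum` is assumed to be `double`; the AVX path is compiled only when the compiler defines `__AVX__`, otherwise the scalar loop handles every element.

```c
#ifdef __AVX__
#include <immintrin.h>
#endif

typedef double mjtNum;  // assumption: double-precision build

// Hypothetical sketch: returns sum_i vec1[i] * vec2[ind1[i]].
// vec1 holds nnz1 compressed values; vec2 is dense and gathered through ind1.
static inline mjtNum dot_sparse_sketch(const mjtNum* vec1, const mjtNum* vec2,
                                       int nnz1, const int* ind1) {
  mjtNum res = 0;
  int i = 0;
#ifdef __AVX__
  if (nnz1 >= 4) {
    __m256d sum = _mm256_setzero_pd();
    for (; i <= nnz1 - 4; i += 4) {
      // manual gather: AVX (unlike AVX2) has no gather instruction;
      // _mm256_set_pd takes lanes from highest to lowest
      __m256d v2 = _mm256_set_pd(vec2[ind1[i+3]], vec2[ind1[i+2]],
                                 vec2[ind1[i+1]], vec2[ind1[i]]);
      __m256d v1 = _mm256_loadu_pd(vec1 + i);  // contiguous, unaligned load
      sum = _mm256_add_pd(sum, _mm256_mul_pd(v1, v2));
    }
    // horizontal reduction: add the upper 128-bit half onto the lower half ...
    __m128d lo = _mm256_castpd256_pd128(sum);
    __m128d hi = _mm256_extractf128_pd(sum, 1);
    lo = _mm_add_pd(lo, hi);
    // ... then add the two remaining lanes
    __m128d hi64 = _mm_unpackhi_pd(lo, lo);
    res = _mm_cvtsd_f64(_mm_add_sd(lo, hi64));
  }
#endif
  // scalar tail (or the whole loop when AVX is unavailable)
  for (; i < nnz1; i++) {
    res += vec1[i] * vec2[ind1[i]];
  }
  return res;
}
```

Because the vector path adds the four lanes in a different order than the scalar loop, the two can differ in the last bits for general floating-point data.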
Usage
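The generic (non-AVX) functions in engine_util_sparse.h and engine_util_sparse.c call these functions through compile-time dispatch when the build targets AVX, so sparse operations throughout the solver pipeline are accelerated without changes at the call sites. Because the choice is made at build time, a binary compiled without AVX support does not use these functions, even on AVX-capable hardware.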
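The compile-time dispatch pattern can be sketched with the scaled addition res = res*scl1 + vec*scl2. The names `add_to_scl_scl`, `add_to_scl_scl_avx`, and `SKETCH_USE_AVX` are hypothetical; the guard mirrors the build conditions listed in the Description, and treating `mjUSESINGLE` as the single-precision switch is an assumption.

```c
typedef double mjtNum;  // assumption: double precision (mjUSESINGLE not defined)

// hypothetical guard mirroring the documented build conditions
#if defined(mjUSEPLATFORMSIMD) && defined(__AVX__) && !defined(mjUSESINGLE)
#define SKETCH_USE_AVX
#include <immintrin.h>
#endif

#ifdef SKETCH_USE_AVX
// AVX path: res = res*scl1 + vec*scl2, four doubles at a time
static inline void add_to_scl_scl_avx(mjtNum* res, const mjtNum* vec,
                                      mjtNum scl1, mjtNum scl2, int n) {
  __m256d s1 = _mm256_set1_pd(scl1);  // broadcast scale factors to all lanes
  __m256d s2 = _mm256_set1_pd(scl2);
  int i = 0;
  for (; i <= n - 4; i += 4) {
    __m256d r = _mm256_mul_pd(_mm256_loadu_pd(res + i), s1);
    __m256d v = _mm256_mul_pd(_mm256_loadu_pd(vec + i), s2);
    _mm256_storeu_pd(res + i, _mm256_add_pd(r, v));
  }
  // scalar tail for the last n % 4 elements
  for (; i < n; i++) {
    res[i] = res[i]*scl1 + vec[i]*scl2;
  }
}
#endif

// generic entry point: the preprocessor picks the implementation at build time
static inline void add_to_scl_scl(mjtNum* res, const mjtNum* vec,
                                  mjtNum scl1, mjtNum scl2, int n) {
#ifdef SKETCH_USE_AVX
  add_to_scl_scl_avx(res, vec, scl1, scl2, n);
#else
  for (int i = 0; i < n; i++) {
    res[i] = res[i]*scl1 + vec[i]*scl2;
  }
#endif
}
```

Because the guard is evaluated by the preprocessor, the generic function carries no runtime branch: only one of the two bodies is ever compiled.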
Code Reference
Source Location
- Repository: Google_deepmind_Mujoco
- File: src/engine/engine_util_sparse_avx.h
- Lines: 1-292
Key Functions
```c
// Sparse dot product with AVX (4-wide double)
static inline
mjtNum mju_dotSparse_avx(const mjtNum* vec1, const mjtNum* vec2,
                         int nnz1, const int* ind1);

// Batched sparse dot product for 3 vectors (supernode)
static inline
void mju_dotSparseX3_avx(mjtNum* res0, mjtNum* res1, mjtNum* res2,
                         const mjtNum* vec10, const mjtNum* vec11,
                         const mjtNum* vec12, const mjtNum* vec2,
                         int nnz1, const int* ind1);

// Sparse matrix-vector multiply with AVX and supernode support
static inline
void mju_mulMatVecSparse_avx(mjtNum* res, const mjtNum* mat, const mjtNum* vec,
                             int nr, const int* rownnz, const int* rowadr,
                             const int* colind, const int* rowsuper);

// Vectorized scaled addition: res = res*scl1 + vec*scl2
static inline
void mju_addToSclScl_avx(mjtNum* res, const mjtNum* vec,
                         mjtNum scl1, mjtNum scl2, int n);

// Integer vector comparison using SSE2
static inline
int mju_compare_avx(const int* vec1, const int* vec2, int n);
```
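The supernode batching in mju_dotSparseX3_avx rests on one observation: when three rows share the index list ind1, each group of gathered vec2 values can be reused for all three products. The hypothetical `dot_sparse_x3_sketch` below illustrates this under the assumption that `mjtNum` is `double`; its three-accumulator layout is an illustration, not necessarily MuJoCo's exact code.

```c
#ifdef __AVX__
#include <immintrin.h>
#endif

typedef double mjtNum;  // assumption: double-precision build

#ifdef __AVX__
// sum the four lanes of an AVX register
static inline mjtNum hsum4(__m256d v) {
  __m128d lo = _mm_add_pd(_mm256_castpd256_pd128(v), _mm256_extractf128_pd(v, 1));
  return _mm_cvtsd_f64(_mm_add_sd(lo, _mm_unpackhi_pd(lo, lo)));
}
#endif

// Hypothetical sketch: three sparse vectors (vec10, vec11, vec12) with the
// same index pattern ind1 are each dotted with the dense vector vec2.
static inline void dot_sparse_x3_sketch(mjtNum* res0, mjtNum* res1, mjtNum* res2,
                                        const mjtNum* vec10, const mjtNum* vec11,
                                        const mjtNum* vec12, const mjtNum* vec2,
                                        int nnz1, const int* ind1) {
  mjtNum s0 = 0, s1 = 0, s2 = 0;
  int i = 0;
#ifdef __AVX__
  __m256d a0 = _mm256_setzero_pd();
  __m256d a1 = _mm256_setzero_pd();
  __m256d a2 = _mm256_setzero_pd();
  for (; i <= nnz1 - 4; i += 4) {
    // gather four vec2 entries once ...
    __m256d g = _mm256_set_pd(vec2[ind1[i+3]], vec2[ind1[i+2]],
                              vec2[ind1[i+1]], vec2[ind1[i]]);
    // ... and reuse them for all three rows
    a0 = _mm256_add_pd(a0, _mm256_mul_pd(_mm256_loadu_pd(vec10 + i), g));
    a1 = _mm256_add_pd(a1, _mm256_mul_pd(_mm256_loadu_pd(vec11 + i), g));
    a2 = _mm256_add_pd(a2, _mm256_mul_pd(_mm256_loadu_pd(vec12 + i), g));
  }
  s0 = hsum4(a0);
  s1 = hsum4(a1);
  s2 = hsum4(a2);
#endif
  // scalar tail: each vec2 element is still read once for all three rows
  for (; i < nnz1; i++) {
    mjtNum g = vec2[ind1[i]];
    s0 += vec10[i] * g;
    s1 += vec11[i] * g;
    s2 += vec12[i] * g;
  }
  *res0 = s0;
  *res1 = s1;
  *res2 = s2;
}
```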
Import
```c
#include "engine/engine_util_sparse_avx.h"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
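| vec1 | const mjtNum* | Yes | Non-zero values of the sparse vector, stored contiguously; vec1[i] belongs at dense position ind1[i] |
| vec2 | const mjtNum* | Yes | Dense vector, gathered at the positions listed in ind1 |
| vec10, vec11, vec12 | const mjtNum* | Yes | Non-zero values of three sparse vectors that share the index list ind1 (mju_dotSparseX3_avx) |
| nnz1 | int | Yes | Number of non-zero elements in the sparse vector |
| ind1 | const int* | Yes | Dense positions (indices) of the non-zero elements |
| mat | const mjtNum* | Yes | Non-zero values of the sparse matrix in CSR format |
| vec | const mjtNum* | Yes | Dense input vector (mju_mulMatVecSparse_avx) or vector to add (mju_addToSclScl_avx) |
| nr | int | Yes | Number of matrix rows |
| rownnz | const int* | Yes | Number of non-zeros in each row |
| rowadr | const int* | Yes | Start address of each row in mat and colind |
| colind | const int* | Yes | Column index of each non-zero |
| rowsuper | const int* | No | Supernode sizes for batched row processing |
| scl1, scl2 | mjtNum | Yes | Scale factors applied to res and vec, respectively |
| n | int | Yes | Vector length (mju_addToSclScl_avx, mju_compare_avx) |
| vec1, vec2 (compare) | const int* | Yes | Integer vectors to compare (mju_compare_avx) |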
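
The CSR inputs and the supernode batching can be sketched portably, without intrinsics. `mul_mat_vec_sparse_sketch` and `row_dot` are hypothetical names, and the sketch assumes that rowsuper[r] counts how many rows after r share row r's column pattern (so a supernode starting at r spans rowsuper[r] + 1 rows) and that a NULL rowsuper disables batching; both conventions are assumptions rather than facts taken from the header.

```c
#include <stddef.h>  // NULL

typedef double mjtNum;  // assumption: double-precision build

// plain sparse dot product over one CSR row
static mjtNum row_dot(const mjtNum* val, const mjtNum* vec, int nnz, const int* ind) {
  mjtNum s = 0;
  for (int k = 0; k < nnz; k++) {
    s += val[k] * vec[ind[k]];
  }
  return s;
}

// Hypothetical sketch of res = mat * vec for a CSR matrix with nr rows.
// Assumption: rowsuper[r] = number of rows after r with the same colind pattern.
static void mul_mat_vec_sparse_sketch(mjtNum* res, const mjtNum* mat, const mjtNum* vec,
                                      int nr, const int* rownnz, const int* rowadr,
                                      const int* colind, const int* rowsuper) {
  int r = 0;
  while (r < nr) {
    int ns = rowsuper ? rowsuper[r] + 1 : 1;  // rows in the supernode starting at r
    // blocks of 3 rows share one index list, so each vec entry is read once per block
    while (ns >= 3) {
      const int nnz = rownnz[r];
      const int* ind = colind + rowadr[r];
      const mjtNum* m0 = mat + rowadr[r];
      const mjtNum* m1 = mat + rowadr[r+1];
      const mjtNum* m2 = mat + rowadr[r+2];
      mjtNum s0 = 0, s1 = 0, s2 = 0;
      for (int k = 0; k < nnz; k++) {
        mjtNum g = vec[ind[k]];
        s0 += m0[k] * g;
        s1 += m1[k] * g;
        s2 += m2[k] * g;
      }
      res[r] = s0;
      res[r+1] = s1;
      res[r+2] = s2;
      r += 3;
      ns -= 3;
    }
    // leftover rows of the supernode (or rows without one)
    while (ns > 0) {
      res[r] = row_dot(mat + rowadr[r], vec, rownnz[r], colind + rowadr[r]);
      r++;
      ns--;
    }
  }
}
```

For example, a 4x4 matrix whose first three rows all have entries in columns 0 and 2, and whose last row has entries in columns 1 and 3, would be stored as rownnz = {2, 2, 2, 2}, rowadr = {0, 2, 4, 6}, colind = {0, 2, 0, 2, 0, 2, 1, 3}, with rowsuper = {2, 1, 0, 0} under the assumed convention. Rows 0 to 2 then go through the three-row block and row 3 through the single-row path.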
Outputs
| Name | Type | Description |
|---|---|---|
| return value | mjtNum | Dot product result (mju_dotSparse_avx) |
| res0, res1, res2 | mjtNum* | The three dot product results (mju_dotSparseX3_avx) |
| res | mjtNum* | Result of the matrix-vector product (mju_mulMatVecSparse_avx), or the scaled sum updated in place (mju_addToSclScl_avx) |
| return value | int | 1 if the two integer vectors are equal, 0 otherwise (mju_compare_avx) |
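
AVX without AVX2 provides no 256-bit integer compare, so a 128-bit SSE2 comparison handles four 32-bit ints per step. The hypothetical `compare_sketch` below shows one way to do this with an early exit; it assumes `int` is 32 bits and uses the SSE2 path only when the compiler defines `__SSE2__`.

```c
#ifdef __SSE2__
#include <emmintrin.h>
#endif

// Hypothetical sketch: return 1 if vec1[0..n) equals vec2[0..n), else 0.
// Assumption: int is 32 bits, so one 128-bit register holds four ints.
static inline int compare_sketch(const int* vec1, const int* vec2, int n) {
  int i = 0;
#ifdef __SSE2__
  for (; i <= n - 4; i += 4) {
    // unaligned loads of four ints from each vector
    __m128i a = _mm_loadu_si128((const __m128i*)(vec1 + i));
    __m128i b = _mm_loadu_si128((const __m128i*)(vec2 + i));
    // equal lanes become all ones; movemask collects one bit per byte,
    // so 0xFFFF means all four ints matched
    if (_mm_movemask_epi8(_mm_cmpeq_epi32(a, b)) != 0xFFFF) {
      return 0;
    }
  }
#endif
  // scalar tail for the last n % 4 elements
  for (; i < n; i++) {
    if (vec1[i] != vec2[i]) {
      return 0;
    }
  }
  return 1;
}
```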