Implementation:Google deepmind Mujoco Engine Util Sparse AVX

Knowledge Sources	Google_deepmind_Mujoco
Domains	Physics Simulation, SIMD Optimization, Sparse Linear Algebra
Last Updated	2026-02-15 04:00 GMT

Overview

Header-only AVX (Advanced Vector Extensions) SIMD implementations of performance-critical sparse operations for MuJoCo, providing 4-wide double-precision vectorized computation.

Description

This header provides AVX-optimized implementations of sparse linear algebra operations. They are compiled only when mjUSEPLATFORMSIMD is defined, the compiler defines the __AVX__ macro, and mjtNum is double precision.

The key functions are as follows. mju_dotSparse_avx computes a sparse dot product using 256-bit AVX registers to process 4 doubles in parallel; the dense operand is gathered manually from the indexed elements, and the four partial sums are reduced horizontally via a 128-bit extract and add. mju_dotSparseX3_avx computes three sparse dot products at once, reusing the gathered vec2 values across all three as a supernode optimization. mju_mulMatVecSparse_avx performs sparse matrix-vector multiplication with supernode support, dispatching rows in blocks of 3 to mju_dotSparseX3_avx. mju_addToSclScl_avx computes res = res*scl1 + vec*scl2 with vectorized multiplies and adds. mju_compare_avx compares two integer vectors using 128-bit SSE2 operations. The vectorized loops process elements in chunks of 4, with a scalar tail loop for the remaining elements.
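The gather-multiply-reduce pattern described for the sparse dot product can be sketched as follows. This is a minimal illustration, not the MuJoCo source: the name dot_sparse_sketch is hypothetical, plain double stands in for mjtNum, and the AVX path is guarded by __AVX__ so the function falls back to the scalar loop when AVX is not enabled.

```c
#include <assert.h>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical sketch, not the MuJoCo function.
// vec1 holds nnz1 packed nonzero values; ind1[i] is the position in the
// dense vector vec2 that pairs with vec1[i].
static double dot_sparse_sketch(const double* vec1, const double* vec2,
                                int nnz1, const int* ind1) {
  assert(nnz1 >= 0);
  int i = 0;
  double res = 0;

#if defined(__AVX__)
  if (nnz1 >= 4) {
    __m256d sum = _mm256_setzero_pd();
    for (; i <= nnz1 - 4; i += 4) {
      // contiguous load of 4 packed nonzeros
      __m256d val1 = _mm256_loadu_pd(vec1 + i);
      // manual gather; _mm256_set_pd lists lanes from highest to lowest
      __m256d val2 = _mm256_set_pd(vec2[ind1[i+3]], vec2[ind1[i+2]],
                                   vec2[ind1[i+1]], vec2[ind1[i]]);
      sum = _mm256_add_pd(sum, _mm256_mul_pd(val1, val2));
    }
    // horizontal reduction: add the two 128-bit halves, then the two doubles
    __m128d lo = _mm256_castpd256_pd128(sum);
    __m128d hi = _mm256_extractf128_pd(sum, 1);
    lo = _mm_add_pd(lo, hi);
    __m128d hi64 = _mm_unpackhi_pd(lo, lo);
    res = _mm_cvtsd_f64(_mm_add_sd(lo, hi64));
  }
#endif

  // scalar tail (or the whole loop when AVX is unavailable)
  for (; i < nnz1; i++) {
    res += vec1[i] * vec2[ind1[i]];
  }
  return res;
}
```

The vector path accumulates four partial sums, so its result can differ from a strictly sequential scalar sum in the last floating-point bits.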

Usage

These functions are selected at compile time: the generic entry points in engine_util_sparse.h and engine_util_sparse.c call the _avx variants when the build enables AVX, so sparse operations throughout the solver pipeline are accelerated without any change at the call sites.
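A compile-time dispatch of this kind might look like the sketch below, shown for the scaled addition res = res*scl1 + vec*scl2. The macro SKETCH_USE_AVX and the function name are hypothetical stand-ins chosen for illustration; the actual macro names and call structure in MuJoCo may differ.

```c
#include <assert.h>
#if defined(__AVX__)
#include <immintrin.h>
#define SKETCH_USE_AVX  // hypothetical switch, set only when AVX is enabled
#endif

// Hypothetical sketch: res = res*scl1 + vec*scl2 over n elements.
static void add_to_scl_scl_sketch(double* res, const double* vec,
                                  double scl1, double scl2, int n) {
  assert(n >= 0);
  int i = 0;

#ifdef SKETCH_USE_AVX
  // broadcast both scalars across all 4 lanes
  const __m256d s1 = _mm256_set1_pd(scl1);
  const __m256d s2 = _mm256_set1_pd(scl2);
  for (; i <= n - 4; i += 4) {
    __m256d r = _mm256_mul_pd(_mm256_loadu_pd(res + i), s1);
    __m256d v = _mm256_mul_pd(_mm256_loadu_pd(vec + i), s2);
    _mm256_storeu_pd(res + i, _mm256_add_pd(r, v));
  }
#endif

  // scalar tail; also the complete implementation in non-AVX builds
  for (; i < n; i++) {
    res[i] = res[i]*scl1 + vec[i]*scl2;
  }
}
```

Because the choice is made by the preprocessor, callers see a single function and no runtime CPU check is involved.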

Code Reference

Source Location

Repository: Google_deepmind_Mujoco
File: src/engine/engine_util_sparse_avx.h
Lines: 1-292

Key Functions

// Sparse dot product with AVX (4-wide double)
static inline
mjtNum mju_dotSparse_avx(const mjtNum* vec1, const mjtNum* vec2,
                         int nnz1, const int* ind1);

// Batched sparse dot product for 3 vectors (supernode)
static inline
void mju_dotSparseX3_avx(mjtNum* res0, mjtNum* res1, mjtNum* res2,
                         const mjtNum* vec10, const mjtNum* vec11,
                         const mjtNum* vec12, const mjtNum* vec2,
                         int nnz1, const int* ind1);

// Sparse matrix-vector multiply with AVX and supernode support
static inline
void mju_mulMatVecSparse_avx(mjtNum* res, const mjtNum* mat, const mjtNum* vec,
                             int nr, const int* rownnz, const int* rowadr,
                             const int* colind, const int* rowsuper);

// Vectorized scaled addition: res = res*scl1 + vec*scl2
static inline
void mju_addToSclScl_avx(mjtNum* res, const mjtNum* vec,
                         mjtNum scl1, mjtNum scl2, int n);

// Integer vector comparison using SSE2
static inline
int mju_compare_avx(const int* vec1, const int* vec2, int n);
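
The batched signature takes three value arrays but a single nnz1 and ind1, which suggests that the three sparse vectors share one sparsity pattern. Under that assumption, the reuse of gathered values can be sketched in scalar C as below; dot_sparse_x3_sketch is a hypothetical name and the code is an illustration, not the AVX implementation.

```c
#include <assert.h>

// Hypothetical scalar sketch of a 3-way batched sparse dot product.
// vec10, vec11, vec12 each hold nnz1 packed values that share the index
// array ind1; vec2 is the dense vector.
static void dot_sparse_x3_sketch(double* res0, double* res1, double* res2,
                                 const double* vec10, const double* vec11,
                                 const double* vec12, const double* vec2,
                                 int nnz1, const int* ind1) {
  assert(nnz1 >= 0);
  double r0 = 0, r1 = 0, r2 = 0;
  for (int i = 0; i < nnz1; i++) {
    // one indexed read of the dense vector serves all three products
    const double v = vec2[ind1[i]];
    r0 += vec10[i] * v;
    r1 += vec11[i] * v;
    r2 += vec12[i] * v;
  }
  *res0 = r0;
  *res1 = r1;
  *res2 = r2;
}
```

Sharing the gather matters because the indexed reads of vec2 are the part of the loop that cannot be loaded contiguously.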

Import

#include "engine/engine_util_sparse_avx.h"

I/O Contract

Inputs

Name	Type	Required	Description
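vec1	const mjtNum*	Yes	Nonzero values of the sparse vector, stored contiguously (nnz1 entries)
vec10, vec11, vec12	const mjtNum*	Yes	Nonzero values of the three sparse vectors passed to mju_dotSparseX3_avx
vec2	const mjtNum*	Yes	Dense vector; element ind1[i] is gathered to pair with the i-th sparse value
nnz1	int	Yes	Number of nonzero elements in the sparse vector
ind1	const int*	Yes	Indices of the nonzero elements (positions in the dense vector)
mat	const mjtNum*	Yes	Nonzero values of the sparse matrix in CSR format
vec	const mjtNum*	Yes	Dense input vector for the matrix-vector multiply or scaled addition
nr	int	Yes	Number of matrix rows
rownnz	const int*	Yes	Number of nonzeros in each row
rowadr	const int*	Yes	Offset of each row's first nonzero in mat and colind
colind	const int*	Yes	Column indices of the nonzeros
rowsuper	const int*	No	Supernode sizes for batched row processing
scl1, scl2	mjtNum	Yes	Scale factors applied to res and vec in mju_addToSclScl_avx
n	int	Yes	Number of elements (mju_addToSclScl_avx, mju_compare_avx)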

Outputs

Name	Type	Description
return value	mjtNum	Dot product result (mju_dotSparse_avx)
res0, res1, res2	mjtNum*	Three dot product results (mju_dotSparseX3_avx)
res	mjtNum*	Result vector for the matrix-vector multiply or scaled addition (the latter updates res in place)
return value	int	1 if the vectors are equal, 0 otherwise (mju_compare_avx)
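
To make the CSR arguments concrete, the following scalar reference shows how rownnz, rowadr and colind address the packed values in mat. It is a sketch under the usual CSR conventions, with a hypothetical function name; it omits rowsuper and the AVX path entirely.

```c
#include <assert.h>

// Hypothetical scalar reference: res = mat * vec for a CSR matrix.
// Row r owns rownnz[r] packed values starting at mat[rowadr[r]];
// colind gives the column of each packed value.
static void mul_mat_vec_sparse_sketch(double* res, const double* mat,
                                      const double* vec, int nr,
                                      const int* rownnz, const int* rowadr,
                                      const int* colind) {
  assert(nr >= 0);
  for (int r = 0; r < nr; r++) {
    const int adr = rowadr[r];
    double sum = 0;
    for (int k = 0; k < rownnz[r]; k++) {
      sum += mat[adr + k] * vec[colind[adr + k]];
    }
    res[r] = sum;
  }
}
```

For example, the 3x3 matrix with rows (1 0 2), (0 3 0), (4 0 5) is stored as mat = {1, 2, 3, 4, 5}, rownnz = {2, 1, 2}, rowadr = {0, 2, 3}, colind = {0, 2, 1, 0, 2}; multiplying by vec = {1, 2, 3} gives res = {7, 6, 19}.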
