Workflow: Google DeepMind MuJoCo Simulation Benchmarking
| Knowledge Sources | |
|---|---|
| Domains | Robotics_Simulation, Performance_Engineering, Benchmarking |
| Last Updated | 2026-02-15 04:30 GMT |
Overview
End-to-end process for measuring MuJoCo simulation performance, covering both the native C engine (single and multi-threaded) and the MJX GPU-accelerated backends (JAX and Warp).
Description
This workflow covers simulation benchmarking across all MuJoCo compute backends. For the native C engine, it measures steps per second, realtime factor, solver iterations, and contact/constraint counts, and provides a detailed internal profiler breakdown (position, velocity, acceleration, and constraint phases). For MJX, it measures GPU throughput across batched parallel rollouts, reporting JIT compilation time separately from execution time. Benchmarking uses pseudo-random control noise (an Ornstein-Uhlenbeck process) to exercise realistic contact dynamics. Multi-threaded CPU benchmarking supports both independent parallel rollouts and engine-internal thread pools.
Usage
Execute this workflow when you need to evaluate simulation throughput for a specific model, compare performance across hardware configurations (CPU cores, GPU types), assess the impact of model complexity on simulation speed, profile which simulation phases are bottlenecks, or validate that engine optimizations improve performance.
Execution Steps
Step 1: Load_model_and_configure
Load the model file and parse benchmark configuration parameters: number of simulation steps, thread count, control noise parameters, and engine-internal thread pool size. Optionally load a named keyframe as the initial state to start the simulation in a specific configuration.
Key considerations:
- The "test" keyframe is loaded if present in the model
- Control noise parameters (std, rate) create realistic actuator inputs
- Thread count determines the number of independent parallel rollouts (CPU)
- Engine-internal thread pool enables parallelism within a single simulation step
- For MJX benchmarks, batch size and solver configuration are additional parameters
Step 2: Allocate_per_thread_resources
Create independent mjData instances for each simulation thread (CPU) or configure batched data structures (MJX). Each thread gets its own copy of the simulation state to avoid synchronization overhead. For engine-internal threading, create and bind a thread pool to each mjData instance.
Key considerations:
- Each thread requires its own mjData for independent rollouts
- Thread pools enable parallel constraint island solving within one step
- Memory usage scales linearly with thread count
- For MJX, batch data is created as a single vectorized structure on GPU
Step 3: Generate_control_sequence
Pre-generate a deterministic pseudo-random control signal using an Ornstein-Uhlenbeck process with Halton quasi-random sequences. This creates realistic actuator inputs that exercise contact dynamics while being reproducible across benchmark runs. Controls are clipped to actuator limits when specified.
Key considerations:
- Ornstein-Uhlenbeck process provides smooth, correlated control noise
- Halton sequences provide better coverage than pure random sampling
- Control signals converge to the keyframe midpoint at the specified rate
- Deterministic sequences ensure reproducible benchmark results
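The control-generation step above can be sketched in plain NumPy. The exact update rule and noise mapping used by any particular benchmark tool may differ; this is a generic Ornstein-Uhlenbeck process driven by Halton quasi-random noise, with all function and parameter names chosen for illustration:

```python
import numpy as np

def halton(index: int, base: int) -> float:
    """Radical-inverse (Halton) value of `index` in a prime base."""
    result, f = 0.0, 1.0 / base
    while index > 0:
        result += f * (index % base)
        index //= base
        f /= base
    return result

def ou_controls(nstep, nu, midpoint, std, rate, dt, clip=None):
    """Deterministic control sequence: OU mean reversion toward
    `midpoint`, driven by Halton noise, optionally clipped to limits."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    assert nu <= len(primes), "sketch supports up to 10 actuators"
    ctrl = np.array(midpoint, dtype=float)
    seq = np.empty((nstep, nu))
    for t in range(nstep):
        # Halton uniforms in (0, 1), mapped to zero-mean noise in [-1, 1].
        noise = np.array([2.0 * halton(t + 1, p) - 1.0 for p in primes[:nu]])
        # OU update: revert toward the midpoint at `rate`, add scaled noise.
        ctrl += rate * dt * (midpoint - ctrl) + std * np.sqrt(dt) * noise
        if clip is not None:
            ctrl = np.clip(ctrl, clip[0], clip[1])
        seq[t] = ctrl
    return seq
```

Because the Halton sequence is fully deterministic, two runs with the same parameters produce identical control signals, which is what makes benchmark results reproducible.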
Step 4: Execute_benchmark_rollouts
Launch simulation rollouts across all threads (CPU) or as a batched computation (MJX). Each rollout advances the simulation for the specified number of steps while accumulating performance statistics: contact counts, constraint counts, and solver iterations. Wall-clock time is measured for the entire execution.
Key considerations:
- CPU threads run independently with no synchronization during rollout
- MJX rollouts are vectorized and JIT-compiled for GPU execution
- Solver iteration counts may vary per step due to contact state changes
- Island decomposition allows per-island iteration counting
Step 5: Report_performance_metrics
Compute and display comprehensive performance statistics including total and per-thread simulation time, steps per second, realtime factor, time per step, average solver iterations, contacts and constraints per step, degrees of freedom, and memory usage. For the C engine, additionally display the internal profiler breakdown showing time spent in each simulation phase (position, velocity, acceleration, constraint) and sub-phases (kinematics, inertia, collision broadphase, collision narrowphase).
Key considerations:
- Realtime factor indicates how many times faster than real-time the simulation runs
- The internal profiler requires installing a timer callback (`mjcb_time`)
- Profiler breakdown identifies bottleneck phases for optimization
- Multi-thread summary shows aggregate throughput across all threads
- MJX reports include JIT compilation time separately from execution time
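The aggregate metrics above reduce to a few arithmetic identities over the rollout counters. A pure-Python sketch (function and key names are illustrative, not any tool's output format):

```python
def summarize(nstep, nthread, elapsed_s, timestep_s,
              total_contacts, total_constraints):
    """Aggregate benchmark metrics across all threads."""
    total_steps = nstep * nthread
    steps_per_sec = total_steps / elapsed_s
    return {
        "steps_per_second": steps_per_sec,
        "time_per_step_us": 1e6 * elapsed_s / total_steps,
        # Realtime factor: simulated seconds per wall-clock second.
        "realtime_factor": steps_per_sec * timestep_s,
        "contacts_per_step": total_contacts / total_steps,
        "constraints_per_step": total_constraints / total_steps,
    }
```

For example, 4 threads each running 1000 steps of a 2 ms-timestep model in 2 s of wall-clock time give 2000 steps/s and a realtime factor of 4x.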