Workflow: Google DeepMind MuJoCo Simulation Benchmarking
| Knowledge Sources | |
|---|---|
| Domains | Robotics_Simulation, Performance_Engineering, Benchmarking |
| Last Updated | 2026-02-15 04:30 GMT |
Overview
End-to-end process for measuring MuJoCo simulation performance, covering both the native C engine (single and multi-threaded) and the MJX GPU-accelerated backends (JAX and Warp).
Description
This workflow covers simulation benchmarking across all MuJoCo compute backends. For the native C engine, it measures steps per second, realtime factor, solver iterations, and contact/constraint counts, and provides a detailed internal profiler breakdown (position, velocity, acceleration, and constraint phases). For MJX, it measures GPU throughput across batched parallel rollouts, reporting JIT compilation time separately from execution time. Benchmarking uses pseudo-random control noise (an Ornstein-Uhlenbeck process) to exercise realistic contact dynamics. Multi-threaded CPU benchmarking supports both independent parallel rollouts and engine-internal thread pools.
Usage
Execute this workflow when you need to evaluate simulation throughput for a specific model, compare performance across hardware configurations (CPU cores, GPU types), assess the impact of model complexity on simulation speed, profile which simulation phases are bottlenecks, or validate that engine optimizations improve performance.
Execution Steps
Step 1: Load_model_and_configure
Load the model file and parse benchmark configuration parameters: number of simulation steps, thread count, control noise parameters, and engine-internal thread pool size. Optionally load a named keyframe as the initial state to start the simulation in a specific configuration.
Key considerations:
- The "test" keyframe is loaded if present in the model
- Control noise parameters (std, rate) create realistic actuator inputs
- Thread count determines the number of independent parallel rollouts (CPU)
- Engine-internal thread pool enables parallelism within a single simulation step
- For MJX benchmarks, batch size and solver configuration are additional parameters
Step 2: Allocate_per_thread_resources
Create independent mjData instances for each simulation thread (CPU) or configure batched data structures (MJX). Each thread gets its own copy of the simulation state to avoid synchronization overhead. For engine-internal threading, create and bind a thread pool to each mjData instance.
Key considerations:
- Each thread requires its own mjData for independent rollouts
- Thread pools enable parallel constraint island solving within one step
- Memory usage scales linearly with thread count
- For MJX, batch data is created as a single vectorized structure on GPU
Step 3: Generate_control_sequence
Pre-generate a deterministic pseudo-random control signal using an Ornstein-Uhlenbeck process with Halton quasi-random sequences. This creates realistic actuator inputs that exercise contact dynamics while being reproducible across benchmark runs. Controls are clipped to actuator limits when specified.
Key considerations:
- Ornstein-Uhlenbeck process provides smooth, correlated control noise
- Halton sequences provide better coverage than pure random sampling
- Control signals converge to the keyframe midpoint at the specified rate
- Deterministic sequences ensure reproducible benchmark results
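The control-generation step above can be sketched in plain NumPy. The exact update rule and noise mapping used by any particular benchmark tool may differ; this is a generic Ornstein-Uhlenbeck process driven by Halton quasi-random noise, with all function and parameter names chosen for illustration:

```python
import numpy as np

def halton(index: int, base: int) -> float:
    """Radical-inverse (Halton) value of `index` in a prime base."""
    result, f = 0.0, 1.0 / base
    while index > 0:
        result += f * (index % base)
        index //= base
        f /= base
    return result

def ou_controls(nstep, nu, midpoint, std, rate, dt, clip=None):
    """Deterministic control sequence: OU mean reversion toward
    `midpoint`, driven by Halton noise, optionally clipped to limits."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    assert nu <= len(primes), "sketch supports up to 10 actuators"
    ctrl = np.array(midpoint, dtype=float)
    seq = np.empty((nstep, nu))
    for t in range(nstep):
        # Halton uniforms in (0, 1), mapped to zero-mean noise in [-1, 1].
        noise = np.array([2.0 * halton(t + 1, p) - 1.0 for p in primes[:nu]])
        # OU update: revert toward the midpoint at `rate`, add scaled noise.
        ctrl += rate * dt * (midpoint - ctrl) + std * np.sqrt(dt) * noise
        if clip is not None:
            ctrl = np.clip(ctrl, clip[0], clip[1])
        seq[t] = ctrl
    return seq
```

Because the Halton sequence is fully deterministic, two runs with the same parameters produce identical control signals, which is what makes benchmark results reproducible.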
Step 4: Execute_benchmark_rollouts
Launch simulation rollouts across all threads (CPU) or as a batched computation (MJX). Each rollout advances the simulation for the specified number of steps while accumulating performance statistics: contact counts, constraint counts, and solver iterations. Wall-clock time is measured for the entire execution.
Key considerations:
- CPU threads run independently with no synchronization during rollout
- MJX rollouts are vectorized and JIT-compiled for GPU execution
- Solver iteration counts may vary per step due to contact state changes
- Island decomposition allows per-island iteration counting
Step 5: Report_performance_metrics
Compute and display comprehensive performance statistics including total and per-thread simulation time, steps per second, realtime factor, time per step, average solver iterations, contacts and constraints per step, degrees of freedom, and memory usage. For the C engine, additionally display the internal profiler breakdown showing time spent in each simulation phase (position, velocity, acceleration, constraint) and sub-phases (kinematics, inertia, collision broadphase, collision narrowphase).
Key considerations:
- Realtime factor indicates how many times faster than real-time the simulation runs
- The internal profiler requires installing a timer callback (`mjcb_time`)
- Profiler breakdown identifies bottleneck phases for optimization
- Multi-thread summary shows aggregate throughput across all threads
- MJX reports include JIT compilation time separately from execution time
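The aggregate metrics above reduce to a few arithmetic identities over the rollout counters. A pure-Python sketch (function and key names are illustrative, not any tool's output format):

```python
def summarize(nstep, nthread, elapsed_s, timestep_s,
              total_contacts, total_constraints):
    """Aggregate benchmark metrics across all threads."""
    total_steps = nstep * nthread
    steps_per_sec = total_steps / elapsed_s
    return {
        "steps_per_second": steps_per_sec,
        "time_per_step_us": 1e6 * elapsed_s / total_steps,
        # Realtime factor: simulated seconds per wall-clock second.
        "realtime_factor": steps_per_sec * timestep_s,
        "contacts_per_step": total_contacts / total_steps,
        "constraints_per_step": total_constraints / total_steps,
    }
```

For example, 4 threads each running 1000 steps of a 2 ms-timestep model in 2 s of wall-clock time give 2000 steps/s and a realtime factor of 4x.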