
Workflow:Google DeepMind MuJoCo Simulation Benchmarking

From Leeroopedia
Domains Robotics_Simulation, Performance_Engineering, Benchmarking
Last Updated 2026-02-15 04:30 GMT

Overview

End-to-end process for measuring MuJoCo simulation performance, covering both the native C engine (single and multi-threaded) and the MJX GPU-accelerated backends (JAX and Warp).

Description

This workflow covers simulation benchmarking across all MuJoCo compute backends. For the native C engine, it measures steps per second, realtime factor, solver iterations, contact/constraint counts, and provides a detailed internal profiler breakdown (position, velocity, acceleration, constraint phases). For MJX, it measures GPU throughput across batched parallel rollouts including JIT compilation time. Benchmarking uses pseudo-random control noise (Ornstein-Uhlenbeck process) to exercise realistic contact dynamics. Multi-threaded CPU benchmarking supports independent parallel rollouts and engine-internal thread pools.

Usage

Execute this workflow when you need to evaluate simulation throughput for a specific model, compare performance across hardware configurations (CPU cores, GPU types), assess the impact of model complexity on simulation speed, profile which simulation phases are bottlenecks, or validate that engine optimizations improve performance.

Execution Steps

Step 1: Load_model_and_configure

Load the model file and parse benchmark configuration parameters: number of simulation steps, thread count, control noise parameters, and engine-internal thread pool size. Optionally load a named keyframe as the initial state to start the simulation in a specific configuration.

Key considerations:

  • The "test" keyframe is loaded if present in the model
  • Control noise parameters (std, rate) create realistic actuator inputs
  • Thread count determines the number of independent parallel rollouts (CPU)
  • Engine-internal thread pool enables parallelism within a single simulation step
  • For MJX benchmarks, batch size and solver configuration are additional parameters
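The parameters above can be collected into a small configuration object before any simulation state is allocated. The sketch below is illustrative: the field names are assumptions, not the benchmark's actual option names. Loading the model itself would use the MuJoCo Python bindings (mujoco.MjModel.from_xml_path, then mujoco.mj_resetDataKeyframe to apply the "test" keyframe if present); that part is noted in comments so the sketch runs without MuJoCo installed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BenchmarkConfig:
    """Step 1 parameters (field names are illustrative, not canonical)."""
    model_path: str                   # path to the MJCF/MJB model file
    nstep: int = 10_000               # simulation steps per rollout
    nthread: int = 1                  # independent parallel rollouts (CPU)
    ctrl_noise_std: float = 0.01      # control noise standard deviation
    ctrl_noise_rate: float = 0.1      # OU mean-reversion rate
    npoolthread: int = 0              # engine-internal thread pool size (0 = off)
    keyframe: Optional[str] = "test"  # named keyframe to load if present
    batch_size: Optional[int] = None  # MJX only: number of batched rollouts

cfg = BenchmarkConfig(model_path="humanoid.xml", nthread=4)

# With MuJoCo installed, loading would look roughly like:
#   model = mujoco.MjModel.from_xml_path(cfg.model_path)
#   key_id = mujoco.mj_name2id(model, mujoco.mjtObj.mjOBJ_KEY, cfg.keyframe)
#   if key_id >= 0:
#       mujoco.mj_resetDataKeyframe(model, data, key_id)
```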

Step 2: Allocate_per_thread_resources

Create independent mjData instances for each simulation thread (CPU) or configure batched data structures (MJX). Each thread gets its own copy of the simulation state to avoid synchronization overhead. For engine-internal threading, create and bind a thread pool to each mjData instance.

Key considerations:

  • Each thread requires its own mjData for independent rollouts
  • Thread pools enable parallel constraint island solving within one step
  • Memory usage scales linearly with thread count
  • For MJX, batch data is created as a single vectorized structure on GPU
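A minimal sketch of the per-thread allocation pattern. With the MuJoCo Python bindings the factory would be `lambda: mujoco.MjData(model)`, giving each thread its own mjData; here a plain dict stands in for the state so the sketch runs without MuJoCo.

```python
import copy

def allocate_per_thread(make_data, nthread):
    """Create one independent simulation-state object per thread.

    With MuJoCo, make_data would be `lambda: mujoco.MjData(model)`, so
    threads never share state and need no synchronization during rollout.
    Memory usage scales linearly with nthread, as noted above.
    """
    return [make_data() for _ in range(nthread)]

# Stand-in state so the sketch runs without MuJoCo installed.
template = {"qpos": [0.0, 0.0], "time": 0.0}
datas = allocate_per_thread(lambda: copy.deepcopy(template), nthread=4)

datas[0]["time"] = 1.0  # mutating one copy leaves the others untouched
```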

Step 3: Generate_control_sequence

Pre-generate a deterministic pseudo-random control signal using an Ornstein-Uhlenbeck process with Halton quasi-random sequences. This creates realistic actuator inputs that exercise contact dynamics while being reproducible across benchmark runs. Controls are clipped to actuator limits when specified.

Key considerations:

  • Ornstein-Uhlenbeck process provides smooth, correlated control noise
  • Halton sequences provide better coverage than pure random sampling
  • Control signals converge to the keyframe midpoint at the specified rate
  • Deterministic sequences ensure reproducible benchmark results
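The control-generation step can be sketched as follows. This is a plausible discretization, not the benchmark's exact one: each step mean-reverts toward a midpoint at the given rate, adds Gaussian noise derived from a Halton (radical-inverse) sequence, and clips to actuator limits. Because everything is deterministic, repeated calls produce identical sequences.

```python
import math
from statistics import NormalDist

def halton(i: int, base: int) -> float:
    """Radical-inverse (Halton) value in (0, 1) for index i >= 1."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def ou_controls(nstep, nu, std, rate, dt, midpoint, ctrl_range=None):
    """Deterministic OU control sequence (sketch; exact form may differ):
    ctrl[t+1] = ctrl[t] + rate*dt*(midpoint - ctrl[t]) + std*sqrt(dt)*z,
    where z is a standard normal drawn via the Halton sequence."""
    norm = NormalDist()
    primes = [2, 3, 5, 7, 11, 13][:nu]  # one Halton base per actuator (up to 6 here)
    ctrl = [list(midpoint)]
    for t in range(1, nstep):
        prev, nxt = ctrl[-1], []
        for j in range(nu):
            u = halton(t, primes[j])  # quasi-random uniform in (0, 1)
            z = norm.inv_cdf(min(max(u, 1e-9), 1 - 1e-9))
            c = prev[j] + rate * dt * (midpoint[j] - prev[j]) + std * math.sqrt(dt) * z
            if ctrl_range is not None:  # clip to actuator limits when specified
                lo, hi = ctrl_range[j]
                c = min(max(c, lo), hi)
            nxt.append(c)
        ctrl.append(nxt)
    return ctrl

seq = ou_controls(nstep=100, nu=2, std=0.5, rate=2.0, dt=0.002,
                  midpoint=[0.0, 0.0], ctrl_range=[(-1.0, 1.0)] * 2)
```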

Step 4: Execute_benchmark_rollouts

Launch simulation rollouts across all threads (CPU) or as a batched computation (MJX). Each rollout advances the simulation for the specified number of steps while accumulating performance statistics: contact counts, constraint counts, and solver iterations. Wall-clock time is measured for the entire execution.

Key considerations:

  • CPU threads run independently with no synchronization during rollout
  • MJX rollouts are vectorized and JIT-compiled for GPU execution
  • Solver iteration counts may vary per step due to contact state changes
  • Island decomposition allows per-island iteration counting
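The CPU rollout loop can be sketched with a thread-per-rollout harness. The step function here is a stand-in so the sketch runs without MuJoCo; with MuJoCo it would call mujoco.mj_step(model, datas[i]) and read statistics such as datas[i].ncon afterward. Note that recent MuJoCo Python bindings release the GIL inside mj_step (worth verifying for your version), which is what lets these threads run truly in parallel; a pure-Python step function will not.

```python
import threading
import time

def rollout(thread_id, nstep, step_fn, stats):
    """Advance one rollout nstep steps, accumulating per-thread statistics."""
    contacts = 0
    for _ in range(nstep):
        # With MuJoCo: mujoco.mj_step(model, datas[thread_id]),
        # then accumulate datas[thread_id].ncon, solver iterations, etc.
        contacts += step_fn(thread_id)
    stats[thread_id] = {"steps": nstep, "contacts": contacts}

def run_benchmark(nthread, nstep, step_fn):
    """Launch independent rollouts and measure total wall-clock time."""
    stats = [None] * nthread
    t0 = time.perf_counter()
    threads = [threading.Thread(target=rollout, args=(i, nstep, step_fn, stats))
               for i in range(nthread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - t0, stats

# Stand-in step function returning a fake contact count of 1 per step.
wall, stats = run_benchmark(nthread=4, nstep=1000, step_fn=lambda i: 1)
total_steps = sum(s["steps"] for s in stats)
```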

Step 5: Report_performance_metrics

Compute and display comprehensive performance statistics including total and per-thread simulation time, steps per second, realtime factor, time per step, average solver iterations, contacts and constraints per step, degrees of freedom, and memory usage. For the C engine, additionally display the internal profiler breakdown showing time spent in each simulation phase (position, velocity, acceleration, constraint) and sub-phases (kinematics, inertia, collision broadphase, collision narrowphase).

Key considerations:

  • Realtime factor indicates how many times faster than real-time the simulation runs
  • Internal profiler requires installing the timer callback
  • Profiler breakdown identifies bottleneck phases for optimization
  • Multi-thread summary shows aggregate throughput across all threads
  • MJX reports include JIT compilation time separately from execution time
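The headline metrics reduce to simple arithmetic over the rollout totals. A sketch (metric names are illustrative): steps per second is total steps over wall time, and the aggregate realtime factor is simulated seconds (total steps × timestep) per wall-clock second, summed across threads.

```python
def performance_report(nthread, nstep, timestep, wall_time,
                       total_solver_iters, total_contacts):
    """Aggregate Step 5 metrics from rollout totals (illustrative names)."""
    total_steps = nthread * nstep
    return {
        # Throughput across all threads combined.
        "steps_per_second": total_steps / wall_time,
        # Simulated seconds per wall-clock second, aggregated over threads.
        "realtime_factor": total_steps * timestep / wall_time,
        "time_per_step_us": 1e6 * wall_time / total_steps,
        "solver_iters_per_step": total_solver_iters / total_steps,
        "contacts_per_step": total_contacts / total_steps,
    }

# 4 threads x 10,000 steps at dt=2 ms in 2 s of wall time.
report = performance_report(nthread=4, nstep=10_000, timestep=0.002,
                            wall_time=2.0, total_solver_iters=120_000,
                            total_contacts=400_000)
```

For MJX, wall_time should exclude JIT compilation (timed separately, as noted above), otherwise the first run understates steady-state throughput.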

Execution Diagram

GitHub URL

Workflow Repository