Workflow:Google DeepMind MuJoCo Offscreen Video Recording

From Leeroopedia
Knowledge Sources
Domains Robotics_Simulation, Rendering, Data_Collection
Last Updated 2026-02-15 04:30 GMT

Overview

End-to-end process for running a headless MuJoCo simulation with offscreen rendering to capture RGB and depth image sequences for video recording or dataset generation.

Description

This workflow covers headless (offscreen) simulation and rendering, producing raw RGB and optional depth image streams without requiring a visible window. It supports three OpenGL backends: EGL (Linux headless servers), OSMesa (software rendering), and GLFW (hidden window fallback). The pipeline loads a model, initializes offscreen rendering, runs the simulation loop while capturing frames at a specified FPS, and writes raw pixel data to a file. This is essential for generating training data for vision-based robotics, creating demonstration videos on headless servers, and batch rendering across compute clusters.

Usage

Execute this workflow when you need to record simulation footage on a headless server (no display), generate image datasets for machine learning training, create reproducible video captures of simulation behavior, or render depth maps for perception algorithm development.

Execution Steps

Step 1: Initialize_OpenGL_context

Create an offscreen OpenGL context using one of three platform backends. EGL provides hardware-accelerated headless rendering on Linux GPU servers. OSMesa enables pure software rendering without any GPU. GLFW creates an invisible window as a fallback when neither headless option is available. The backend is selected at compile time via preprocessor definitions.

Key considerations:

  • EGL is preferred for GPU servers (best performance, no display needed)
  • OSMesa works anywhere but is slower (CPU software rendering)
  • GLFW fallback creates a hidden window that still requires a display server
  • Backend selection is determined by compile-time flags (MJ_EGL, MJ_OSMESA)

Step 2: Load_model_and_initialize_simulation

Load the MJCF or MJB model file, allocate simulation data, and run an initial forward pass to populate all derived fields. Initialize the visualization pipeline (camera, options, scene, context) with the offscreen rendering context. Position the default camera to frame the model.

Key considerations:

  • A forward pass after data allocation ensures consistent initial rendering
  • The default free camera automatically frames the model based on its extents
  • Scene capacity must accommodate all geometry in the model
  • Rendering context font scale can be reduced for offscreen use

Step 3: Configure_offscreen_buffer

Switch the rendering target to the offscreen framebuffer and query its dimensions. Allocate CPU-side buffers for RGB pixel data (3 bytes per pixel) and depth values (one float per pixel). The offscreen buffer size determines the output image resolution.

Key considerations:

  • Offscreen buffer size is determined by the rendering context configuration
  • RGB buffer stores 3 channels (red, green, blue) as unsigned bytes
  • Depth buffer stores normalized floating-point depth values (0.0 to 1.0)
  • Memory allocation must account for the full resolution of the offscreen buffer

Step 4: Run_simulation_and_capture_frames

Enter the main loop that advances the simulation and captures frames at the specified FPS. At each frame capture point: update the abstract scene from the simulation state, render to the offscreen buffer, optionally overlay timestamp text, read back the RGB and depth pixels, optionally composite a depth thumbnail into the RGB image, and write the raw RGB data to the output file.

Key considerations:

  • Frame capture is paced by simulation time, not wall-clock time
  • Scene update and rendering happen only at frame boundaries (not every physics step)
  • Pixel readback transfers data from GPU to CPU memory
  • Depth values can be visualized by mapping to grayscale and compositing into the image
  • Text overlays use the built-in font rendering system

Step 5: Finalize_and_cleanup

Close the output file, free the pixel buffers, release MuJoCo resources (data, model, scene, context), and destroy the OpenGL context. The raw RGB file can then be converted to standard video formats using external tools.

Key considerations:

  • The output file contains raw RGB frames (no container format or compression)
  • Post-processing with ffmpeg or similar tools converts the raw stream to MP4, AVI, etc. (e.g. ffmpeg -f rawvideo -pixel_format rgb24 -video_size WxH -framerate FPS -i rgb.out video.mp4, substituting the offscreen resolution and capture rate)
  • Resource cleanup order: file, buffers, MuJoCo objects, OpenGL context
  • Platform-specific OpenGL context destruction follows the same backend logic as creation

Execution Diagram

GitHub URL

Workflow Repository