Principle:ARISE Initiative Robosuite Omniverse Rendering
| Knowledge Sources | |
|---|---|
| Domains | Robotics, Computer_Graphics |
| Last Updated | 2026-02-15 07:00 GMT |
Overview
A pipeline for re-rendering previously recorded simulation trajectories at photorealistic quality using a GPU-accelerated ray tracing engine, producing high-fidelity RGB images, surface normals, and semantic segmentation maps.
Description
While the built-in simulation renderer is adequate for real-time interaction, training vision-based robotic policies often benefits from photorealistic imagery that more closely resembles real-world camera observations. The Omniverse rendering pipeline bridges this gap by taking recorded demonstration trajectories (stored in HDF5 datasets) and replaying them through a photorealistic rendering engine that supports ray-traced lighting, path tracing, physically-based materials, and advanced camera effects.
In offline mode, the pipeline operates in two phases. In the export phase, the simulation environment is reconstructed from the dataset, and each trajectory state is replayed through the USD scene exporter to produce a time-varying USD file containing the full scene geometry, materials, lights, and cameras. In the rendering phase, this USD file is loaded into the rendering engine, which steps through each frame and produces high-resolution output images through its render products system.
The pipeline supports multiple rendering modes simultaneously. RGB rendering produces photorealistic color images using either ray-traced lighting (fast, single-bounce) or full path tracing (slower, multi-bounce global illumination). Normal map rendering produces surface normal vectors at each pixel, useful for geometry-aware perception algorithms. Semantic segmentation rendering produces per-pixel class labels based on object categories defined in the simulation, useful for training perception models that need to distinguish between object types.
The system operates in two runtime modes. Offline mode first exports the complete trajectory to a USD file, then loads that file into the renderer for frame-by-frame rendering. This mode decouples simulation from rendering, allowing them to be optimized independently. Online mode streams scene updates directly to the renderer at each timestep, enabling live visualization but requiring the simulation and renderer to run concurrently.
Camera configuration is inherited from the simulation environment. The USD export preserves the positions and orientations of all named cameras, which become render viewpoints in the rendering engine. Additional post-hoc lights (dome lights for ambient illumination, sphere lights for point sources, rectangle and cylinder lights for area sources) can be added to enhance the lighting setup beyond what is defined in the physics simulation.
Output images can be aggregated into video files for visual inspection, with per-camera and per-modality organization. The pipeline handles dataset-level iteration, processing multiple episodes with per-episode output directories.
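A minimal sketch of one possible per-episode, per-camera, per-modality layout follows. The directory names, the zero-padded file naming, and the helpers `frame_path` and `plan_outputs` are assumptions for illustration, not the pipeline's actual output scheme.

```python
from pathlib import Path

# Modality folder names are hypothetical; the text lists RGB, normals,
# and semantic segmentation as the available outputs.
MODALITIES = ("rgb", "normals", "semantic_segmentation")


def frame_path(root: Path, episode: str, camera: str, modality: str, frame: int) -> Path:
    """Build <root>/<episode>/<camera>/<modality>/<frame>.png (assumed layout)."""
    if modality not in MODALITIES:
        raise ValueError(f"unknown modality: {modality!r}")
    return root / episode / camera / modality / f"{frame:06d}.png"


def plan_outputs(root, episodes, cameras, num_frames):
    """List every image path a render job would write, in a stable order."""
    return [
        frame_path(Path(root), ep, cam, mod, i)
        for ep in episodes
        for cam in cameras
        for mod in MODALITIES
        for i in range(num_frames)
    ]
```

Zero-padded frame names sort lexicographically in frame order, which is convenient when a later step collects the images of one camera and modality into a video.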
Usage
Use Omniverse rendering when creating photorealistic training datasets for vision-based policies, when generating publication-quality videos of simulation trajectories, or when inspecting how a trained policy's behavior looks under realistic visual conditions. The pipeline requires recorded demonstration data in HDF5 format and access to a GPU with ray tracing capabilities. Typical use involves specifying the dataset path, desired episodes, camera names, resolution, and rendering mode, then running the pipeline to produce output images or videos.
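The inputs listed above can be gathered into a single validated job description. The sketch below is illustrative only: `RenderJob`, its field names, its defaults, and the renderer strings are hypothetical and do not correspond to the pipeline's actual command-line flags.

```python
from dataclasses import dataclass

# Hypothetical names standing in for the ray-traced and path-traced modes.
RENDERERS = ("ray_tracing", "path_tracing")


@dataclass(frozen=True)
class RenderJob:
    dataset: str                      # path to the HDF5 demonstration file
    episodes: tuple = ()              # empty means "all episodes" (assumed convention)
    cameras: tuple = ("agentview",)   # camera names defined in the environment
    width: int = 1280                 # output resolution in pixels
    height: int = 720
    renderer: str = "ray_tracing"
    modalities: tuple = ("rgb",)

    def __post_init__(self):
        # Fail early, before any expensive export or rendering starts.
        if self.renderer not in RENDERERS:
            raise ValueError(f"renderer must be one of {RENDERERS}")
        if self.width <= 0 or self.height <= 0:
            raise ValueError("resolution must be positive")
        if not self.cameras:
            raise ValueError("at least one camera is required")
```

Validating the job up front is a design choice for long-running renders: a misspelled mode is reported immediately instead of after the export phase has finished.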
Theoretical Basis
Two-phase rendering pipeline:
Phase 1: Scene Export (Simulation -> USD)
    for each episode:
        env = reconstruct_environment(dataset, episode)
        exporter = USDExporter(env.model)
        for each state in trajectory:
            set_simulation_state(state)
            exporter.update_scene(data)    # keyframe transforms
        exporter.save_scene()              # write .usd file
Phase 2: Photorealistic Rendering (USD -> Images)
    stage = load_usd_stage(usd_file)
    render_products = create_render_products(cameras, resolution)
    writer = attach_annotators(rgb, normals, segmentation)
    for each frame in timeline:
        advance_frame()
        orchestrator.step()                # trigger rendering
        writer.write(annotator_data)       # save images
Render product architecture:
The rendering engine uses a data-driven approach where annotators extract specific information from the rendered scene:
Render Product (camera + resolution)
|-- RGB Annotator -> RGBA image array
|-- Normals Annotator -> surface normal vectors (xyz per pixel)
|-- Segmentation Annotator -> per-pixel semantic class labels
Normal map encoding:
Each component of a raw surface normal lies in the range [-1, 1]. The normals are rescaled to unit length and then mapped linearly to [0, 255] for storage as 8-bit images:
    normals_normalized = normals / max(||normals||, epsilon)
    image_value = (normals_normalized + 1) / 2 * 255
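The encoding above can be sketched for a single pixel in plain Python. The rounding to the nearest integer and the clamping to [0, 255] are assumptions added so the result fits an 8-bit channel, and `encode_normal` and `decode_normal` are hypothetical helper names. A real pipeline would apply the same arithmetic to a whole image array at once.

```python
import math


def encode_normal(normal, eps=1e-8):
    """Map one (x, y, z) normal to an 8-bit (r, g, b) triple."""
    length = math.sqrt(sum(c * c for c in normal))
    scale = max(length, eps)  # guard against division by zero
    encoded = []
    for c in normal:
        unit = c / scale                     # component now in [-1, 1]
        value = (unit + 1.0) / 2.0 * 255.0   # shift and scale to [0, 255]
        # Rounding and clamping are assumptions, not stated in the formula.
        encoded.append(int(round(min(max(value, 0.0), 255.0))))
    return tuple(encoded)


def decode_normal(rgb):
    """Approximate inverse of encode_normal, up to 8-bit quantization."""
    return tuple(v / 255.0 * 2.0 - 1.0 for v in rgb)
```

Under this mapping a zero component lands near mid-gray, so a degenerate all-zero normal encodes to a uniform gray pixel rather than raising an error.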
Semantic annotation:
Scene objects are annotated with semantic labels through USD's Semantics API:
    for each geom in scene:
        if geom has class label:
            prim = USD prim exported for geom
            sem_api = SemanticsAPI.Apply(prim, class_name)
            sem_api.SetSemanticType("class")
            sem_api.SetSemanticData(class_label)
This allows the segmentation annotator to produce per-pixel labels that correspond to the object categories defined in the simulation environment.
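As a sanity check on the labels, a rendered segmentation frame can be summarized per class. The sketch below assumes the annotator output has already been converted to a 2D grid of integer IDs plus an ID-to-label mapping. That data shape and the names `id_to_label` and `class_pixel_counts` are assumptions for illustration; the renderer's actual output layout may differ.

```python
from collections import Counter


def class_pixel_counts(id_mask, id_to_label, unknown="unlabeled"):
    """Count pixels per class label in a 2D grid of integer segmentation IDs."""
    counts = Counter()
    for row in id_mask:
        for pixel_id in row:
            # IDs missing from the mapping are grouped under `unknown`.
            counts[id_to_label.get(pixel_id, unknown)] += 1
    return dict(counts)
```

A class that is expected in view but has a zero count, or a large `unlabeled` share, suggests that some geoms were exported without a semantic label.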
State reconstruction:
Trajectory replay reconstructs the simulation state from stored data:
    initial_state = {
        "model": xml_string,          # MuJoCo model definition
        "states": qpos_qvel_array,    # flattened simulator state
        "ep_meta": metadata           # episode-specific parameters
    }
    reset_to(env, initial_state):
        env.reset_from_xml_string(initial_state["model"])          # rebuild the model
        env.sim.set_state_from_flattened(initial_state["states"])  # restore the state
        env.sim.forward()                                          # recompute derived quantities
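When debugging a replay, it can help to split a flattened state back into its parts. The sketch below assumes the layout [time, qpos, qvel], with `nq` position entries and `nv` velocity entries. That ordering is an assumption about the simulator's flattening convention and should be checked against the installed version. `SimState` and `split_flat_state` are hypothetical names.

```python
from typing import NamedTuple, Sequence


class SimState(NamedTuple):
    time: float
    qpos: tuple
    qvel: tuple


def split_flat_state(flat: Sequence[float], nq: int, nv: int) -> SimState:
    """Split a flattened state, assuming the layout [time, qpos, qvel]."""
    expected = 1 + nq + nv
    if len(flat) < expected:
        raise ValueError(f"need at least {expected} values, got {len(flat)}")
    return SimState(
        time=float(flat[0]),
        qpos=tuple(flat[1:1 + nq]),
        qvel=tuple(flat[1 + nq:expected]),
    )
```

Any entries beyond `1 + nq + nv` are ignored here; a length mismatch in the other direction usually means the stored state and the reconstructed model do not belong together.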