Implementation:Haosulab ManiSkill VisualizationUtils

Knowledge Sources	Haosulab_ManiSkill
Domains	Robotics, Simulation, Visualization
Last Updated	2026-02-15 08:00 GMT

Overview

Concrete tool for creating videos from image sequences, tiling multiple images into grids, and overlaying text on images.

Description

The misc.py module in the visualization package provides image and video manipulation utilities for recording and displaying simulation outputs.

images_to_video():

Converts a list of RGB image arrays (HxWx3) into an MP4 video using imageio/FFMPEG.
Configurable FPS and quality (variable bitrate, 0-10 scale).
Creates the output directory if it does not exist.
Shows a tqdm progress bar during encoding when verbose is True.
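
The directory-creation and output-naming behaviour listed above can be sketched with the standard library alone. This is a minimal illustration, not the module's code: `prepare_video_path` is a hypothetical helper, and it assumes the final path is simply `output_dir/video_name.mp4`, as the Outputs table states.

```python
import os
import tempfile


def prepare_video_path(output_dir: str, video_name: str) -> str:
    """Hypothetical helper: ensure output_dir exists and return the .mp4 path."""
    # exist_ok=True makes this a no-op when the directory is already there.
    os.makedirs(output_dir, exist_ok=True)
    return os.path.join(output_dir, f"{video_name}.mp4")


# Demonstrate inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as root:
    path = prepare_video_path(os.path.join(root, "videos"), "episode")
    dir_created = os.path.isdir(os.path.dirname(path))
    file_name = os.path.basename(path)
```

The real function would then hand this path to an imageio writer together with the fps and quality settings.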

tile_images():

Combines multiple images into a single tiled image with configurable rows.
Supports both regular images (H, W, C) and batched images (B, H, W, C).
With nrows=1, images are sorted by height and arranged in columns.
With nrows>1, all images must have the same dimensions.
Works with both numpy arrays and torch tensors.
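
To make the nrows>1 case concrete, the following sketch tiles equally sized images into a grid with numpy. `tile_same_size` is a hypothetical helper written for illustration; the row-major placement and zero-filled empty cells are assumptions and may not match the layout that tile_images() actually produces.

```python
import numpy as np


def tile_same_size(images, nrows):
    """Hypothetical helper: place equally sized (H, W, C) images row-major in a grid."""
    h, w, c = images[0].shape
    if any(img.shape != (h, w, c) for img in images):
        raise ValueError("all images must share the same shape")
    ncols = -(-len(images) // nrows)  # ceiling division
    # Unused cells stay zero (black) when the images do not fill the grid.
    canvas = np.zeros((nrows * h, ncols * w, c), dtype=images[0].dtype)
    for i, img in enumerate(images):
        row, col = divmod(i, ncols)
        canvas[row * h:(row + 1) * h, col * w:(col + 1) * w] = img
    return canvas


# Three 4x6 RGB images, each filled with its own index, tiled into two rows.
views = [np.full((4, 6, 3), i, dtype=np.uint8) for i in range(3)]
grid = tile_same_size(views, nrows=2)
```

Here `grid` has shape (8, 12, 3): two rows of two cells, with the fourth cell left empty.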

put_text_on_image():

Renders text lines onto an image using PIL.
Uses the Ubuntu Sans Mono font (shipped with the module).
Text is drawn in green with a 10 px left margin.

put_info_on_image():

Convenience wrapper that formats a dict of metrics as "key: value" lines and renders them on the image.
Supports extra text lines appended after the metric lines.
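
The line formatting described above can be sketched as follows. `format_info_lines` is a hypothetical helper, and converting values with plain string formatting is an assumption; the actual function may round or otherwise format numeric values before rendering.

```python
def format_info_lines(info, extras=None):
    """Hypothetical helper: build "key: value" lines, then append any extra lines."""
    lines = [f"{key}: {value}" for key, value in info.items()]
    if extras is not None:
        lines.extend(extras)
    return lines


lines = format_info_lines({"success": True, "reward": 0.5}, extras=["step 12"])
```

The resulting list is what a text-rendering function such as put_text_on_image() would receive as its `lines` argument.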

Usage

Used for recording evaluation episodes, creating visualization videos, and overlaying debug information on rendered images. Commonly used in evaluation scripts and the ManiSkill viewer.

Code Reference

Source Location

Repository: Haosulab_ManiSkill
File: mani_skill/utils/visualization/misc.py

Signature

def images_to_video(
    images: list[Array],
    output_dir: str,
    video_name: str,
    fps: int = 10,
    quality: Optional[float] = 5,
    verbose: bool = True,
    **kwargs,
) -> None: ...

def tile_images(images: list[Array], nrows=1) -> Array: ...
def put_text_on_image(image: np.ndarray, lines: list[str]) -> np.ndarray: ...
def put_info_on_image(image, info: dict[str, float], extras=None, overlay=True) -> np.ndarray: ...

Import

from mani_skill.utils.visualization.misc import images_to_video, tile_images, put_info_on_image

I/O Contract

Inputs

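Name	Type	Required	Description
images	list[Array]	Yes	List of HxWx3 RGB images (uint8 or float); used by images_to_video and tile_images
output_dir	str	Yes	Directory to save the video (images_to_video)
video_name	str	Yes	Name for the output video file, without extension (images_to_video)
fps	int	No	Frames per second (default: 10)
quality	Optional[float]	No	Video quality on a 0-10 scale (default: 5)
verbose	bool	No	Show a progress bar during encoding (default: True)
nrows	int	No	Number of rows for tiling (tile_images; default: 1)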

Outputs

Name	Type	Description
(video file)	.mp4	Written to output_dir/video_name.mp4
tiled_image	Array	Single image combining all input images
annotated_image	np.ndarray	Image with text overlay

Usage Examples

Basic Usage

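from mani_skill.utils.visualization.misc import images_to_video, tile_images

# Assumes `env` is an existing environment whose render() returns an HxWx3 RGB frame.
# Record a video from rendered frames
frames = []
env.reset()
for _ in range(100):
    action = env.action_space.sample()  # placeholder policy: random actions
    obs, _, _, _, _ = env.step(action)
    frames.append(env.render())
images_to_video(frames, "output/", "episode", fps=30)  # writes output/episode.mp4

# Tile multiple camera views (cam1_img, cam2_img, cam3_img are images captured elsewhere)
tiled = tile_images([cam1_img, cam2_img, cam3_img], nrows=1)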

Related Pages

Environment:Haosulab_ManiSkill_Python_SAPIEN_Core
