Implementation:Datajuicer Data juicer VideoHandReconstructionMapper

Knowledge Sources	Datajuicer_Data_juicer
Domains	Video Processing, 3D Reconstruction, Hand Pose Estimation
Last Updated	2026-02-14 16:00 GMT

Overview

Performs hand localization and 3D reconstruction from video frames using the WiLoR model with the MANO parametric hand model, producing hand meshes, visualizations, and pose parameters.

Description

VideoHandReconstructionMapper reconstructs hands with the WiLoR model, as an alternative to the HaWoR-based operator (VideoHandReconstructionHaworMapper). It processes video frames through a multi-step pipeline:

Frame Extraction -- Uses the video_extract_frames_mapper sub-operator to uniformly sample frames from the video, with configurable frame count and segment duration
Hand Detection -- Detects hands in each frame using a YOLO-based detector model with configurable confidence threshold (default: 0.3), identifying both bounding boxes and handedness (left/right)
3D Reconstruction -- For each frame with detected hands:
- Creates a ViTDetDataset from detected bounding boxes and handedness labels
- Runs the WiLoR model in batches to predict 3D hand vertices, joint positions, and camera parameters
- Converts crop-space camera parameters to full-image coordinates using cam_crop_to_full
- Projects 3D vertices to 2D keypoints using the project_full_img method
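
The crop-to-full conversion and the projection in the last two bullets can be sketched as follows. This is a minimal, standard-library illustration for a single hand, assuming a weak-perspective crop camera (s, tx, ty) and a pinhole camera whose principal point is the image center. The function names crop_cam_to_full and project_to_image and the focal_length argument are hypothetical; they are not the actual signatures of WiLoR's cam_crop_to_full or project_full_img.

```python
def crop_cam_to_full(cam_crop, box_center, box_size, img_size, focal_length):
    """Convert a crop-space weak-perspective camera to a full-image translation.

    cam_crop: (s, tx, ty) predicted relative to the hand crop (assumed layout).
    box_center: (cx, cy) of the hand box in full-image pixels.
    box_size: side length of the square hand box in pixels.
    img_size: (width, height) of the full image.
    """
    s, tx, ty = cam_crop
    cx, cy = box_center
    half_w, half_h = img_size[0] / 2.0, img_size[1] / 2.0
    scaled_box = box_size * s + 1e-9  # epsilon guards against division by zero
    tz = 2.0 * focal_length / scaled_box
    full_tx = 2.0 * (cx - half_w) / scaled_box + tx
    full_ty = 2.0 * (cy - half_h) / scaled_box + ty
    return (full_tx, full_ty, tz)


def project_to_image(points_3d, cam_t, focal_length, img_size):
    """Pinhole projection of 3D points (shifted by cam_t) to 2D pixels."""
    keypoints = []
    for x, y, z in points_3d:
        px, py, pz = x + cam_t[0], y + cam_t[1], z + cam_t[2]
        u = focal_length * px / pz + img_size[0] / 2.0
        v = focal_length * py / pz + img_size[1] / 2.0
        keypoints.append((u, v))
    return keypoints


# A hand box centered in a 640x480 image, with a unit-scale crop camera.
cam_t = crop_cam_to_full((1.0, 0.0, 0.0), (320.0, 240.0), 200.0, (640, 480), 1000.0)
keypoints_2d = project_to_image([(0.0, 0.0, 0.0)], cam_t, 1000.0, (640, 480))
```

With the box centered and a scale of 1, the origin of the hand frame projects onto the image center, which is a quick sanity check for this kind of conversion.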

The operator provides additional output capabilities:

Mesh Export -- Optionally saves hand meshes as OBJ files via trimesh (if_save_mesh)
Visualization -- Optionally renders RGBA overlays of reconstructed hands on the original frames using a renderer (if_save_visualization)
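
For reference, an OBJ file is plain text: one "v x y z" line per vertex and one "f i j k" line per triangle, with 1-based vertex indices. The operator itself uses trimesh for the export; the sketch below writes the same layout by hand with the standard library only, to show what such a file contains. The helper name write_obj and the sample triangle are hypothetical.

```python
import os
import tempfile


def write_obj(path, vertices, faces):
    """Write a triangle mesh as Wavefront OBJ text (hypothetical helper)."""
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x:.6f} {y:.6f} {z:.6f}\n")
        for a, b, c in faces:
            # OBJ face indices are 1-based, while Python indices are 0-based.
            f.write(f"f {a + 1} {b + 1} {c + 1}\n")


# One triangle standing in for a hand mesh.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2)]
obj_path = os.path.join(tempfile.mkdtemp(), "hand_0.obj")
write_obj(obj_path, vertices, faces)

# Read the file back to inspect the lines that were written.
with open(obj_path) as f:
    obj_lines = f.read().splitlines()
```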

During initialization, the operator clones the WiLoR repository, installs required packages (chumpy, smplx 0.1.28, yacs, timm, pyrender, pytorch_lightning, scikit-image), and imports WiLoR-specific utilities.
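
Because initialization installs packages on the fly, it can be useful to see beforehand which of them are already present. The following is a small sketch using only the standard library; the import names are assumptions derived from the package list above (scikit-image is imported as skimage), and the helper missing_modules is hypothetical, not part of Data-Juicer.

```python
import importlib.util

# Import names assumed for the packages listed above.
REQUIRED_MODULES = [
    "chumpy", "smplx", "yacs", "timm",
    "pyrender", "pytorch_lightning", "skimage",
]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not yet installed:", ", ".join(missing))
```

This check only reports whether a module can be located; it does not verify versions such as the smplx 0.1.28 pin.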

The output includes per-frame lists of:

vertices (3D hand mesh vertices)
camera_translation (full-image camera translation)
if_right_hand (handedness flag)
joints (3D joint positions)
keypoints (2D projected keypoints)

Requires CUDA acceleration and the MANO hand model (MANO_RIGHT.pkl from the official MANO website).

Usage

Use this operator as an alternative to VideoHandReconstructionHaworMapper when mesh export and visual overlay capabilities are needed. It is suitable for hand-centric video data annotation, gesture analysis, and hand tracking dataset creation.

Code Reference

Source Location

Repository: Datajuicer_Data_juicer
File: data_juicer/ops/mapper/video_hand_reconstruction_mapper.py
Lines: 1-306

Signature

class VideoHandReconstructionMapper(Mapper):
    _accelerator = "cuda"

    def __init__(
        self,
        wilor_model_path: str = "wilor_final.ckpt",
        wilor_model_config: str = "model_config.yaml",
        detector_model_path: str = "detector.pt",
        mano_right_path: str = "path_to_mano_right_pkl",
        frame_num: PositiveInt = 3,
        duration: float = 0,
        batch_size: int = 16,
        tag_field_name: str = MetaKeys.hand_reconstruction_tags,
        frame_dir: str = DATA_JUICER_ASSETS_CACHE,
        if_save_visualization: bool = True,
        save_visualization_dir: str = DATA_JUICER_ASSETS_CACHE,
        if_save_mesh: bool = True,
        save_mesh_dir: str = DATA_JUICER_ASSETS_CACHE,
        *args, **kwargs,
    ):

Import

from data_juicer.ops.mapper.video_hand_reconstruction_mapper import VideoHandReconstructionMapper

I/O Contract

Inputs

Name	Type	Required	Description
wilor_model_path	str	No	Path to wilor_final.ckpt. Default: "wilor_final.ckpt"
wilor_model_config	str	No	Path to model_config.yaml. Default: "model_config.yaml"
detector_model_path	str	No	Path to detector.pt. Default: "detector.pt"
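mano_right_path	str	Yes	Path to MANO_RIGHT.pkl (must be downloaded from https://mano.is.tue.mpg.de/). The signature default, "path_to_mano_right_pkl", is only a placeholder and must be replaced with a real path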
frame_num	PositiveInt	No	Number of frames to extract. Default: 3
duration	float	No	Duration per segment in seconds. 0 means entire video. Default: 0
batch_size	int	No	Batch size for simultaneous hand inference. Default: 16
tag_field_name	str	No	Metadata field for storing results. Default: "hand_reconstruction_tags"
frame_dir	str	No	Directory for extracted frames. Default: DATA_JUICER_ASSETS_CACHE
if_save_visualization	bool	No	Whether to save overlay images. Default: True
save_visualization_dir	str	No	Directory for overlay images. Default: DATA_JUICER_ASSETS_CACHE
if_save_mesh	bool	No	Whether to save OBJ mesh files. Default: True
save_mesh_dir	str	No	Directory for mesh files. Default: DATA_JUICER_ASSETS_CACHE

Outputs

Name	Type	Description
sample[Fields.meta][tag_field_name]["vertices"]	list[list[np.ndarray]]	Per-frame lists of 3D hand mesh vertices
sample[Fields.meta][tag_field_name]["camera_translation"]	list[list[np.ndarray]]	Per-frame camera translation vectors
sample[Fields.meta][tag_field_name]["if_right_hand"]	list[list[float]]	Per-frame handedness flags (1.0=right, 0.0=left)
sample[Fields.meta][tag_field_name]["joints"]	list[list[np.ndarray]]	Per-frame 3D joint positions
sample[Fields.meta][tag_field_name]["keypoints"]	list[list[tensor]]	Per-frame 2D projected keypoints
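
The nested layout in the table above can be walked as in the sketch below. It assumes the outer list indexes frames and the inner list indexes the hands detected in that frame, and that a frame without hands yields an empty inner list; the helper summarize_hands and the sample data are hypothetical.

```python
def summarize_hands(tags):
    """Count detected hands per frame and split them by handedness."""
    summary = []
    for frame_idx, flags in enumerate(tags["if_right_hand"]):
        # A flag of 1.0 marks a right hand, 0.0 a left hand.
        right = sum(1 for flag in flags if flag == 1.0)
        summary.append({
            "frame": frame_idx,
            "hands": len(flags),
            "right": right,
            "left": len(flags) - right,
        })
    return summary


# Hypothetical result: frame 0 has one right and one left hand, frame 1 has none.
tags = {"if_right_hand": [[1.0, 0.0], []]}
summary = summarize_hands(tags)
```

In real use, tags would be the dictionary stored under sample[Fields.meta][tag_field_name], and the same frame and hand indices would address the matching entries in vertices, joints, keypoints, and camera_translation.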

Usage Examples

from data_juicer.ops.mapper.video_hand_reconstruction_mapper import VideoHandReconstructionMapper
from data_juicer.utils.constant import Fields

# Basic usage with visualization and mesh export
mapper = VideoHandReconstructionMapper(
    wilor_model_path="/models/wilor_final.ckpt",
    wilor_model_config="/models/model_config.yaml",
    detector_model_path="/models/detector.pt",
    mano_right_path="/models/MANO_RIGHT.pkl",
    frame_num=10,
    batch_size=32,
    if_save_visualization=True,
    save_visualization_dir="/output/vis/",
    if_save_mesh=True,
    save_mesh_dir="/output/meshes/",
)

# Process a sample
sample = {
    "videos": ["/path/to/hand_video.mp4"],
    Fields.meta: {},
}
result = mapper.process_single(sample, rank=0)

# Access hand reconstruction data (stored under the default tag_field_name)
vertices = result[Fields.meta]["hand_reconstruction_tags"]["vertices"]
