Implementation:Datajuicer Data juicer VideoHandReconstructionMapper

From Leeroopedia
Knowledge Sources
Domains: Video Processing, 3D Reconstruction, Hand Pose Estimation
Last Updated: 2026-02-14 16:00 GMT

Overview

Performs hand localization and 3D reconstruction from video frames using the WiLoR model with the MANO parametric hand model, producing hand meshes, visualizations, and pose parameters.

Description

VideoHandReconstructionMapper reconstructs hands with the WiLoR model, offering an alternative to the HaWoR-based operator. It processes each video through a multi-step pipeline:

  1. Frame Extraction -- Uses the video_extract_frames_mapper sub-operator to uniformly sample frames from the video, with configurable frame count and segment duration
  2. Hand Detection -- Detects hands in each frame using a YOLO-based detector model with configurable confidence threshold (default: 0.3), identifying both bounding boxes and handedness (left/right)
  3. 3D Reconstruction -- For each frame with detected hands:
    • Creates a ViTDetDataset from detected bounding boxes and handedness labels
    • Runs the WiLoR model in batches to predict 3D hand vertices, joint positions, and camera parameters
    • Converts crop-space camera parameters to full-image coordinates using cam_crop_to_full
    • Projects 3D vertices to 2D keypoints using the project_full_img method
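
The two camera steps at the end of the pipeline can be illustrated with a small sketch. It assumes the weak-perspective convention common to HMR-style regressors, where the network predicts (s, tx, ty) relative to the hand crop. The function names, the focal length, and the exact formula are illustrative assumptions; this is not the code behind cam_crop_to_full or project_full_img.

```python
def crop_cam_to_full_sketch(cam_crop, box_center, box_size, img_size, focal_length):
    """Turn a crop-relative camera (s, tx, ty) into a full-image translation.

    Hypothetical helper: mirrors the usual weak-perspective conversion,
    not necessarily the formula used by WiLoR.
    """
    scale, tx, ty = cam_crop
    cx, cy = box_center
    img_w, img_h = img_size
    scaled_box = box_size * scale + 1e-9  # guard against division by zero
    tz = 2.0 * focal_length / scaled_box
    full_tx = 2.0 * (cx - img_w / 2.0) / scaled_box + tx
    full_ty = 2.0 * (cy - img_h / 2.0) / scaled_box + ty
    return (full_tx, full_ty, tz)


def project_point_sketch(point, translation, focal_length, img_size):
    """Perspective-project one 3D point into pixel coordinates.

    Hypothetical helper: assumes the principal point is the image center.
    """
    x, y, z = (p + t for p, t in zip(point, translation))
    img_w, img_h = img_size
    u = focal_length * x / z + img_w / 2.0
    v = focal_length * y / z + img_h / 2.0
    return (u, v)


# A 200-pixel crop centered at (420, 240) in a 640x480 frame (assumed values).
translation = crop_cam_to_full_sketch(
    cam_crop=(1.0, 0.0, 0.0),
    box_center=(420.0, 240.0),
    box_size=200.0,
    img_size=(640.0, 480.0),
    focal_length=1000.0,
)
# The hand-space origin should land on the crop center after projection.
origin_pixel = project_point_sketch((0.0, 0.0, 0.0), translation, 1000.0, (640.0, 480.0))
```

Under these assumptions, a smaller predicted scale or a smaller bounding box pushes the hand farther from the camera (larger tz), which is why the crop size has to be carried along with the crop-space camera.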

The operator provides additional output capabilities:

  • Mesh Export -- Optionally saves hand meshes as OBJ files via trimesh (if_save_mesh)
  • Visualization -- Optionally renders RGBA overlays of reconstructed hands on the original frames using a renderer (if_save_visualization)
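
To show what a saved mesh file contains, here is a minimal sketch that writes vertices and triangle faces as Wavefront OBJ text using only the standard library. The operator itself exports through trimesh; the function name write_obj_sketch and the toy triangle are illustrative assumptions.

```python
import io


def write_obj_sketch(vertices, faces, stream):
    """Write (x, y, z) vertices and 0-based triangle faces as OBJ text."""
    for x, y, z in vertices:
        stream.write(f"v {x:.6f} {y:.6f} {z:.6f}\n")
    for a, b, c in faces:
        # OBJ face indices are 1-based, so shift each index by one.
        stream.write(f"f {a + 1} {b + 1} {c + 1}\n")


# Toy data: a single triangle standing in for a hand mesh.
toy_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
toy_faces = [(0, 1, 2)]

buffer = io.StringIO()
write_obj_sketch(toy_vertices, toy_faces, buffer)
obj_text = buffer.getvalue()
```

A real hand mesh would pair each frame's predicted vertices with the fixed face list of the MANO model.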

During initialization, the operator clones the WiLoR repository, installs required packages (chumpy, smplx 0.1.28, yacs, timm, pyrender, pytorch_lightning, scikit-image), and imports WiLoR-specific utilities.
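
Because initialization installs packages on the fly, it can help to check beforehand which of them are already importable. The sketch below is not part of the operator: the helper name is hypothetical, and it only tests importability, so it does not verify the pinned smplx version.

```python
import importlib.util

# Import names for the packages listed above.
# Note that scikit-image is imported as "skimage".
WILOR_IMPORT_NAMES = [
    "chumpy",
    "smplx",
    "yacs",
    "timm",
    "pyrender",
    "pytorch_lightning",
    "skimage",
]


def find_missing_modules(import_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


missing = find_missing_modules(WILOR_IMPORT_NAMES)
if missing:
    print("Not yet installed:", ", ".join(missing))
else:
    print("All WiLoR dependencies are importable.")
```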

The output includes per-frame lists of:

  • vertices (3D hand mesh vertices)
  • camera_translation (full-image camera translation)
  • if_right_hand (handedness flag)
  • joints (3D joint positions)
  • keypoints (2D projected keypoints)

The operator requires CUDA acceleration and the MANO hand model file MANO_RIGHT.pkl, which must be obtained from the official MANO website.

Usage

Use this operator as an alternative to VideoHandReconstructionHaworMapper when mesh export and visual overlay capabilities are needed. It is suitable for hand-centric video data annotation, gesture analysis, and hand tracking dataset creation.

Code Reference

Source Location

  • Repository: Datajuicer_Data_juicer
  • File: data_juicer/ops/mapper/video_hand_reconstruction_mapper.py
  • Lines: 1-306

Signature

class VideoHandReconstructionMapper(Mapper):
    _accelerator = "cuda"

    def __init__(
        self,
        wilor_model_path: str = "wilor_final.ckpt",
        wilor_model_config: str = "model_config.yaml",
        detector_model_path: str = "detector.pt",
        mano_right_path: str = "path_to_mano_right_pkl",
        frame_num: PositiveInt = 3,
        duration: float = 0,
        batch_size: int = 16,
        tag_field_name: str = MetaKeys.hand_reconstruction_tags,
        frame_dir: str = DATA_JUICER_ASSETS_CACHE,
        if_save_visualization: bool = True,
        save_visualization_dir: str = DATA_JUICER_ASSETS_CACHE,
        if_save_mesh: bool = True,
        save_mesh_dir: str = DATA_JUICER_ASSETS_CACHE,
        *args, **kwargs,
    ):

Import

from data_juicer.ops.mapper.video_hand_reconstruction_mapper import VideoHandReconstructionMapper

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| wilor_model_path | str | No | Path to wilor_final.ckpt. Default: "wilor_final.ckpt" |
| wilor_model_config | str | No | Path to model_config.yaml. Default: "model_config.yaml" |
| detector_model_path | str | No | Path to detector.pt. Default: "detector.pt" |
| mano_right_path | str | Yes | Path to MANO_RIGHT.pkl (must be downloaded from https://mano.is.tue.mpg.de/). The signature default, "path_to_mano_right_pkl", is only a placeholder |
| frame_num | PositiveInt | No | Number of frames to extract. Default: 3 |
| duration | float | No | Duration per segment in seconds. 0 means entire video. Default: 0 |
| batch_size | int | No | Batch size for simultaneous hand inference. Default: 16 |
| tag_field_name | str | No | Metadata field for storing results. Default: "hand_reconstruction_tags" |
| frame_dir | str | No | Directory for extracted frames. Default: DATA_JUICER_ASSETS_CACHE |
| if_save_visualization | bool | No | Whether to save overlay images. Default: True |
| save_visualization_dir | str | No | Directory for overlay images. Default: DATA_JUICER_ASSETS_CACHE |
| if_save_mesh | bool | No | Whether to save OBJ mesh files. Default: True |
| save_mesh_dir | str | No | Directory for mesh files. Default: DATA_JUICER_ASSETS_CACHE |
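
The interaction between frame_num and duration can be sketched as follows. This assumes that a positive duration splits the video into consecutive segments and that frame_num frames are taken from each segment at evenly spaced bin midpoints; the helper name and the midpoint choice are assumptions, and the timestamps chosen by the actual video_extract_frames_mapper sub-operator may differ.

```python
def uniform_sample_times(video_length, frame_num, duration=0.0):
    """Return sampling timestamps in seconds (hypothetical helper).

    duration == 0 treats the whole video as one segment; otherwise the
    video is cut into consecutive segments of `duration` seconds.
    """
    if frame_num < 1:
        raise ValueError("frame_num must be a positive integer")
    segment_length = duration if duration > 0 else video_length
    times = []
    start = 0.0
    while start < video_length:
        end = min(start + segment_length, video_length)
        step = (end - start) / frame_num
        # Take the midpoint of each of the frame_num equal-width bins.
        times.extend(start + step * (i + 0.5) for i in range(frame_num))
        start = end
    return times


# Whole 10-second video, 2 frames.
whole_video = uniform_sample_times(10.0, frame_num=2)
# Same video in 4-second segments, 1 frame per segment (last segment is shorter).
per_segment = uniform_sample_times(10.0, frame_num=1, duration=4.0)
```

With a positive duration, the total number of frames grows with video length, which in turn affects how many hand crops reach the batched WiLoR inference step.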

Outputs

| Name | Type | Description |
|---|---|---|
| sample[Fields.meta][tag_field_name]["vertices"] | list[list[np.ndarray]] | Per-frame lists of 3D hand mesh vertices |
| sample[Fields.meta][tag_field_name]["camera_translation"] | list[list[np.ndarray]] | Per-frame camera translation vectors |
| sample[Fields.meta][tag_field_name]["if_right_hand"] | list[list[float]] | Per-frame handedness flags (1.0=right, 0.0=left) |
| sample[Fields.meta][tag_field_name]["joints"] | list[list[np.ndarray]] | Per-frame 3D joint positions |
| sample[Fields.meta][tag_field_name]["keypoints"] | list[list[tensor]] | Per-frame 2D projected keypoints |
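
Since every output field is a list of per-frame lists, downstream code typically walks frames and then hands. The sketch below assumes that the five fields are aligned, so that entry j of frame i refers to the same hand in each field; the helper name and the placeholder values (strings instead of arrays) are illustrative.

```python
OUTPUT_KEYS = ("vertices", "camera_translation", "if_right_hand", "joints", "keypoints")


def iter_hands(tags):
    """Yield (frame_index, hand_index, record) for every reconstructed hand."""
    per_field = (tags[key] for key in OUTPUT_KEYS)
    for frame_idx, frame_values in enumerate(zip(*per_field)):
        # frame_values holds one list per field; zip them to group by hand.
        for hand_idx, hand_values in enumerate(zip(*frame_values)):
            yield frame_idx, hand_idx, dict(zip(OUTPUT_KEYS, hand_values))


# Placeholder data: frame 0 has two hands, frame 1 has none.
example_tags = {
    "vertices": [["verts_a", "verts_b"], []],
    "camera_translation": [["cam_a", "cam_b"], []],
    "if_right_hand": [[1.0, 0.0], []],
    "joints": [["joints_a", "joints_b"], []],
    "keypoints": [["kpts_a", "kpts_b"], []],
}

for frame_idx, hand_idx, hand in iter_hands(example_tags):
    side = "right" if hand["if_right_hand"] == 1.0 else "left"
    print(f"frame {frame_idx}, hand {hand_idx}: {side}")
```

Frames without detections contribute empty lists, so they are skipped without special handling.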

Usage Examples

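# Imports needed by the example below
from data_juicer.ops.mapper.video_hand_reconstruction_mapper import VideoHandReconstructionMapper
from data_juicer.utils.constants import Fields

# Basic usage with visualization and mesh export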
mapper = VideoHandReconstructionMapper(
    wilor_model_path="/models/wilor_final.ckpt",
    wilor_model_config="/models/model_config.yaml",
    detector_model_path="/models/detector.pt",
    mano_right_path="/models/MANO_RIGHT.pkl",
    frame_num=10,
    batch_size=32,
    if_save_visualization=True,
    save_visualization_dir="/output/vis/",
    if_save_mesh=True,
    save_mesh_dir="/output/meshes/",
)

# Process a sample
sample = {
    "videos": ["/path/to/hand_video.mp4"],
    Fields.meta: {},
}
result = mapper.process_single(sample, rank=0)
# Access hand reconstruction data
vertices = result[Fields.meta]["hand_reconstruction_tags"]["vertices"]
