Implementation:Datajuicer Data juicer VideoHandReconstructionMapper
| Knowledge Sources | |
|---|---|
| Domains | Video Processing, 3D Reconstruction, Hand Pose Estimation |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Performs hand localization and 3D reconstruction from video frames using the WiLoR model with the MANO parametric hand model, producing hand meshes, visualizations, and pose parameters.
Description
VideoHandReconstructionMapper reconstructs hands with the WiLoR model, offering an alternative to the HaWoR-based VideoHandReconstructionHaworMapper. It processes each video through a multi-step pipeline:
- Frame Extraction -- Uses the video_extract_frames_mapper sub-operator to uniformly sample frames from the video, with configurable frame count and segment duration
- Hand Detection -- Detects hands in each frame using a YOLO-based detector model with configurable confidence threshold (default: 0.3), identifying both bounding boxes and handedness (left/right)
- 3D Reconstruction -- For each frame with detected hands:
  - Creates a ViTDetDataset from detected bounding boxes and handedness labels
  - Runs the WiLoR model in batches to predict 3D hand vertices, joint positions, and camera parameters
  - Converts crop-space camera parameters to full-image coordinates using cam_crop_to_full
  - Projects 3D vertices to 2D keypoints using the project_full_img method
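
The last two reconstruction steps can be illustrated with a minimal, dependency-free sketch. It assumes the weak-perspective crop camera `(s, tx, ty)` convention commonly used in HaMeR-style code and a simple pinhole projection. The function names, the scalar (non-batched) signatures, and the focal length value are illustrative assumptions; they are not the operator's actual `cam_crop_to_full` or `project_full_img` implementations.

```python
def crop_cam_to_full_sketch(cam_crop, box_center, box_size, img_size, focal_length):
    """Map a crop-space weak-perspective camera (s, tx, ty) to a full-image translation.

    Hypothetical scalar version for one hand; real code typically works on batched tensors.
    """
    s, tx, ty = cam_crop
    cx, cy = box_center
    img_w, img_h = img_size
    bs = box_size * s + 1e-9          # crop scale expressed in full-image pixels
    tz = 2.0 * focal_length / bs      # depth implied by the apparent hand size
    full_tx = 2.0 * (cx - img_w / 2.0) / bs + tx
    full_ty = 2.0 * (cy - img_h / 2.0) / bs + ty
    return (full_tx, full_ty, tz)


def project_points_sketch(points, cam_trans, focal_length, img_size):
    """Pinhole projection of 3D points (hand frame) to 2D pixel coordinates."""
    img_w, img_h = img_size
    tx, ty, tz = cam_trans
    keypoints = []
    for x, y, z in points:
        u = focal_length * (x + tx) / (z + tz) + img_w / 2.0
        v = focal_length * (y + ty) / (z + tz) + img_h / 2.0
        keypoints.append((u, v))
    return keypoints


if __name__ == "__main__":
    # A 200-pixel box centred at (420, 240) in a 640x480 frame (made-up values).
    cam = crop_cam_to_full_sketch((1.0, 0.0, 0.0), (420.0, 240.0), 200.0, (640, 480), 1000.0)
    print(project_points_sketch([(0.0, 0.0, 0.0)], cam, 1000.0, (640, 480)))
```

Under these assumptions, the origin of the hand frame projects back onto the centre of the detected bounding box, which is a quick sanity check for the conversion.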
The operator provides additional output capabilities:
- Mesh Export -- Optionally saves hand meshes as OBJ files via trimesh (if_save_mesh)
- Visualization -- Optionally renders RGBA overlays of reconstructed hands on the original frames using a renderer (if_save_visualization)
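
To show what the mesh export produces, here is a minimal sketch that writes a triangle mesh to the Wavefront OBJ text format by hand. The operator itself uses trimesh for this; the sketch deliberately avoids that dependency, and both helper names and the file naming scheme are hypothetical.

```python
import os
import tempfile


def mesh_path_sketch(save_dir, video_stem, frame_idx, hand_idx):
    """Build an output path for one hand mesh (hypothetical naming scheme)."""
    return os.path.join(save_dir, f"{video_stem}_frame{frame_idx}_hand{hand_idx}.obj")


def write_obj_sketch(path, vertices, faces):
    """Write vertices (x, y, z) and zero-based triangle faces to an OBJ file."""
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x:.6f} {y:.6f} {z:.6f}\n")
        for a, b, c in faces:
            # OBJ face indices are 1-based, so shift each index by one.
            f.write(f"f {a + 1} {b + 1} {c + 1}\n")


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    path = mesh_path_sketch(out_dir, "clip", 0, 0)
    # A single triangle stands in for a full hand mesh.
    write_obj_sketch(path, [(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
    print(open(path).read())
```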
During initialization, the operator clones the WiLoR repository, installs required packages (chumpy, smplx 0.1.28, yacs, timm, pyrender, pytorch_lightning, scikit-image), and imports WiLoR-specific utilities.
The output includes per-frame lists of:
- vertices (3D hand mesh vertices)
- camera_translation (full-image camera translation)
- if_right_hand (handedness flag)
- joints (3D joint positions)
- keypoints (2D projected keypoints)
Requires CUDA acceleration and the MANO hand model (MANO_RIGHT.pkl from the official MANO website).
Usage
Use this operator as an alternative to VideoHandReconstructionHaworMapper when mesh export and visual overlay capabilities are needed. It is suitable for hand-centric video data annotation, gesture analysis, and hand tracking dataset creation.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/video_hand_reconstruction_mapper.py
- Lines: 1-306
Signature
class VideoHandReconstructionMapper(Mapper):
    _accelerator = "cuda"

    def __init__(
        self,
        wilor_model_path: str = "wilor_final.ckpt",
        wilor_model_config: str = "model_config.yaml",
        detector_model_path: str = "detector.pt",
        mano_right_path: str = "path_to_mano_right_pkl",
        frame_num: PositiveInt = 3,
        duration: float = 0,
        batch_size: int = 16,
        tag_field_name: str = MetaKeys.hand_reconstruction_tags,
        frame_dir: str = DATA_JUICER_ASSETS_CACHE,
        if_save_visualization: bool = True,
        save_visualization_dir: str = DATA_JUICER_ASSETS_CACHE,
        if_save_mesh: bool = True,
        save_mesh_dir: str = DATA_JUICER_ASSETS_CACHE,
        *args, **kwargs,
    ):
Import
from data_juicer.ops.mapper.video_hand_reconstruction_mapper import VideoHandReconstructionMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| wilor_model_path | str | No | Path to wilor_final.ckpt. Default: "wilor_final.ckpt" |
| wilor_model_config | str | No | Path to model_config.yaml. Default: "model_config.yaml" |
| detector_model_path | str | No | Path to detector.pt. Default: "detector.pt" |
| mano_right_path | str | Yes | Path to MANO_RIGHT.pkl, which must be downloaded from https://mano.is.tue.mpg.de/. The default, "path_to_mano_right_pkl", is only a placeholder and must be overridden |
| frame_num | PositiveInt | No | Number of frames to extract. Default: 3 |
| duration | float | No | Duration per segment in seconds. 0 means entire video. Default: 0 |
| batch_size | int | No | Batch size for simultaneous hand inference. Default: 16 |
| tag_field_name | str | No | Metadata field for storing results. Default: "hand_reconstruction_tags" |
| frame_dir | str | No | Directory for extracted frames. Default: DATA_JUICER_ASSETS_CACHE |
| if_save_visualization | bool | No | Whether to save overlay images. Default: True |
| save_visualization_dir | str | No | Directory for overlay images. Default: DATA_JUICER_ASSETS_CACHE |
| if_save_mesh | bool | No | Whether to save OBJ mesh files. Default: True |
| save_mesh_dir | str | No | Directory for mesh files. Default: DATA_JUICER_ASSETS_CACHE |
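
The interaction between frame_num and duration can be sketched as follows. This is an illustrative assumption about uniform sampling (frames taken at the midpoints of equal sub-intervals within each segment); the exact positions chosen by video_extract_frames_mapper may differ, and the function name is hypothetical.

```python
def uniform_sample_times_sketch(video_seconds, frame_num, duration=0.0):
    """Return sample timestamps: frame_num per segment, duration seconds per segment.

    duration <= 0 treats the entire video as a single segment.
    """
    if frame_num <= 0:
        raise ValueError("frame_num must be positive")
    if video_seconds <= 0:
        return []
    seg_len = video_seconds if duration <= 0 else min(duration, video_seconds)
    times = []
    start = 0.0
    while start < video_seconds - 1e-9:
        end = min(start + seg_len, video_seconds)
        step = (end - start) / frame_num
        # Take the midpoint of each of the frame_num equal sub-intervals.
        times.extend(start + step * (i + 0.5) for i in range(frame_num))
        start = end
    return times


if __name__ == "__main__":
    print(uniform_sample_times_sketch(12.0, 3))        # whole video as one segment
    print(uniform_sample_times_sketch(12.0, 3, 6.0))   # two 6-second segments
```

With the default duration of 0, a 12-second clip and frame_num=3 yields three timestamps under this scheme; a 6-second duration yields three per segment, six in total.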
Outputs
| Name | Type | Description |
|---|---|---|
| sample[Fields.meta][tag_field_name]["vertices"] | list[list[np.ndarray]] | Per-frame lists of 3D hand mesh vertices |
| sample[Fields.meta][tag_field_name]["camera_translation"] | list[list[np.ndarray]] | Per-frame camera translation vectors |
| sample[Fields.meta][tag_field_name]["if_right_hand"] | list[list[float]] | Per-frame handedness flags (1.0=right, 0.0=left) |
| sample[Fields.meta][tag_field_name]["joints"] | list[list[np.ndarray]] | Per-frame 3D joint positions |
| sample[Fields.meta][tag_field_name]["keypoints"] | list[list[tensor]] | Per-frame 2D projected keypoints |
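
A minimal sketch of consuming this structure, assuming the nested per-frame layout in the table above (an outer list over frames, an inner list over detected hands). The helper name is hypothetical, and whether frames without detections appear as empty lists is an assumption.

```python
def summarize_hand_tags_sketch(tags):
    """Count hands and handedness per frame from the operator's tag dictionary."""
    summary = []
    for frame_idx, flags in enumerate(tags["if_right_hand"]):
        # Flags are stored as floats: 1.0 marks a right hand, 0.0 a left hand.
        right = sum(1 for flag in flags if float(flag) >= 0.5)
        summary.append(
            {
                "frame": frame_idx,
                "hands": len(flags),
                "right": right,
                "left": len(flags) - right,
            }
        )
    return summary


if __name__ == "__main__":
    # Made-up tags for three frames: two hands, no hands, one right hand.
    fake_tags = {"if_right_hand": [[1.0, 0.0], [], [1.0]]}
    for row in summarize_hand_tags_sketch(fake_tags):
        print(row)
```

The same frame and hand indices address the matching entries in vertices, camera_translation, joints, and keypoints.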
Usage Examples
from data_juicer.ops.mapper.video_hand_reconstruction_mapper import VideoHandReconstructionMapper
from data_juicer.utils.constant import Fields

# Basic usage with visualization and mesh export
mapper = VideoHandReconstructionMapper(
    wilor_model_path="/models/wilor_final.ckpt",
    wilor_model_config="/models/model_config.yaml",
    detector_model_path="/models/detector.pt",
    mano_right_path="/models/MANO_RIGHT.pkl",
    frame_num=10,
    batch_size=32,
    if_save_visualization=True,
    save_visualization_dir="/output/vis/",
    if_save_mesh=True,
    save_mesh_dir="/output/meshes/",
)

# Process a sample
sample = {
    "videos": ["/path/to/hand_video.mp4"],
    Fields.meta: {},
}
result = mapper.process_single(sample, rank=0)

# Access hand reconstruction data (per-frame lists of per-hand vertex arrays)
vertices = result[Fields.meta]["hand_reconstruction_tags"]["vertices"]