Implementation:Datajuicer Data juicer VideoHandReconstructionHaworMapper

Knowledge Sources	Datajuicer_Data_juicer
Domains	Video Processing, 3D Reconstruction, Hand Pose Estimation
Last Updated	2026-02-14 16:00 GMT

Overview

Performs 3D hand reconstruction from video frames using the HaWoR model combined with MoGe-2 for scene geometry, providing detailed hand pose and mesh information for both left and right hands.

Description

VideoHandReconstructionHaworMapper is an advanced video analysis operator that extracts 3D hand pose and shape parameters from video data. It operates in a three-phase pipeline:

1. FoV Estimation (MoGe-2): Uses the video_camera_calibration_static_moge_mapper sub-operator to estimate the per-frame horizontal field of view (FoV), then computes the focal length from the median FoV across all frames.
2. Hand Pose and Translation Estimation (HaWoR): Performs the core hand reconstruction:
   - Detects hands using a YOLO-based detector with a configurable confidence threshold
   - Tracks hands across frames using YOLO's built-in tracking
   - Separates detections into left-hand and right-hand tracks based on handedness classification
   - Interpolates bounding boxes for missing frames using interpolate_bboxes
   - Runs the HaWoR model for 3D hand mesh reconstruction on each track chunk
   - Handles left-hand flipping by negating rotation axes for consistency
3. Global Translation Recalculation (MANO Alignment): Refines the global translation by:
   - Running a forward pass of the MANO parametric hand model
   - Computing wrist joint positions
   - Adjusting translations based on wrist offsets
   - Flipping the x-axis for left-hand consistency
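The bounding-box interpolation step can be pictured as linear interpolation between the nearest frames where a hand was detected. The sketch below is an illustration only: fill_missing_bboxes is a hypothetical helper, not the operator's interpolate_bboxes, and it assumes boxes are [x1, y1, x2, y2] lists with None marking frames where the tracker lost the hand.

```python
from typing import List, Optional

Box = List[float]  # assumed layout: [x1, y1, x2, y2]


def fill_missing_bboxes(boxes: List[Optional[Box]]) -> List[Optional[Box]]:
    """Linearly interpolate None entries that lie between two detections.

    Leading and trailing gaps stay None because there is no second
    detection to interpolate towards.
    """
    filled = list(boxes)
    known = [i for i, box in enumerate(boxes) if box is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            t = (i - left) / span  # 0 at the left detection, 1 at the right
            filled[i] = [
                (1.0 - t) * a + t * b
                for a, b in zip(boxes[left], boxes[right])
            ]
    return filled


# Hypothetical track: detections at frames 1 and 4, gaps elsewhere.
track = [None, [0.0, 0.0, 10.0, 10.0], None, None, [30.0, 0.0, 40.0, 10.0], None]
filled_track = fill_missing_bboxes(track)
```

Leaving the outer gaps empty is a design choice of this sketch; whether the operator extrapolates, pads, or drops such frames is not stated here.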

The operator outputs per-frame hand reconstruction parameters for both hands:

beta (shape parameters)
hand_pose (joint rotations)
global_orient (wrist orientation)
transl (global translation)
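For the left hand, the pipeline description mentions negating rotation axes and flipping the x-axis of the translation. One plausible reading is the standard mirroring across the YZ plane, sketched below for global_orient and transl. It assumes axis-angle rotations and 3-element vectors; the helper names are hypothetical and the operator's actual representation may differ.

```python
from typing import List, Sequence


def mirror_axis_angle_x(rot: Sequence[float]) -> List[float]:
    """Mirror an axis-angle rotation across the YZ plane (x -> -x).

    For a reflection M = diag(-1, 1, 1), the mirrored rotation M R M has
    the axis-angle vector (rx, -ry, -rz).
    """
    rx, ry, rz = rot
    return [rx, -ry, -rz]


def mirror_translation_x(transl: Sequence[float]) -> List[float]:
    """Mirror a 3D translation across the YZ plane by negating x."""
    tx, ty, tz = transl
    return [-tx, ty, tz]


# Made-up values standing in for one frame of global_orient and transl.
mirrored_orient = mirror_axis_angle_x([0.1, 0.2, -0.3])
mirrored_transl = mirror_translation_x([0.5, 1.0, 2.0])
```

Applying either function twice returns the original vector, which is a quick sanity check when debugging handedness issues.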

During initialization, the operator clones the HaWoR repository, installs required packages (lap, pytorch_lightning, yacs, scikit-image, timm, omegaconf, smplx, chumpy), and downloads the detector model if not present.
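Because initialization installs packages on demand, it can be useful to see in advance which of them are missing from the current environment. The following sketch only inspects the environment and installs nothing. The mapping from pip names to import names is an assumption of this example (scikit-image is imported as skimage), and missing_packages is a hypothetical helper.

```python
import importlib.util
from typing import Dict, List

# pip package name -> top-level import name (assumed mapping)
HAWOR_DEPENDENCIES: Dict[str, str] = {
    "lap": "lap",
    "pytorch_lightning": "pytorch_lightning",
    "yacs": "yacs",
    "scikit-image": "skimage",
    "timm": "timm",
    "omegaconf": "omegaconf",
    "smplx": "smplx",
    "chumpy": "chumpy",
}


def missing_packages(dependencies: Dict[str, str]) -> List[str]:
    """Return the pip names whose import name cannot be found."""
    return [
        pip_name
        for pip_name, module_name in dependencies.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages(HAWOR_DEPENDENCIES)
if missing:
    print("Not yet installed:", ", ".join(missing))
```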

Requires CUDA acceleration and the MANO hand model (MANO_RIGHT.pkl from the official MANO website).
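Both requirements can be checked before the operator is constructed. The sketch below is a hypothetical preflight helper, not part of Data-Juicer: it verifies that the MANO file exists and, if PyTorch is importable, asks torch.cuda.is_available() whether a CUDA device is visible.

```python
import os
from typing import List


def mano_file_ready(mano_right_path: str) -> bool:
    """True if the given path points to an existing file."""
    return os.path.isfile(mano_right_path)


def cuda_available() -> bool:
    """True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch  # imported lazily so the check works without PyTorch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def preflight(mano_right_path: str) -> List[str]:
    """Collect human-readable problems; an empty list means ready."""
    problems = []
    if not mano_file_ready(mano_right_path):
        problems.append(f"MANO model not found at {mano_right_path!r}")
    if not cuda_available():
        problems.append("No CUDA device available (or PyTorch is not installed)")
    return problems


# Placeholder path; replace it with the location of the downloaded file.
for problem in preflight("/models/MANO_RIGHT.pkl"):
    print("Preflight issue:", problem)
```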

Usage

Use this operator for automated 3D hand pose and shape extraction from video data, supporting applications in gesture recognition, hand-object interaction analysis, sign language dataset creation, and hand motion capture.

Code Reference

Source Location

Repository: Datajuicer_Data_juicer
File: data_juicer/ops/mapper/video_hand_reconstruction_hawor_mapper.py
Lines: 1-474

Signature

class VideoHandReconstructionHaworMapper(Mapper):
    _accelerator = "cuda"

    def __init__(
        self,
        hawor_model_path: str = "hawor.ckpt",
        hawor_config_path: str = "model_config.yaml",
        hawor_detector_path: str = "detector.pt",
        moge_model_path: str = "Ruicheng/moge-2-vitl",
        mano_right_path: str = "path_to_mano_right_pkl",
        frame_num: PositiveInt = 3,
        duration: float = 0,
        thresh: float = 0.2,
        tag_field_name: str = MetaKeys.hand_reconstruction_hawor_tags,
        frame_dir: str = DATA_JUICER_ASSETS_CACHE,
        if_output_moge_info: bool = False,
        moge_output_info_dir: str = DATA_JUICER_ASSETS_CACHE,
        *args, **kwargs,
    ):

Import

from data_juicer.ops.mapper.video_hand_reconstruction_hawor_mapper import VideoHandReconstructionHaworMapper

I/O Contract

Inputs

Name	Type	Required	Description
hawor_model_path	str	No	Path to hawor.ckpt. Default: "hawor.ckpt"
hawor_config_path	str	No	Path to model_config.yaml. Default: "model_config.yaml"
hawor_detector_path	str	No	Path to detector.pt. Default: "detector.pt"
moge_model_path	str	No	Path to MoGe-2 model. Default: "Ruicheng/moge-2-vitl"
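mano_right_path	str	Yes	Path to MANO_RIGHT.pkl, which must be downloaded from https://mano.is.tue.mpg.de/. The signature default "path_to_mano_right_pkl" is only a placeholder, so a real path must be supplied.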
frame_num	PositiveInt	No	Number of frames to extract. Default: 3
duration	float	No	Duration per segment. 0 means entire video. Default: 0
thresh	float	No	Confidence threshold for hand detection. Default: 0.2
tag_field_name	str	No	Metadata field for storing results. Default: "hand_reconstruction_hawor_tags"
frame_dir	str	No	Directory for extracted frames. Default: DATA_JUICER_ASSETS_CACHE
if_output_moge_info	bool	No	Whether to save MoGe-2 results. Default: False
moge_output_info_dir	str	No	Directory for the saved MoGe-2 results (presumably used when if_output_moge_info is True). Default: DATA_JUICER_ASSETS_CACHE

Outputs

Name	Type	Description
sample[Fields.meta][tag_field_name]["fov_x"]	float	Median horizontal field of view
sample[Fields.meta][tag_field_name]["left_frame_id_list"]	list[int]	Frame indices where left hand was detected
sample[Fields.meta][tag_field_name]["left_beta_list"]	list[np.ndarray]	Left hand shape parameters per frame
sample[Fields.meta][tag_field_name]["left_hand_pose_list"]	list[np.ndarray]	Left hand joint rotations per frame
sample[Fields.meta][tag_field_name]["left_global_orient_list"]	list[np.ndarray]	Left hand global orientation per frame
sample[Fields.meta][tag_field_name]["left_transl_list"]	list[np.ndarray]	Left hand global translation per frame
sample[Fields.meta][tag_field_name]["right_*"]	list	Same fields for the right hand
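A short sketch of how these outputs might be consumed: converting fov_x into a focal length in pixels with the pinhole relation f = (W / 2) / tan(fov_x / 2), and finding the frames in which both hands were reconstructed. The unit of fov_x is an assumption here (degrees by default, switchable to radians), the image width is supplied by the caller, and both helpers and the example dictionary are hypothetical.

```python
import math
from typing import Dict, List


def focal_length_px(fov_x: float, image_width: int, in_degrees: bool = True) -> float:
    """Pinhole relation: f = (W / 2) / tan(fov_x / 2)."""
    fov = math.radians(fov_x) if in_degrees else fov_x
    return 0.5 * image_width / math.tan(0.5 * fov)


def frames_with_both_hands(tags: Dict) -> List[int]:
    """Sorted frame indices present in both the left and right id lists."""
    left = set(tags["left_frame_id_list"])
    right = set(tags["right_frame_id_list"])
    return sorted(left & right)


# Made-up stand-in for sample[Fields.meta][tag_field_name]; the per-frame
# parameter lists are omitted because this sketch only needs the frame ids.
example_tags = {
    "fov_x": 90.0,
    "left_frame_id_list": [0, 1, 2],
    "right_frame_id_list": [1, 2, 3],
}

focal = focal_length_px(example_tags["fov_x"], image_width=640)
shared = frames_with_both_hands(example_tags)
```

Since the frame id lists are documented as the frames where each hand was detected, the i-th entry of a parameter list presumably belongs to the i-th frame id of the same hand; this pairing should be verified against the operator's output before relying on it.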

Usage Examples

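# Imports needed by this example
from data_juicer.ops.mapper.video_hand_reconstruction_hawor_mapper import VideoHandReconstructionHaworMapper
from data_juicer.utils.constant import Fields

# Basic usage (requires a CUDA device; all paths below are placeholders)
mapper = VideoHandReconstructionHaworMapper(
    hawor_model_path="/models/hawor.ckpt",
    hawor_config_path="/models/model_config.yaml",
    hawor_detector_path="/models/detector.pt",
    mano_right_path="/models/MANO_RIGHT.pkl",
    frame_num=10,
    thresh=0.3,
)

# Process a single sample
sample = {
    "videos": ["/path/to/hand_video.mp4"],
    Fields.meta: {},
}
result = mapper.process_single(sample, rank=0)

# Access hand reconstruction data stored under the default tag_field_name
tags = result[Fields.meta]["hand_reconstruction_hawor_tags"]
left_poses = tags["left_hand_pose_list"]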
