Implementation:Datajuicer Data juicer VideoHandReconstructionHaworMapper

From Leeroopedia
Knowledge Sources
Domains: Video Processing, 3D Reconstruction, Hand Pose Estimation
Last Updated: 2026-02-14 16:00 GMT

Overview

Performs 3D hand reconstruction from video frames using the HaWoR model combined with MoGe-2 for scene geometry, providing detailed hand pose and mesh information for both left and right hands.

Description

VideoHandReconstructionHaworMapper is an advanced video analysis operator that extracts 3D hand pose and shape parameters from video data. It operates in a three-phase pipeline:

  1. FoV Estimation (MoGe-2) -- Uses the video_camera_calibration_static_moge_mapper sub-operator to estimate the horizontal field of view (FoV) of each frame, then computes the focal length from the median FoV across all frames.
  2. Hand Pose and Translation Estimation (HaWoR) -- Performs the core hand reconstruction:
    • Detects hands using a YOLO-based detector with configurable confidence threshold
    • Tracks hands across frames using YOLO's built-in tracking
    • Separates detections into left and right hand tracks based on handedness classification
    • Interpolates bounding boxes for missing frames using interpolate_bboxes
    • Runs the HaWoR model for 3D hand mesh reconstruction on each track chunk
    • Handles left-hand flipping by negating rotation axes for consistency
  3. Global Translation Recalculation (MANO Alignment) -- Refines global translation by:
    • Running the MANO parametric hand model forward pass
    • Computing wrist joint positions
    • Adjusting translations based on wrist offsets
    • Flipping x-axis for left hand consistency
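
Two of the steps above are simple enough to sketch in a few lines: deriving a focal length from the median horizontal FoV, and filling in bounding boxes for frames where the detector missed the hand. The sketch below is illustrative only. It assumes a pinhole camera, FoV values in radians, and linear interpolation between detections; the function names focal_from_fov and interpolate_boxes are hypothetical and are not the operator's own code (the real helper is interpolate_bboxes, whose exact behavior is not shown on this page).

```python
import math
from statistics import median


def focal_from_fov(hfov_per_frame, image_width_px):
    """Return (median FoV, focal length in pixels) under a pinhole model.

    Assumption: FoV values are in radians. Convert first if they are degrees.
    """
    fov_x = median(hfov_per_frame)
    focal_px = 0.5 * image_width_px / math.tan(0.5 * fov_x)
    return fov_x, focal_px


def interpolate_boxes(frame_ids, boxes):
    """Linearly fill boxes for frames between consecutive detections.

    frame_ids must be strictly increasing; boxes are [x1, y1, x2, y2].
    Hypothetical stand-in for interpolate_bboxes, not its actual code.
    """
    filled = {}
    detections = list(zip(frame_ids, boxes))
    for (f0, b0), (f1, b1) in zip(detections, detections[1:]):
        for f in range(f0, f1):
            t = (f - f0) / (f1 - f0)  # 0 at f0, approaches 1 near f1
            filled[f] = [a + t * (b - a) for a, b in zip(b0, b1)]
    if detections:
        last_frame, last_box = detections[-1]
        filled[last_frame] = list(last_box)
    return filled
```

Using the median rather than the mean keeps a single badly estimated frame from shifting the focal length used for the whole clip.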

The operator outputs per-frame hand reconstruction parameters for both hands:

  • beta (shape parameters)
  • hand_pose (joint rotations)
  • global_orient (wrist orientation)
  • transl (global translation)

During initialization, the operator clones the HaWoR repository, installs required packages (lap, pytorch_lightning, yacs, scikit-image, timm, omegaconf, smplx, chumpy), and downloads the detector model if not present.

Requires CUDA acceleration and the MANO hand model (MANO_RIGHT.pkl from the official MANO website).
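
Because MANO_RIGHT.pkl has to be obtained manually, it can be useful to check that the file is in place before constructing the operator. The following is a minimal sketch using only the standard library; the helper name missing_assets is hypothetical and not part of Data-Juicer, and it does not check for CUDA.

```python
from pathlib import Path


def missing_assets(paths):
    """Return the names of entries whose path is not an existing file.

    `paths` maps a parameter name (e.g. "mano_right_path") to a file path.
    """
    return [name for name, p in paths.items() if not Path(p).is_file()]


# Example: report which manually supplied files are absent.
problems = missing_assets({"mano_right_path": "/models/MANO_RIGHT.pkl"})
if problems:
    print("Missing files for:", ", ".join(problems))
```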

Usage

Use this operator for automated 3D hand pose and shape extraction from video data, supporting applications in gesture recognition, hand-object interaction analysis, sign language dataset creation, and hand motion capture.

Code Reference

Source Location

  • Repository: Datajuicer_Data_juicer
  • File: data_juicer/ops/mapper/video_hand_reconstruction_hawor_mapper.py
  • Lines: 1-474

Signature

class VideoHandReconstructionHaworMapper(Mapper):
    _accelerator = "cuda"

    def __init__(
        self,
        hawor_model_path: str = "hawor.ckpt",
        hawor_config_path: str = "model_config.yaml",
        hawor_detector_path: str = "detector.pt",
        moge_model_path: str = "Ruicheng/moge-2-vitl",
        mano_right_path: str = "path_to_mano_right_pkl",
        frame_num: PositiveInt = 3,
        duration: float = 0,
        thresh: float = 0.2,
        tag_field_name: str = MetaKeys.hand_reconstruction_hawor_tags,
        frame_dir: str = DATA_JUICER_ASSETS_CACHE,
        if_output_moge_info: bool = False,
        moge_output_info_dir: str = DATA_JUICER_ASSETS_CACHE,
        *args, **kwargs,
    ):

Import

from data_juicer.ops.mapper.video_hand_reconstruction_hawor_mapper import VideoHandReconstructionHaworMapper

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| hawor_model_path | str | No | Path to hawor.ckpt. Default: "hawor.ckpt" |
| hawor_config_path | str | No | Path to model_config.yaml. Default: "model_config.yaml" |
| hawor_detector_path | str | No | Path to detector.pt. Default: "detector.pt" |
| moge_model_path | str | No | Path to the MoGe-2 model. Default: "Ruicheng/moge-2-vitl" |
| mano_right_path | str | Yes | Path to MANO_RIGHT.pkl (must be downloaded from https://mano.is.tue.mpg.de/) |
| frame_num | PositiveInt | No | Number of frames to extract. Default: 3 |
| duration | float | No | Duration of each segment. 0 means the entire video. Default: 0 |
| thresh | float | No | Confidence threshold for hand detection. Default: 0.2 |
| tag_field_name | str | No | Metadata field for storing results. Default: "hand_reconstruction_hawor_tags" |
| frame_dir | str | No | Directory for extracted frames. Default: DATA_JUICER_ASSETS_CACHE |
| if_output_moge_info | bool | No | Whether to save MoGe-2 results. Default: False |
| moge_output_info_dir | str | No | Directory for saved MoGe-2 results. Default: DATA_JUICER_ASSETS_CACHE |

Outputs

| Name | Type | Description |
|---|---|---|
| sample[Fields.meta][tag_field_name]["fov_x"] | float | Median horizontal field of view |
| sample[Fields.meta][tag_field_name]["left_frame_id_list"] | list[int] | Frame indices where the left hand was detected |
| sample[Fields.meta][tag_field_name]["left_beta_list"] | list[np.ndarray] | Left hand shape parameters per frame |
| sample[Fields.meta][tag_field_name]["left_hand_pose_list"] | list[np.ndarray] | Left hand joint rotations per frame |
| sample[Fields.meta][tag_field_name]["left_global_orient_list"] | list[np.ndarray] | Left hand global orientation per frame |
| sample[Fields.meta][tag_field_name]["left_transl_list"] | list[np.ndarray] | Left hand global translation per frame |
| sample[Fields.meta][tag_field_name]["right_*"] | list | Same fields for the right hand |
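
Since the results are stored as parallel lists, a per-frame lookup is often more convenient downstream. The sketch below regroups the lists by frame index. It assumes that each parameter list is index-aligned with the corresponding frame_id_list, which this page implies but does not state outright; the helper name per_frame_lookup is hypothetical.

```python
def per_frame_lookup(tags, side):
    """Map frame index -> parameters for one hand ("left" or "right").

    Assumption: each "<side>_<name>_list" is index-aligned with
    "<side>_frame_id_list".
    """
    names = ("beta", "hand_pose", "global_orient", "transl")
    frame_ids = tags[f"{side}_frame_id_list"]
    return {
        frame_id: {name: tags[f"{side}_{name}_list"][i] for name in names}
        for i, frame_id in enumerate(frame_ids)
    }


# Mock tags with placeholder values, only to show the resulting layout.
mock_tags = {
    "left_frame_id_list": [0, 2],
    "left_beta_list": ["b0", "b2"],
    "left_hand_pose_list": ["p0", "p2"],
    "left_global_orient_list": ["g0", "g2"],
    "left_transl_list": ["t0", "t2"],
}
left_by_frame = per_frame_lookup(mock_tags, "left")
```

With real output, the placeholder strings would be the per-frame arrays, and frames in which the hand was not detected are simply absent from the lookup.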

Usage Examples

from data_juicer.ops.mapper.video_hand_reconstruction_hawor_mapper import VideoHandReconstructionHaworMapper
from data_juicer.utils.constant import Fields

# Basic usage (requires a CUDA device and a local copy of MANO_RIGHT.pkl)
mapper = VideoHandReconstructionHaworMapper(
    hawor_model_path="/models/hawor.ckpt",
    hawor_config_path="/models/model_config.yaml",
    hawor_detector_path="/models/detector.pt",
    mano_right_path="/models/MANO_RIGHT.pkl",
    frame_num=10,
    thresh=0.3,
)

# Process a sample
sample = {
    "videos": ["/path/to/hand_video.mp4"],
    Fields.meta: {},
}
result = mapper.process_single(sample, rank=0)

# Access hand reconstruction data (stored under the default tag_field_name)
left_poses = result[Fields.meta]["hand_reconstruction_hawor_tags"]["left_hand_pose_list"]