
Implementation:Datajuicer Data juicer VideoWholeBodyPoseEstimationMapper

From Leeroopedia
Knowledge Sources
Domains Data_Processing, Mapping
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for extracting 2D whole-body pose keypoints from video frames.

Description

VideoWholeBodyPoseEstimationMapper extracts frames uniformly from videos, runs a YOLOX-based person detector (ONNX model) to locate human subjects, then applies the DWPose model (ONNX) to estimate whole-body keypoints including body, hands, feet, and face for each detected person. The pose estimation results are stored in sample metadata with separate arrays for each keypoint category. The operator supports configurable frame counts, video segmentation by duration, and optional visualization output saved to a specified directory.
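The frame-selection step described above can be sketched in plain Python. This is a minimal illustration, not Data-Juicer's actual sampling code: the helper names `uniform_frame_indices` and `segment_frame_indices` are hypothetical, and the choice of bin midpoints is an assumption (the operator may pick different positions within each span).

```python
from typing import List


def uniform_frame_indices(total_frames: int, frame_num: int) -> List[int]:
    """Pick up to `frame_num` indices spread evenly over [0, total_frames).

    Hypothetical helper: takes the midpoint of each equal-width bin.
    """
    if total_frames <= 0 or frame_num <= 0:
        return []
    # Never ask for more frames than the span contains.
    frame_num = min(frame_num, total_frames)
    step = total_frames / frame_num
    return [int(step * i + step / 2) for i in range(frame_num)]


def segment_frame_indices(
    total_frames: int, fps: float, frame_num: int, duration: float = 0
) -> List[List[int]]:
    """Return one list of frame indices per segment.

    duration == 0 treats the whole video as a single segment, mirroring
    the documented meaning of the `duration` parameter.
    """
    if duration <= 0:
        return [uniform_frame_indices(total_frames, frame_num)]
    # Segment length in frames (assumed rounding; at least one frame).
    seg_len = max(1, int(round(duration * fps)))
    segments = []
    for start in range(0, total_frames, seg_len):
        length = min(seg_len, total_frames - start)
        local = uniform_frame_indices(length, frame_num)
        segments.append([start + i for i in local])
    return segments
```

With `frame_num=3` and `duration=0`, a 90-frame video yields a single segment of three indices; with a non-zero `duration`, each segment contributes its own `frame_num` indices, so the total number of analyzed frames grows with video length.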

Usage

Use when you need to extract human pose annotations from video datasets for applications in action recognition, motion capture dataset creation, and human-centric video understanding.

Code Reference

Source Location

Signature

@OPERATORS.register_module("video_whole_body_pose_estimation_mapper")
class VideoWholeBodyPoseEstimationMapper(Mapper):
    def __init__(self, onnx_det_model: str = "yolox_l.onnx",
                 onnx_pose_model: str = "dw-ll_ucoco_384.onnx",
                 frame_num: PositiveInt = 3,
                 duration: float = 0,
                 tag_field_name: str = MetaKeys.pose_estimation_tags,
                 frame_dir: str = DATA_JUICER_ASSETS_CACHE,
                 if_save_visualization: bool = False,
                 save_visualization_dir: str = DATA_JUICER_ASSETS_CACHE,
                 *args, **kwargs):

Import

from data_juicer.ops.mapper.video_whole_body_pose_estimation_mapper import VideoWholeBodyPoseEstimationMapper

I/O Contract

Inputs

Name | Type | Required | Description
onnx_det_model | str | No | Path to the YOLOX detection ONNX model. Default: "yolox_l.onnx"
onnx_pose_model | str | No | Path to the DWPose estimation ONNX model. Default: "dw-ll_ucoco_384.onnx"
frame_num | PositiveInt | No | Number of frames to extract per video or per segment. Default: 3
duration | float | No | Duration of each video segment in seconds. 0 means the entire video. Default: 0
tag_field_name | str | No | Field name to store the pose estimation tags. Default: "pose_estimation_tags"
frame_dir | str | No | Output directory to save extracted frames. Default: DATA_JUICER_ASSETS_CACHE
if_save_visualization | bool | No | Whether to save visualization results. Default: False
save_visualization_dir | str | No | Path for saving visualization results. Default: DATA_JUICER_ASSETS_CACHE

Outputs

Name | Type | Description
sample[Fields.meta][tag_field_name]["body_keypoints"] | list | Body keypoints for each frame
sample[Fields.meta][tag_field_name]["foot_keypoints"] | list | Foot keypoints for each frame
sample[Fields.meta][tag_field_name]["faces_keypoints"] | list | Face keypoints for each frame
sample[Fields.meta][tag_field_name]["hands_keypoints"] | list | Hand keypoints for each frame
sample[Fields.meta][tag_field_name]["bbox_results_list"] | list | Bounding box detection results for each frame
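As a rough sketch of consuming these outputs, the helper below counts how many per-frame entries each output list holds. It works on a plain dictionary so it runs without Data-Juicer installed. The function name `summarize_pose_tags` is hypothetical, and the caller passes the metadata key explicitly (the value of `Fields.meta`) rather than this example assuming what that constant is. The inner layout of each per-frame entry is not specified here, so the helper only looks at list lengths.

```python
from typing import Any, Dict

# Output keys listed in the table above.
POSE_KEYS = (
    "body_keypoints",
    "foot_keypoints",
    "faces_keypoints",
    "hands_keypoints",
    "bbox_results_list",
)


def summarize_pose_tags(
    sample: Dict[str, Any],
    meta_key: str,
    tag_field_name: str = "pose_estimation_tags",
) -> Dict[str, int]:
    """Return the number of per-frame entries stored under each output key.

    `meta_key` is whatever key holds sample metadata (assumption: the
    caller supplies the value of Fields.meta). Missing keys count as 0.
    """
    tags = sample.get(meta_key, {}).get(tag_field_name, {})
    return {key: len(tags.get(key, [])) for key in POSE_KEYS}


# Usage with a hand-built sample (placeholder values, not real model output).
example = {
    "meta": {
        "pose_estimation_tags": {
            "body_keypoints": [[], [], []],
            "bbox_results_list": [[], [], []],
        }
    }
}
summary = summarize_pose_tags(example, meta_key="meta")
```

A quick check like this can flag samples where a category is missing or where the lists disagree on the number of frames before the annotations are passed downstream.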

Usage Examples

process:
  - video_whole_body_pose_estimation_mapper:
      frame_num: 5
      duration: 2.0
      if_save_visualization: true
      save_visualization_dir: "./pose_visualizations"
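If recipes are generated from code rather than written by hand, the same configuration can be assembled as a Python dictionary and then serialized with any YAML library. The sketch below is illustrative only: `build_pose_recipe` is a hypothetical helper, not part of Data-Juicer, and rejecting a negative `duration` is an assumption based on the documented meaning of that parameter.

```python
from typing import Any, Dict, Optional


def build_pose_recipe(
    frame_num: int = 3,
    duration: float = 0.0,
    if_save_visualization: bool = False,
    save_visualization_dir: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a recipe dict mirroring the YAML example above.

    Hypothetical helper; validation rules are assumptions drawn from the
    documented parameter types (PositiveInt frame_num, non-negative duration).
    """
    if not isinstance(frame_num, int) or frame_num < 1:
        raise ValueError("frame_num must be a positive integer")
    if duration < 0:
        raise ValueError("duration must be >= 0 (0 means the entire video)")

    op_args: Dict[str, Any] = {
        "frame_num": frame_num,
        "duration": duration,
        "if_save_visualization": if_save_visualization,
    }
    # Only emit the directory when given, so the operator default applies otherwise.
    if save_visualization_dir is not None:
        op_args["save_visualization_dir"] = save_visualization_dir

    return {"process": [{"video_whole_body_pose_estimation_mapper": op_args}]}


recipe = build_pose_recipe(
    frame_num=5,
    duration=2.0,
    if_save_visualization=True,
    save_visualization_dir="./pose_visualizations",
)
```

The resulting dictionary has the same shape as the YAML recipe: a `process` list whose single entry maps the registered operator name to its arguments.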
