Implementation:Datajuicer Data juicer VideoWholeBodyPoseEstimationMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool, provided by Data-Juicer, for extracting 2D whole-body pose keypoints from video frames.
Description
VideoWholeBodyPoseEstimationMapper samples frames uniformly from each video (or from each segment when duration is set). It runs a YOLOX-based person detector (ONNX model) to locate people, then applies the DWPose model (ONNX) to estimate whole-body keypoints (body, hands, feet, and face) for each detected person. The results are stored in the sample's metadata under tag_field_name, with a separate list for each keypoint category and one for the detected bounding boxes. The operator supports a configurable number of frames, video segmentation by duration, and optional visualization output saved to a specified directory.
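Running a detector first and then a pose model on each person is a top-down pipeline. Each person box is expanded and cropped to the pose model's input size, and the predicted keypoints are mapped back to frame coordinates. The sketch below shows only that box geometry. It assumes RTMPose-style preprocessing with a padding factor of 1.25 and a 288×384 (width × height) input for `dw-ll_ucoco_384.onnx`. Both values, and the helper names `bbox_to_crop` and `crop_to_image`, are illustrative assumptions and do not come from this operator's source.

```python
def bbox_to_crop(bbox, input_w=288, input_h=384, padding=1.25):
    """Turn an (x1, y1, x2, y2) person box into a padded crop (center, size).

    Assumption: the crop is enlarged by `padding`, and its shorter side is grown
    so that width / height matches the pose model's input aspect ratio.
    """
    x1, y1, x2, y2 = bbox
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = (x2 - x1) * padding, (y2 - y1) * padding
    aspect = input_w / input_h
    if w > h * aspect:
        h = w / aspect  # box is too wide for the input: grow the height
    else:
        w = h * aspect  # box is too tall for the input: grow the width
    return (cx, cy), (w, h)


def crop_to_image(point, center, size, input_w=288, input_h=384):
    """Map a keypoint predicted in model-input pixels back to frame pixels."""
    (px, py), (cx, cy), (w, h) = point, center, size
    return (cx - w / 2.0 + px * w / input_w,
            cy - h / 2.0 + py * h / input_h)


# Usage: a 100x200 person box centred at (150, 150)
center, size = bbox_to_crop((100, 50, 200, 250))
print(size)                                     # (187.5, 250.0)
print(crop_to_image((144, 192), center, size))  # (150.0, 150.0)
```

Fixing the aspect ratio before resizing avoids stretching the person, so the centre of the model input always maps back to the centre of the detected box.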
Usage
Use when you need to extract human pose annotations from video datasets for applications such as action recognition, motion-capture dataset creation, and human-centric video understanding.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/video_whole_body_pose_estimation_mapper.py
Signature
```python
@OPERATORS.register_module("video_whole_body_pose_estimation_mapper")
class VideoWholeBodyPoseEstimationMapper(Mapper):
    def __init__(
        self,
        onnx_det_model: str = "yolox_l.onnx",
        onnx_pose_model: str = "dw-ll_ucoco_384.onnx",
        frame_num: PositiveInt = 3,
        duration: float = 0,
        tag_field_name: str = MetaKeys.pose_estimation_tags,
        frame_dir: str = DATA_JUICER_ASSETS_CACHE,
        if_save_visualization: bool = False,
        save_visualization_dir: str = DATA_JUICER_ASSETS_CACHE,
        *args,
        **kwargs,
    ):
```
Import
```python
from data_juicer.ops.mapper.video_whole_body_pose_estimation_mapper import VideoWholeBodyPoseEstimationMapper
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| onnx_det_model | str | No | Path to the YOLOX detection ONNX model. Default: "yolox_l.onnx" |
| onnx_pose_model | str | No | Path to the DWPose estimation ONNX model. Default: "dw-ll_ucoco_384.onnx" |
| frame_num | PositiveInt | No | Number of frames to extract per video or per segment. Default: 3 |
| duration | float | No | Duration of each video segment in seconds. 0 means the entire video. Default: 0 |
| tag_field_name | str | No | Field name to store the pose estimation tags. Default: "pose_estimation_tags" |
| frame_dir | str | No | Output directory to save extracted frames. Default: DATA_JUICER_ASSETS_CACHE |
| if_save_visualization | bool | No | Whether to save visualization results. Default: False |
| save_visualization_dir | str | No | Path for saving visualization results. Default: DATA_JUICER_ASSETS_CACHE |
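The short sketch below illustrates how frame_num and duration interact. It assumes that a video of T seconds is cut into consecutive duration-second segments, with the last segment possibly shorter. It also assumes that frame_num timestamps are spaced evenly across each segment, including both endpoints, and that a single frame is taken from the midpoint. The operator's actual sampling rule and its handling of a trailing partial segment may differ. `sampling_plan` and its helpers are hypothetical names.

```python
import math


def segment_bounds(video_seconds, duration):
    """Split [0, video_seconds] into duration-long segments (0 = whole video)."""
    if duration <= 0:
        return [(0.0, video_seconds)]
    n = math.ceil(video_seconds / duration)
    # Assumption: the final, possibly shorter, segment is kept.
    return [(k * duration, min((k + 1) * duration, video_seconds)) for k in range(n)]


def uniform_timestamps(start, end, frame_num):
    """Evenly spaced timestamps in [start, end]; the midpoint when frame_num is 1."""
    if frame_num == 1:
        return [(start + end) / 2.0]
    step = (end - start) / (frame_num - 1)
    return [start + i * step for i in range(frame_num)]


def sampling_plan(video_seconds, frame_num=3, duration=0.0):
    """Timestamps (seconds) to decode, grouped per segment."""
    return [uniform_timestamps(s, e, frame_num)
            for s, e in segment_bounds(video_seconds, duration)]


print(sampling_plan(10.0))  # [[0.0, 5.0, 10.0]]
plan = sampling_plan(7.0, frame_num=5, duration=2.0)
print(len(plan), sum(len(seg) for seg in plan))  # 4 20
```

Under these assumptions the number of frames sent to the detector grows linearly with video length once duration is set. With duration left at 0, it stays fixed at frame_num per video.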
Outputs
| Name | Type | Description |
|---|---|---|
| sample[Fields.meta][tag_field_name]["body_keypoints"] | list | Body keypoints for each frame |
| sample[Fields.meta][tag_field_name]["foot_keypoints"] | list | Foot keypoints for each frame |
| sample[Fields.meta][tag_field_name]["faces_keypoints"] | list | Face keypoints for each frame |
| sample[Fields.meta][tag_field_name]["hands_keypoints"] | list | Hand keypoints for each frame |
| sample[Fields.meta][tag_field_name]["bbox_results_list"] | list | Bounding box detection results for each frame |
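The keypoint categories above correspond to regions of a whole-body skeleton. DWPose models such as dw-ll_ucoco_384 predict 133 keypoints per person in the COCO-WholeBody layout. The sketch below splits one person's array using the standard COCO-WholeBody index ranges and merges both hands into a single list. Whether this operator uses exactly these ranges is an assumption. DWPose-based annotators often convert the body to an 18-point OpenPose layout by adding a synthesized neck point, which shifts every later index by one. `WHOLEBODY_RANGES` and `split_wholebody` are hypothetical names.

```python
# Standard COCO-WholeBody index ranges (133 keypoints per person).
WHOLEBODY_RANGES = {
    "body": range(0, 17),          # 17 COCO body joints
    "foot": range(17, 23),         # 3 points per foot
    "face": range(23, 91),         # 68 facial landmarks
    "left_hand": range(91, 112),   # 21 hand joints
    "right_hand": range(112, 133), # 21 hand joints
}


def split_wholebody(keypoints):
    """Split one person's 133 (x, y) keypoints into per-region lists."""
    if len(keypoints) != 133:
        raise ValueError(f"expected 133 keypoints, got {len(keypoints)}")
    parts = {name: [keypoints[i] for i in idx] for name, idx in WHOLEBODY_RANGES.items()}
    # Merge both hands, mirroring the single hands_keypoints output entry.
    parts["hands"] = parts.pop("left_hand") + parts.pop("right_hand")
    return parts


person = [(float(i), float(i)) for i in range(133)]  # dummy coordinates
print({name: len(pts) for name, pts in split_wholebody(person).items()})
# {'body': 17, 'foot': 6, 'face': 68, 'hands': 42}
```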
Usage Examples
```yaml
process:
  - video_whole_body_pose_estimation_mapper:
      frame_num: 5
      duration: 2.0
      if_save_visualization: true
      save_visualization_dir: "./pose_visualizations"
```