Implementation:Datajuicer Data juicer VideoCameraPoseMapper
| Knowledge Sources | |
|---|---|
| Domains | Video Processing, 3D Vision, Camera Pose Estimation |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Extracts camera pose trajectories from videos by combining MoGe-2 for monocular geometry estimation with MegaSaM for Structure-from-Motion-based pose estimation.
Description
VideoCameraPoseMapper is an advanced video analysis operator that automates camera trajectory extraction from video data. It operates in a two-phase pipeline:
- Monocular Geometry Estimation (MoGe-2): uses the video_camera_calibration_static_moge_mapper sub-operator to extract per-frame depth maps, camera intrinsics, and masks from the video.
- Camera Pose Estimation (MegaSaM/Droid-SLAM): feeds the MoGe-2 results into a MegaSaM-based Droid-SLAM pipeline to compute the camera extrinsic parameters (rotation and translation) for each frame, using the SE(3) Lie group representation.
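To make the SE(3) representation concrete, the sketch below converts a single pose stored as a 7-vector into a 4x4 homogeneous matrix. The layout `[tx, ty, tz, qx, qy, qz, qw]` is an assumption about how lietorch-style SE3 elements are commonly stored, and `pose7_to_matrix` is a hypothetical helper, not a function of the operator.

```python
import numpy as np


def pose7_to_matrix(pose):
    """Convert [tx, ty, tz, qx, qy, qz, qw] to a 4x4 homogeneous transform.

    The 7-vector layout is an assumption made for this illustration.
    """
    pose = np.asarray(pose, dtype=np.float64)
    t, q = pose[:3], pose[3:]
    q = q / np.linalg.norm(q)  # guard against slightly non-unit quaternions
    x, y, z, w = q
    # Standard unit-quaternion to rotation-matrix formula.
    R = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T
```

Whether the resulting matrix is camera-to-world or world-to-camera depends on the convention of the pose that is passed in; the conversion itself is the same in both cases.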
During initialization, the operator:
- Clones the MegaSaM repository if not present
- Patches CUDA kernel source files (replacing deprecated .type() with .scalar_type())
- Installs droid_backends, lietorch, and torch-scatter if not already available
- Configures the Droid-SLAM pipeline with parameters from the droid_args helper class
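The patching and dependency-check steps above can be sketched roughly as follows. Both helpers (`patch_cuda_source`, `missing_modules`) are hypothetical names introduced for illustration; the operator's actual implementation may differ in how it locates files and installs packages.

```python
import importlib.util


def patch_cuda_source(text):
    """Replace deprecated `.type()` calls with `.scalar_type()`.

    `.scalar_type()` does not contain the substring `.type()`,
    so applying the patch twice leaves the text unchanged.
    """
    return text.replace(".type()", ".scalar_type()")


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example: a dispatch macro line of the kind found in CUDA extension sources.
snippet = 'AT_DISPATCH_FLOATING_TYPES(x.type(), "kernel", ([&] { ... }));'
patched = patch_cuda_source(snippet)
```

A caller could then run `missing_modules(["droid_backends", "lietorch", "torch_scatter"])` and install only what is reported; note that the torch-scatter package is imported as `torch_scatter`.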
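The output includes camera-to-world transformation matrices (cam_c2w), the camera intrinsic matrix K (stored under the key intrinsic), depth maps, and frame images. All of these are stored in the sample metadata under tag_field_name, which defaults to video_camera_pose_tags. When if_save_info is enabled (the default), the results are also saved to NPZ files in output_info_dir.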
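As an illustration of working with the camera-to-world matrices, the sketch below inverts a batch of rigid transforms to obtain world-to-camera matrices and measures the length of the camera path. It assumes `cam_c2w` has shape (N, 4, 4) with a rotation block and a translation column; `invert_rigid` and `path_length` are hypothetical helpers, not part of the operator.

```python
import numpy as np


def invert_rigid(c2w):
    """Invert a batch of (N, 4, 4) rigid transforms using R^T and -R^T t."""
    c2w = np.asarray(c2w, dtype=np.float64)
    R = c2w[:, :3, :3]
    t = c2w[:, :3, 3]
    Rt = np.transpose(R, (0, 2, 1))
    w2c = np.tile(np.eye(4), (c2w.shape[0], 1, 1))
    w2c[:, :3, :3] = Rt
    w2c[:, :3, 3] = -np.einsum("nij,nj->ni", Rt, t)
    return w2c


def path_length(c2w):
    """Sum of distances between consecutive camera centers."""
    centers = np.asarray(c2w, dtype=np.float64)[:, :3, 3]
    return float(np.linalg.norm(np.diff(centers, axis=0), axis=1).sum())
```

The path length is expressed in whatever scale the estimated poses use, so it should not be read as metric distance without checking how the trajectory was scaled.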
Requires CUDA acceleration.
Usage
Use this operator for extracting camera trajectories from video data, supporting applications like 3D scene reconstruction, visual SLAM dataset creation, camera motion analysis, and novel view synthesis.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/video_camera_pose_mapper.py
- Lines: 1-338
Signature
class VideoCameraPoseMapper(Mapper):
    _accelerator = "cuda"

    def __init__(
        self,
        moge_model_path: str = "Ruicheng/moge-2-vitl",
        frame_num: PositiveInt = 3,
        duration: float = 0,
        tag_field_name: str = MetaKeys.video_camera_pose_tags,
        frame_dir: str = DATA_JUICER_ASSETS_CACHE,
        if_output_moge_info: bool = False,
        moge_output_info_dir: str = DATA_JUICER_ASSETS_CACHE,
        if_save_info: bool = True,
        output_info_dir: str = DATA_JUICER_ASSETS_CACHE,
        max_frames: int = 1000,
        *args,
        **kwargs,
    ):
Import
from data_juicer.ops.mapper.video_camera_pose_mapper import VideoCameraPoseMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| moge_model_path | str | No | Path to the MoGe-2 model. Default: "Ruicheng/moge-2-vitl" |
| frame_num | PositiveInt | No | Number of frames to extract uniformly. Default: 3 |
| duration | float | No | Duration per segment in seconds. 0 means entire video. Default: 0 |
| tag_field_name | str | No | Metadata field name for storing results. Default: "video_camera_pose_tags" |
| frame_dir | str | No | Directory to save extracted frames. Default: DATA_JUICER_ASSETS_CACHE |
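| if_output_moge_info | bool | No | Whether to save MoGe-2 results to JSON. Default: False |
| moge_output_info_dir | str | No | Directory for saving MoGe-2 results when if_output_moge_info is enabled. Default: DATA_JUICER_ASSETS_CACHE |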
| if_save_info | bool | No | Whether to save results to NPZ file. Default: True |
| output_info_dir | str | No | Directory for saving NPZ results. Default: DATA_JUICER_ASSETS_CACHE |
| max_frames | int | No | Maximum number of frames to save. Default: 1000 |
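To illustrate what uniform extraction with frame_num means, the sketch below picks evenly spaced frame indices from a video with a known frame count. This is an assumption about the sampling scheme, written as a hypothetical helper (`uniform_frame_indices`); the operator delegates frame extraction to its own utilities, which may choose frames differently.

```python
def uniform_frame_indices(total_frames, frame_num):
    """Pick `frame_num` evenly spaced frame indices from `total_frames` frames.

    Illustrative only: one frame yields the middle frame, two or more
    always include the first and last frames.
    """
    if total_frames <= 0 or frame_num <= 0:
        return []
    if frame_num >= total_frames:
        return list(range(total_frames))
    if frame_num == 1:
        return [total_frames // 2]
    step = (total_frames - 1) / (frame_num - 1)
    return [int(i * step + 0.5) for i in range(frame_num)]
```

With the default frame_num of 3, this scheme would select the first, middle, and last frames of the video or segment.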
Outputs
| Name | Type | Description |
|---|---|---|
| sample[Fields.meta][tag_field_name]["frames_folder"] | str | Path to extracted frames directory |
| sample[Fields.meta][tag_field_name]["frame_names"] | list[str] | List of frame file names |
| sample[Fields.meta][tag_field_name]["images"] | np.ndarray | Frame images (N, H, W, 3) |
| sample[Fields.meta][tag_field_name]["depths"] | np.ndarray | Per-frame depth maps |
| sample[Fields.meta][tag_field_name]["intrinsic"] | np.ndarray | 3x3 camera intrinsic matrix K |
| sample[Fields.meta][tag_field_name]["cam_c2w"] | np.ndarray | Camera-to-world transformation matrices (N, 4, 4) |
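One way the depths, intrinsic, and cam_c2w outputs fit together is back-projection of a depth map into a world-space point cloud. The sketch below assumes a pinhole camera, depth measured along the optical axis, and K expressed in pixels at the depth map's resolution; these are assumptions rather than documented properties of the operator, and both helpers are hypothetical.

```python
import numpy as np


def unproject_depth(depth, K):
    """Back-project an (H, W) depth map to (H, W, 3) camera-frame points."""
    depth = np.asarray(depth, dtype=np.float64)
    K = np.asarray(K, dtype=np.float64)
    H, W = depth.shape
    # u indexes pixel columns, v indexes pixel rows.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)


def to_world(points_cam, c2w):
    """Apply one 4x4 camera-to-world transform to (..., 3) points."""
    c2w = np.asarray(c2w, dtype=np.float64)
    return points_cam @ c2w[:3, :3].T + c2w[:3, 3]
```

For frame i, a call such as `to_world(unproject_depth(depths[i], intrinsic), cam_c2w[i])` would give that frame's points in the shared world frame, provided the assumptions above hold.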
Usage Examples
from data_juicer.ops.mapper.video_camera_pose_mapper import VideoCameraPoseMapper
from data_juicer.utils.constant import Fields

# Basic usage
mapper = VideoCameraPoseMapper(
    moge_model_path="Ruicheng/moge-2-vitl",
    frame_num=10,
    if_save_info=True,
    output_info_dir="/data/camera_poses/",
)

# Process a sample
sample = {
    "videos": ["/path/to/video.mp4"],
    Fields.meta: {},
}
result = mapper.process_single(sample, rank=0)

# Access camera poses (the key is the default tag_field_name)
cam_c2w = result[Fields.meta]["video_camera_pose_tags"]["cam_c2w"]