Implementation:Datajuicer Data juicer VideoCameraPoseMapper

From Leeroopedia
Knowledge Sources
Domains: Video Processing, 3D Vision, Camera Pose Estimation
Last Updated: 2026-02-14 16:00 GMT

Overview

Extracts camera pose trajectories from videos by combining MoGe-2 for monocular geometry estimation with MegaSaM for Structure-from-Motion-based pose estimation.

Description

VideoCameraPoseMapper is an advanced video analysis operator that automates camera trajectory extraction from video data. It operates in a two-phase pipeline:

  1. Monocular Geometry Estimation (MoGe-2) -- Uses the video_camera_calibration_static_moge_mapper sub-operator to extract per-frame depth maps, camera intrinsics, and masks from the video
  2. Camera Pose Estimation (MegaSaM/Droid-SLAM) -- Feeds the MoGe-2 results into a MegaSaM-based pipeline (Droid-SLAM) to compute camera extrinsic parameters (rotation and translation) across frames using the SE3 Lie group representation
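
To make the output of the second phase concrete, the sketch below shows how a rotation and translation pair can be assembled into a 4x4 rigid transform, and how such a transform is inverted (for example, to go from world-to-camera to camera-to-world). It is a plain NumPy illustration, not the operator's code. The (x, y, z, w) quaternion order and the helper names quat_to_rotmat, pose_to_matrix, and invert_pose are assumptions made for this example.

```python
import numpy as np


def quat_to_rotmat(q):
    """Convert a quaternion in (x, y, z, w) order to a 3x3 rotation matrix."""
    x, y, z, w = q / np.linalg.norm(q)  # normalize to guard against drift
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


def pose_to_matrix(t, q):
    """Build a 4x4 rigid transform from translation t and quaternion q."""
    T = np.eye(4)
    T[:3, :3] = quat_to_rotmat(q)
    T[:3, 3] = t
    return T


def invert_pose(T):
    """Invert a rigid transform using R^T and -R^T t (no general matrix inverse)."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv


# Example: a 90 degree rotation about the z axis plus a translation.
q_z90 = np.array([0.0, 0.0, np.sqrt(0.5), np.sqrt(0.5)])
w2c = pose_to_matrix(np.array([1.0, 2.0, 3.0]), q_z90)
c2w = invert_pose(w2c)
```

Inverting through the transpose keeps the result an exact rigid transform, which is why it is usually preferred over np.linalg.inv for pose matrices.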

During initialization, the operator:

  • Clones the MegaSaM repository if not present
  • Patches CUDA kernel source files (replacing deprecated .type() with .scalar_type())
  • Installs droid_backends, lietorch, and torch-scatter if not already available
  • Configures the Droid-SLAM pipeline with parameters from the droid_args helper class
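
The patching and dependency-check steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the operator's actual setup code: the function names patch_cuda_source and missing_packages are hypothetical, and the import name torch_scatter for the torch-scatter package is assumed.

```python
import importlib.util
import re

# Matches the deprecated tensor accessor call, e.g. `x.type()`.
DEPRECATED_TYPE_CALL = re.compile(r"\.type\(\)")


def patch_cuda_source(source: str) -> str:
    """Replace every `.type()` call with `.scalar_type()` in CUDA source text."""
    return DEPRECATED_TYPE_CALL.sub(".scalar_type()", source)


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example on an in-memory snippet (hypothetical kernel code).
snippet = 'AT_DISPATCH_FLOATING_TYPES(x.type(), "kernel", ([&] { run(x); }));'
patched = patch_cuda_source(snippet)

# Which of the required modules would still need installing (assumed names).
to_install = missing_packages(["droid_backends", "lietorch", "torch_scatter"])
```

Because `.scalar_type()` no longer matches the pattern, running the patch twice leaves the source unchanged, so the step is safe to repeat on every initialization.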

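The output includes the camera-to-world transformation matrices (cam_c2w), the camera intrinsic matrix (K), depth maps, and frame images. These are stored in the sample metadata under the field named by tag_field_name (default: video_camera_pose_tags). When if_save_info is enabled (the default), the results are also saved to an NPZ file in output_info_dir.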
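
As an illustration of the NPZ output, the sketch below writes pose results to an .npz archive with NumPy and reads them back. The key names inside the archive, the file name, and the helpers save_pose_npz and load_pose_npz are assumptions for this example; the layout the operator actually writes may differ.

```python
import os
import tempfile

import numpy as np


def save_pose_npz(path, cam_c2w, intrinsic, depths):
    """Write pose results to one .npz archive (key names are assumptions)."""
    np.savez(path, cam_c2w=cam_c2w, intrinsic=intrinsic, depths=depths)


def load_pose_npz(path):
    """Read every array back into a plain dict and close the file."""
    with np.load(path) as data:
        return {key: data[key] for key in data.files}


# Placeholder data for three 4x6 frames (not real operator output).
num_frames, height, width = 3, 4, 6
cam_c2w = np.tile(np.eye(4), (num_frames, 1, 1))
intrinsic = np.array([
    [500.0, 0.0, width / 2],
    [0.0, 500.0, height / 2],
    [0.0, 0.0, 1.0],
])
depths = np.ones((num_frames, height, width), dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    npz_path = os.path.join(tmp_dir, "example_pose.npz")
    save_pose_npz(npz_path, cam_c2w, intrinsic, depths)
    loaded = load_pose_npz(npz_path)

print(sorted(loaded))
```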

Requires CUDA acceleration.

Usage

Use this operator for extracting camera trajectories from video data, supporting applications like 3D scene reconstruction, visual SLAM dataset creation, camera motion analysis, and novel view synthesis.

Code Reference

Source Location

Signature

class VideoCameraPoseMapper(Mapper):
    _accelerator = "cuda"

    def __init__(
        self,
        moge_model_path: str = "Ruicheng/moge-2-vitl",
        frame_num: PositiveInt = 3,
        duration: float = 0,
        tag_field_name: str = MetaKeys.video_camera_pose_tags,
        frame_dir: str = DATA_JUICER_ASSETS_CACHE,
        if_output_moge_info: bool = False,
        moge_output_info_dir: str = DATA_JUICER_ASSETS_CACHE,
        if_save_info: bool = True,
        output_info_dir: str = DATA_JUICER_ASSETS_CACHE,
        max_frames: int = 1000,
        *args, **kwargs,
    ):

Import

from data_juicer.ops.mapper.video_camera_pose_mapper import VideoCameraPoseMapper

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| moge_model_path | str | No | Path to the MoGe-2 model. Default: "Ruicheng/moge-2-vitl" |
| frame_num | PositiveInt | No | Number of frames to extract uniformly. Default: 3 |
| duration | float | No | Duration per segment in seconds. 0 means entire video. Default: 0 |
| tag_field_name | str | No | Metadata field name for storing results. Default: "video_camera_pose_tags" |
| frame_dir | str | No | Directory to save extracted frames. Default: DATA_JUICER_ASSETS_CACHE |
| if_output_moge_info | bool | No | Whether to save MoGe-2 results to JSON. Default: False |
| moge_output_info_dir | str | No | Output directory for MoGe-2 results (listed in the signature). Default: DATA_JUICER_ASSETS_CACHE |
| if_save_info | bool | No | Whether to save results to NPZ file. Default: True |
| output_info_dir | str | No | Directory for saving NPZ results. Default: DATA_JUICER_ASSETS_CACHE |
| max_frames | int | No | Maximum number of frames to save. Default: 1000 |
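
The frame_num parameter describes uniform frame extraction. One common way to pick evenly spaced frame indices is sketched below. This illustrates the general idea only; the function uniform_frame_indices is hypothetical, and the exact sampling rule used inside Data-Juicer (including how duration segments are handled) may differ.

```python
import numpy as np


def uniform_frame_indices(total_frames: int, frame_num: int) -> list:
    """Pick up to frame_num evenly spaced indices covering first to last frame."""
    if total_frames <= 0 or frame_num <= 0:
        return []
    # Never request more frames than the video contains.
    frame_num = min(frame_num, total_frames)
    positions = np.linspace(0, total_frames - 1, frame_num)
    return positions.round().astype(int).tolist()


# Example: three indices from a 101-frame clip.
indices = uniform_frame_indices(101, 3)
print(indices)
```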

Outputs

| Name | Type | Description |
|------|------|-------------|
| sample[Fields.meta][tag_field_name]["frames_folder"] | str | Path to extracted frames directory |
| sample[Fields.meta][tag_field_name]["frame_names"] | list[str] | List of frame file names |
| sample[Fields.meta][tag_field_name]["images"] | np.ndarray | Frame images (N, H, W, 3) |
| sample[Fields.meta][tag_field_name]["depths"] | np.ndarray | Per-frame depth maps |
| sample[Fields.meta][tag_field_name]["intrinsic"] | np.ndarray | 3x3 camera intrinsic matrix K |
| sample[Fields.meta][tag_field_name]["cam_c2w"] | np.ndarray | Camera-to-world transformation matrices (N, 4, 4) |
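
A short sketch of how these outputs fit together: given K, a depth value, and one cam_c2w matrix, a pixel can be lifted to a 3D world point, and the translation column of cam_c2w gives the camera trajectory. The sketch assumes a pinhole model with K in pixel units and depth measured along the optical axis; these conventions and the helper names are assumptions, so check them against the actual operator output before relying on the numbers.

```python
import numpy as np


def pixel_to_world(u, v, depth, K, c2w):
    """Back-project pixel (u, v) with z-depth into world coordinates."""
    # Ray through the pixel in camera space, scaled so its z equals depth.
    point_cam = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    # Apply the camera-to-world transform in homogeneous coordinates.
    point_world = c2w @ np.append(point_cam, 1.0)
    return point_world[:3]


def camera_centers(cam_c2w):
    """Camera positions in world space: the translation column of each pose."""
    return cam_c2w[:, :3, 3]


def trajectory_length(cam_c2w):
    """Total distance travelled between consecutive camera centers."""
    steps = np.diff(camera_centers(cam_c2w), axis=0)
    return float(np.linalg.norm(steps, axis=1).sum())


# Placeholder values (not real operator output).
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
pose_a = np.eye(4)
pose_b = np.eye(4)
pose_b[0, 3] = 1.0  # second camera shifted one unit along world x
poses = np.stack([pose_a, pose_b])

point = pixel_to_world(50.0, 50.0, 2.0, K, pose_b)
length = trajectory_length(poses)
```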

Usage Examples

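from data_juicer.ops.mapper.video_camera_pose_mapper import VideoCameraPoseMapper
from data_juicer.utils.constant import Fields

# Basic usage: sample 10 frames and save the results as NPZ
mapper = VideoCameraPoseMapper(
    moge_model_path="Ruicheng/moge-2-vitl",
    frame_num=10,
    if_save_info=True,
    output_info_dir="/data/camera_poses/",
)

# Process a sample (requires a CUDA device)
sample = {
    "videos": ["/path/to/video.mp4"],
    Fields.meta: {},
}
result = mapper.process_single(sample, rank=0)

# Access camera poses: an (N, 4, 4) array of camera-to-world matrices
cam_c2w = result[Fields.meta]["video_camera_pose_tags"]["cam_c2w"]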
