Implementation:Datajuicer Data juicer VggtMapper

From Leeroopedia
Knowledge Sources
Domains Data_Processing, Mapping
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for extracting 3D scene information from video using VGGT.

Description

VggtMapper is a mapper operator that extracts 3D scene information from videos using the VGGT (Visual Geometry Grounded Transformer) model. It samples frames uniformly from the video, passes them through the VGGT-1B model running on CUDA, and outputs any configured combination of camera parameters, depth maps, point maps from projection, point maps from unprojection, and 3D point tracks. Results are stored in the sample's metadata under a configurable tag field.
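To make "uniform" frame extraction concrete, the sketch below picks evenly spaced frame indices that include the first and last frame. It is only an illustration of the idea: the function name `uniform_frame_indices` and the exact spacing rule are assumptions, not the sampling code used inside VggtMapper.

```python
from typing import List


def uniform_frame_indices(total_frames: int, frame_num: int) -> List[int]:
    """Return up to `frame_num` evenly spaced frame indices in [0, total_frames).

    Hypothetical helper for illustration; VggtMapper's real sampling
    logic may use a different spacing convention.
    """
    if total_frames <= 0 or frame_num <= 0:
        raise ValueError("total_frames and frame_num must be positive")
    # Fewer frames available than requested: use every frame.
    if frame_num >= total_frames:
        return list(range(total_frames))
    # A single frame: take the middle of the video.
    if frame_num == 1:
        return [total_frames // 2]
    # Integer arithmetic keeps the result deterministic (no float rounding).
    last = total_frames - 1
    return [(i * last) // (frame_num - 1) for i in range(frame_num)]


print(uniform_frame_indices(100, 3))  # [0, 49, 99]
```

With the default `frame_num` of 3, a 100-frame video would yield the first, middle, and last frames under this convention.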

Usage

Use this operator when you need automated 3D scene understanding from video data, for example to support downstream tasks such as 3D reconstruction, camera estimation, and spatial reasoning over video datasets.

Code Reference

Source Location

Signature

@OPERATORS.register_module("vggt_mapper")
class VggtMapper(Mapper):
    def __init__(
        self,
        vggt_model_path: str = "facebook/VGGT-1B",
        frame_num: PositiveInt = 3,
        duration: float = 0,
        tag_field_name: str = MetaKeys.vggt_tags,
        frame_dir: str = DATA_JUICER_ASSETS_CACHE,
        if_output_camera_parameters: bool = True,
        if_output_depth_maps: bool = True,
        if_output_point_maps_from_projection: bool = True,
        if_output_point_maps_from_unprojection: bool = True,
        if_output_point_tracks: bool = True,
        *args,
        **kwargs,
    ):

Import

from data_juicer.ops.mapper.vggt_mapper import VggtMapper

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| vggt_model_path | str | No | Path to the VGGT model (default: "facebook/VGGT-1B") |
| frame_num | PositiveInt | No | Number of frames to extract uniformly from the video (default: 3) |
| duration | float | No | Duration of each segment in seconds; 0 for the entire video (default: 0) |
| tag_field_name | str | No | Field name under which the tags are stored (default: "vggt_tags") |
| frame_dir | str | No | Output directory for extracted frames (default: DATA_JUICER_ASSETS_CACHE) |
| if_output_camera_parameters | bool | No | Whether to output camera parameters (default: True) |
| if_output_depth_maps | bool | No | Whether to output depth maps (default: True) |
| if_output_point_maps_from_projection | bool | No | Whether to output point maps from projection (default: True) |
| if_output_point_maps_from_unprojection | bool | No | Whether to output point maps from unprojection (default: True) |
| if_output_point_tracks | bool | No | Whether to output 3D point tracks (default: True) |

Outputs

| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples with VGGT metadata including camera parameters, depth maps, point maps, and point tracks |
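Because the results live in the sample's metadata under the tag field, downstream code needs to look them up there. The sketch below shows one way to do that defensively. The helper `get_vggt_tags` is hypothetical, the metadata key `"__dj__meta__"` is an assumption about how Data-Juicer names its metadata field, and the key inside the toy tags is a placeholder rather than the operator's real output layout.

```python
from typing import Any, Dict


def get_vggt_tags(
    sample: Dict[str, Any],
    meta_field: str = "__dj__meta__",   # assumed metadata key; verify against your install
    tag_field_name: str = "vggt_tags",  # default tag field listed in the Inputs table
) -> Dict[str, Any]:
    """Return the VGGT tags stored on a sample, or an empty dict if absent."""
    meta = sample.get(meta_field) or {}
    return meta.get(tag_field_name) or {}


# Toy sample; "example_output" is a placeholder key, not a documented one.
sample = {"videos": ["clip.mp4"], "__dj__meta__": {"vggt_tags": {"example_output": [1, 2, 3]}}}
print(sorted(get_vggt_tags(sample)))
```

If `tag_field_name` was overridden when configuring the operator, pass the same value here.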

Usage Examples

process:
  - vggt_mapper:
      vggt_model_path: "facebook/VGGT-1B"
      frame_num: 5
      if_output_camera_parameters: true
      if_output_depth_maps: true
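The same configuration can also be expressed in Python. The sketch below mirrors the YAML entry as keyword arguments and wraps construction in a small helper. `VGGT_KWARGS` and `build_vggt_mapper` are hypothetical names introduced here; the import path and parameter names come from this page. Calling the helper requires Data-Juicer to be installed and, per the description, a CUDA device, so the import is deferred until it is actually invoked.

```python
from typing import Any, Dict

# Keyword arguments mirroring the YAML entry; options not listed here keep
# the defaults shown in the signature.
VGGT_KWARGS: Dict[str, Any] = {
    "vggt_model_path": "facebook/VGGT-1B",
    "frame_num": 5,
    "if_output_camera_parameters": True,
    "if_output_depth_maps": True,
}


def build_vggt_mapper(**overrides: Any):
    """Instantiate VggtMapper with VGGT_KWARGS, optionally overridden.

    Hypothetical convenience wrapper; the import is done lazily so this
    module can be loaded on machines without Data-Juicer installed.
    """
    from data_juicer.ops.mapper.vggt_mapper import VggtMapper

    return VggtMapper(**{**VGGT_KWARGS, **overrides})
```

For example, `build_vggt_mapper(if_output_point_tracks=False)` would disable point tracks while keeping the other settings.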
