Implementation:Datajuicer Data juicer VggtMapper

Knowledge Sources	Datajuicer_Data_juicer
Domains	Data_Processing, Mapping
Last Updated	2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for extracting 3D scene information from video using the VGGT model.

Description

VggtMapper is a mapper operator that uses the VGGT (Visual Geometry Grounded Transformer) model to extract 3D scene information from videos: camera poses, depth maps, point maps, and 3D point tracks. It samples frames uniformly from each video and runs them through the VGGT-1B model on CUDA. It then writes a configurable combination of outputs (camera parameters, depth maps, point maps from projection, point maps from unprojection, and 3D point tracks) to the sample's metadata under a configurable tag field.
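
The uniform frame sampling mentioned above can be illustrated with a minimal sketch. The helper name uniform_frame_indices and the bin-midpoint rule are assumptions made for illustration; the frame extraction code inside Data-Juicer may pick indices differently (for example, by including the first and last frames).

```python
from typing import List


def uniform_frame_indices(total_frames: int, frame_num: int) -> List[int]:
    """Pick frame_num indices spread evenly across [0, total_frames).

    Hypothetical helper: not part of Data-Juicer.
    """
    if total_frames <= 0 or frame_num <= 0:
        raise ValueError("total_frames and frame_num must be positive")
    # Never request more frames than the video contains.
    frame_num = min(frame_num, total_frames)
    # Split the video into frame_num equal-width bins and take each midpoint.
    step = total_frames / frame_num
    return [int(step * i + step / 2) for i in range(frame_num)]


# A 90-frame clip sampled with the default frame_num of 3.
indices = uniform_frame_indices(90, 3)
```

Midpoint sampling avoids the very first and last frames, which are often fades or black frames; other rules are equally valid choices.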

Usage

Use this operator when you need automated 3D scene understanding from video data, for example to support downstream tasks such as 3D reconstruction, camera estimation, and spatial reasoning over video datasets.

Code Reference

Source Location

Repository: Datajuicer_Data_juicer
File: data_juicer/ops/mapper/vggt_mapper.py

Signature

@OPERATORS.register_module("vggt_mapper")
class VggtMapper(Mapper):
    def __init__(
        self,
        vggt_model_path: str = "facebook/VGGT-1B",
        frame_num: PositiveInt = 3,
        duration: float = 0,
        tag_field_name: str = MetaKeys.vggt_tags,
        frame_dir: str = DATA_JUICER_ASSETS_CACHE,
        if_output_camera_parameters: bool = True,
        if_output_depth_maps: bool = True,
        if_output_point_maps_from_projection: bool = True,
        if_output_point_maps_from_unprojection: bool = True,
        if_output_point_tracks: bool = True,
        *args,
        **kwargs,
    ):

Import

from data_juicer.ops.mapper.vggt_mapper import VggtMapper

I/O Contract

Inputs

Name	Type	Required	Description
vggt_model_path	str	No	Path to the VGGT model (default: "facebook/VGGT-1B")
frame_num	PositiveInt	No	Number of frames to extract uniformly from the video (default: 3)
duration	float	No	Duration of each segment in seconds; 0 for entire video (default: 0)
tag_field_name	str	No	Field name to store the tags (default: "vggt_tags")
frame_dir	str	No	Output directory for extracted frames (default: DATA_JUICER_ASSETS_CACHE)
if_output_camera_parameters	bool	No	Whether to output camera parameters (default: True)
if_output_depth_maps	bool	No	Whether to output depth maps (default: True)
if_output_point_maps_from_projection	bool	No	Whether to output point maps from projection (default: True)
if_output_point_maps_from_unprojection	bool	No	Whether to output point maps from unprojection (default: True)
if_output_point_tracks	bool	No	Whether to output 3D point tracks (default: True)
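
As a rough illustration of how the parameters in this table combine, the sketch below assembles an operator entry for a recipe's process list as a plain Python dictionary. The helper build_vggt_op_config is hypothetical and not part of Data-Juicer; only the parameter names and defaults come from the table above.

```python
from typing import Any, Dict

# Output switches listed in the Inputs table; each defaults to True.
OUTPUT_FLAGS = (
    "if_output_camera_parameters",
    "if_output_depth_maps",
    "if_output_point_maps_from_projection",
    "if_output_point_maps_from_unprojection",
    "if_output_point_tracks",
)


def build_vggt_op_config(
    frame_num: int = 3, duration: float = 0, **flags: bool
) -> Dict[str, Any]:
    """Return a {"vggt_mapper": {...}} entry (hypothetical helper)."""
    if frame_num < 1:
        raise ValueError("frame_num must be a positive integer")
    unknown = set(flags) - set(OUTPUT_FLAGS)
    if unknown:
        raise ValueError(f"unknown output flags: {sorted(unknown)}")
    params: Dict[str, Any] = {"frame_num": frame_num, "duration": duration}
    # Start from the documented defaults, then apply the caller's overrides.
    params.update({name: True for name in OUTPUT_FLAGS})
    params.update(flags)
    return {"vggt_mapper": params}


# Five frames per video, with point tracks switched off.
entry = build_vggt_op_config(frame_num=5, if_output_point_tracks=False)
```

Rejecting unknown flag names early catches typos in the long if_output_* identifiers before a job is launched.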

Outputs

Name	Type	Description
samples	Dict	Transformed samples with VGGT metadata including camera parameters, depth maps, point maps, and point tracks
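
A downstream step might inspect which VGGT outputs a processed sample carries. The sketch below makes several assumptions: that metadata sits under a "__dj__meta__" key, that the tag field keeps its default name "vggt_tags", and that the tag value is a dictionary keyed by output name. The key names in the example sample are placeholders, not the operator's documented output keys.

```python
from typing import Any, Dict, List

META_KEY = "__dj__meta__"  # assumed name of the sample's metadata field
TAG_FIELD = "vggt_tags"    # default tag_field_name from the Inputs table


def available_vggt_outputs(
    sample: Dict[str, Any],
    meta_key: str = META_KEY,
    tag_field: str = TAG_FIELD,
) -> List[str]:
    """List the output names stored under the tag field (hypothetical helper)."""
    tags = sample.get(meta_key, {}).get(tag_field)
    # Treat a missing or non-dictionary tag value as "no outputs available".
    if not isinstance(tags, dict):
        return []
    return sorted(tags)


# Placeholder sample; the inner key names are illustrative only.
sample = {
    "videos": ["clip.mp4"],
    META_KEY: {TAG_FIELD: {"depth_maps": [], "camera_parameters": []}},
}
present = available_vggt_outputs(sample)
```

Checking for presence rather than indexing directly keeps downstream code robust when some if_output_* switches are disabled.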

Usage Examples

process:
  - vggt_mapper:
      vggt_model_path: "facebook/VGGT-1B"
      frame_num: 5
      if_output_camera_parameters: true
      if_output_depth_maps: true

Related Pages

Environment:Datajuicer_Data_juicer_Python_Runtime_Environment
