Implementation:Datajuicer Data juicer VggtMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool, provided by Data-Juicer, for extracting 3D scene information from video using the VGGT model.
Description
VggtMapper is a mapper operator that extracts 3D scene information from videos using the VGGT (Visual Geometry Grounded Transformer) model. It samples frames uniformly from the video and passes them through the VGGT-1B model running on CUDA. The outputs are configurable: any combination of camera parameters, depth maps, point maps from projection, point maps from unprojection, and 3D point tracks can be enabled. Results are stored in the sample's metadata under a configurable tag field.
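The uniform frame sampling step can be illustrated with a small sketch. This is not the operator's actual code: `uniform_frame_indices` is a hypothetical helper, and taking the midpoint of each equal-width bin is only one plausible reading of "uniformly"; the real implementation may pick frames differently.

```python
from typing import List


def uniform_frame_indices(total_frames: int, frame_num: int) -> List[int]:
    """Pick up to frame_num indices spread evenly over [0, total_frames).

    Hypothetical helper for illustration; not part of Data-Juicer.
    """
    if total_frames <= 0 or frame_num <= 0:
        return []
    # Never request more frames than the video contains.
    frame_num = min(frame_num, total_frames)
    # Split the video into frame_num equal bins and take each bin's midpoint.
    step = total_frames / frame_num
    return [int(step * i + step / 2) for i in range(frame_num)]
```

With a 90-frame video and the default `frame_num` of 3, this sketch returns `[15, 45, 75]`.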
Usage
Use when you need automated 3D scene understanding from video data, supporting downstream tasks like 3D reconstruction, camera estimation, and spatial reasoning for video datasets.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/vggt_mapper.py
Signature
    @OPERATORS.register_module("vggt_mapper")
    class VggtMapper(Mapper):
        def __init__(
            self,
            vggt_model_path: str = "facebook/VGGT-1B",
            frame_num: PositiveInt = 3,
            duration: float = 0,
            tag_field_name: str = MetaKeys.vggt_tags,
            frame_dir: str = DATA_JUICER_ASSETS_CACHE,
            if_output_camera_parameters: bool = True,
            if_output_depth_maps: bool = True,
            if_output_point_maps_from_projection: bool = True,
            if_output_point_maps_from_unprojection: bool = True,
            if_output_point_tracks: bool = True,
            *args,
            **kwargs,
        ):
Import
from data_juicer.ops.mapper.vggt_mapper import VggtMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| vggt_model_path | str | No | Path to the VGGT model (default: "facebook/VGGT-1B") |
| frame_num | PositiveInt | No | Number of frames to extract uniformly from the video (default: 3) |
| duration | float | No | Duration of each segment in seconds; 0 for entire video (default: 0) |
| tag_field_name | str | No | Field name to store the tags (default: "vggt_tags") |
| frame_dir | str | No | Output directory for extracted frames (default: DATA_JUICER_ASSETS_CACHE) |
| if_output_camera_parameters | bool | No | Whether to output camera parameters (default: True) |
| if_output_depth_maps | bool | No | Whether to output depth maps (default: True) |
| if_output_point_maps_from_projection | bool | No | Whether to output point maps from projection (default: True) |
| if_output_point_maps_from_unprojection | bool | No | Whether to output point maps from unprojection (default: True) |
| if_output_point_tracks | bool | No | Whether to output 3D point tracks (default: True) |
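Because the five `if_output_*` switches all default to True, a recipe that needs only some outputs has to disable the rest explicitly. The sketch below builds such a recipe entry as a plain dictionary. `vggt_op_config` and `OUTPUT_FLAGS` are hypothetical names introduced for illustration; only the parameter names themselves come from the table above.

```python
from typing import Any, Dict, Iterable

# Output switches taken from the parameter table above (without the
# "if_output_" prefix).
OUTPUT_FLAGS = (
    "camera_parameters",
    "depth_maps",
    "point_maps_from_projection",
    "point_maps_from_unprojection",
    "point_tracks",
)


def vggt_op_config(outputs: Iterable[str], frame_num: int = 3) -> Dict[str, Any]:
    """Build a vggt_mapper recipe entry with only `outputs` enabled.

    Hypothetical helper for illustration; not part of Data-Juicer.
    """
    wanted = set(outputs)
    unknown = wanted - set(OUTPUT_FLAGS)
    if unknown:
        raise ValueError(f"unknown outputs: {sorted(unknown)}")
    params: Dict[str, Any] = {"frame_num": frame_num}
    # Set every switch explicitly so unwanted outputs do not fall back to True.
    for name in OUTPUT_FLAGS:
        params[f"if_output_{name}"] = name in wanted
    return {"vggt_mapper": params}
```

For example, `vggt_op_config(["depth_maps"], frame_num=5)` yields an entry in which `if_output_depth_maps` is True and the other four switches are False. The resulting dictionary can be placed in the `process` list of a recipe.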
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples with VGGT metadata including camera parameters, depth maps, point maps, and point tracks |
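Downstream code typically needs to pull the stored results back out of a processed sample. The following is a minimal sketch under stated assumptions: the metadata field name `"__dj__meta__"` and the tag name `"vggt_tags"` are assumed string values (in Data-Juicer they would normally be referenced through `Fields.meta` and `MetaKeys.vggt_tags`), and `get_vggt_tags` is a hypothetical helper. The layout of the stored results is not documented on this page, so inspect the returned dictionary's keys rather than relying on particular names.

```python
from typing import Any, Dict

# Assumed key names for illustration; prefer the library constants
# (Fields.meta, MetaKeys.vggt_tags) in real code.
META_FIELD = "__dj__meta__"
TAG_FIELD = "vggt_tags"


def get_vggt_tags(
    sample: Dict[str, Any],
    meta_field: str = META_FIELD,
    tag_field: str = TAG_FIELD,
) -> Dict[str, Any]:
    """Return the VGGT results stored on a sample, or {} if none exist.

    Hypothetical helper for illustration; not part of Data-Juicer.
    """
    meta = sample.get(meta_field) or {}
    return meta.get(tag_field) or {}
```

If the operator was configured with a custom `tag_field_name`, pass the same value as `tag_field`.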
Usage Examples
    process:
      - vggt_mapper:
          vggt_model_path: "facebook/VGGT-1B"
          frame_num: 5
          if_output_camera_parameters: true
          if_output_depth_maps: true