Implementation:Datajuicer Data juicer VideoDepthEstimationMapper

Knowledge Sources	Datajuicer_Data_juicer
Domains	Data_Processing, Mapping
Last Updated	2026-02-14 16:00 GMT

Overview

A concrete Data-Juicer operator for video depth estimation, built on the Video-Depth-Anything model.

Description

VideoDepthEstimationMapper is a mapper operator that performs per-frame depth estimation on videos with the Video-Depth-Anything model, producing depth maps and, optionally, point clouds and visualizations. At runtime it clones the Video-Depth-Anything repository and loads the model weights, supporting both relative and metric depth modes. It then processes the video frames under a configurable resolution limit (max_res) and precision (fp16 or fp32) and writes the depth estimates to the sample metadata.
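The resolution limit described above can be pictured as a cap on the longer side of each frame. The following is a minimal sketch under that assumption; `cap_resolution` is a hypothetical helper written for illustration, not a function from Data-Juicer or Video-Depth-Anything, and the operator's actual resizing rule may differ.

```python
from typing import Tuple


def cap_resolution(height: int, width: int, max_res: int = 1280) -> Tuple[int, int]:
    """Scale (height, width) down so the longer side is at most max_res.

    Hypothetical helper: assumes the limit applies to the longer side and
    that the aspect ratio is preserved. Frames already within the limit
    are returned unchanged.
    """
    if height <= 0 or width <= 0 or max_res <= 0:
        raise ValueError("height, width and max_res must be positive")
    longer = max(height, width)
    if longer <= max_res:
        return height, width
    scale = max_res / longer
    # Round to the nearest pixel and never drop below 1.
    return max(1, round(height * scale)), max(1, round(width * scale))


if __name__ == "__main__":
    print(cap_resolution(1080, 1920))  # longer side 1920 exceeds the limit
    print(cap_resolution(480, 640))    # already within the limit
```

Capping the longer side keeps memory use bounded for both landscape and portrait videos, which is one plausible reason to expose a single max_res value rather than separate width and height limits.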

Usage

Use this operator when you need depth-aware video annotation, for example for 3D understanding, autonomous driving datasets, or spatial reasoning tasks that require per-frame depth information.

Code Reference

Source Location

Repository: Datajuicer_Data_juicer
File: data_juicer/ops/mapper/video_depth_estimation_mapper.py

Signature

@OPERATORS.register_module("video_depth_estimation_mapper")
class VideoDepthEstimationMapper(Mapper):
    def __init__(
        self,
        video_depth_model_path: str = "video_depth_anything_vitb.pth",
        point_cloud_dir_for_metric: str = DATA_JUICER_ASSETS_CACHE,
        max_res: int = 1280,
        torch_dtype: str = "fp16",
        if_save_visualization: bool = False,
        save_visualization_dir: str = DATA_JUICER_ASSETS_CACHE,
        grayscale: bool = False,
        *args,
        **kwargs,
    ):

Import

from data_juicer.ops.mapper.video_depth_estimation_mapper import VideoDepthEstimationMapper

I/O Contract

Inputs

Name	Type	Required	Description
video_depth_model_path	str	No	Path to the Video-Depth-Anything model weights (default: "video_depth_anything_vitb.pth")
point_cloud_dir_for_metric	str	No	Directory for storing point clouds in metric mode (default: DATA_JUICER_ASSETS_CACHE)
max_res	int	No	Maximum resolution threshold for videos (default: 1280)
torch_dtype	str	No	Floating point type for model inference, "fp16" or "fp32" (default: "fp16")
if_save_visualization	bool	No	Whether to save visualization results (default: False)
save_visualization_dir	str	No	Directory for saving visualization results (default: DATA_JUICER_ASSETS_CACHE)
grayscale	bool	No	If True, no color palette is applied to the depth visualization (default: False)
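The inputs above can be checked before an operator is constructed. The sketch below mirrors the documented arguments and defaults in a small dataclass; `DepthMapperArgs`, `ALLOWED_DTYPES`, and `to_kwargs` are hypothetical names introduced for illustration, and whether the operator performs similar validation itself is not stated on this page.

```python
from dataclasses import dataclass
from typing import Optional

# The two precision values named in the inputs table.
ALLOWED_DTYPES = ("fp16", "fp32")


@dataclass
class DepthMapperArgs:
    """Hypothetical container mirroring the documented constructor arguments."""

    video_depth_model_path: str = "video_depth_anything_vitb.pth"
    # None stands in for the library default (DATA_JUICER_ASSETS_CACHE),
    # whose concrete value is not given on this page.
    point_cloud_dir_for_metric: Optional[str] = None
    max_res: int = 1280
    torch_dtype: str = "fp16"
    if_save_visualization: bool = False
    save_visualization_dir: Optional[str] = None
    grayscale: bool = False

    def __post_init__(self) -> None:
        if self.torch_dtype not in ALLOWED_DTYPES:
            raise ValueError(
                f"torch_dtype must be one of {ALLOWED_DTYPES}, got {self.torch_dtype!r}"
            )
        if self.max_res <= 0:
            raise ValueError("max_res must be a positive integer")

    def to_kwargs(self) -> dict:
        """Return the arguments that are not None, ready to pass as **kwargs."""
        return {key: value for key, value in vars(self).items() if value is not None}


if __name__ == "__main__":
    print(DepthMapperArgs(max_res=960, torch_dtype="fp32").to_kwargs())
```

Leaving the two directory arguments out of the returned dictionary when they are None lets the operator fall back to its own defaults.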

Outputs

Name	Type	Description
samples	Dict	Transformed samples with depth estimation data and fps stored in their metadata

Usage Examples

process:
  - video_depth_estimation_mapper:
      video_depth_model_path: "video_depth_anything_vitb.pth"
      max_res: 1280
      torch_dtype: "fp16"

Related Pages

Environment:Datajuicer_Data_juicer_Python_Runtime_Environment
