Implementation:Datajuicer Data juicer VideoDepthEstimationMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool, provided by Data-Juicer, for video depth estimation with the Video-Depth-Anything model.
Description
VideoDepthEstimationMapper is a mapper operator that runs per-frame depth estimation on videos with the Video-Depth-Anything model, producing depth maps and, optionally, point clouds and visualizations. At runtime it clones the Video-Depth-Anything repository and loads the model weights, supporting both relative and metric depth modes. It then processes the video frames under a configurable resolution limit (max_res) and precision (fp16 or fp32), and writes the depth estimates to the sample metadata.
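The effect of a resolution limit can be pictured with a small sketch. It assumes that max_res caps the longer side of each frame and that the other side is scaled proportionally; the operator's actual resizing and rounding may differ. The helper name cap_resolution is hypothetical and is not part of Data-Juicer.

```python
from typing import Tuple


def cap_resolution(width: int, height: int, max_res: int = 1280) -> Tuple[int, int]:
    """Hypothetical helper: shrink (width, height) so the longer side is at most max_res.

    Assumption: the limit applies to the longer side and the aspect ratio is kept.
    """
    longer = max(width, height)
    # Frames already within the limit (or a non-positive limit) are left unchanged.
    if max_res <= 0 or longer <= max_res:
        return width, height
    scale = max_res / longer
    return round(width * scale), round(height * scale)


print(cap_resolution(1920, 1080))  # (1280, 720)
print(cap_resolution(640, 480))    # (640, 480)
```

Under this assumption a 1920x1080 video is processed at 1280x720 with the default limit, while smaller videos keep their original size.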
Usage
Use this operator when you need depth-aware video annotation, for example for 3D understanding, autonomous driving datasets, or spatial reasoning tasks that require per-frame depth information.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/video_depth_estimation_mapper.py
Signature
@OPERATORS.register_module("video_depth_estimation_mapper")
class VideoDepthEstimationMapper(Mapper):
    def __init__(
        self,
        video_depth_model_path: str = "video_depth_anything_vitb.pth",
        point_cloud_dir_for_metric: str = DATA_JUICER_ASSETS_CACHE,
        max_res: int = 1280,
        torch_dtype: str = "fp16",
        if_save_visualization: bool = False,
        save_visualization_dir: str = DATA_JUICER_ASSETS_CACHE,
        grayscale: bool = False,
        *args,
        **kwargs,
    ):
Import
from data_juicer.ops.mapper.video_depth_estimation_mapper import VideoDepthEstimationMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| video_depth_model_path | str | No | Path to the Video-Depth-Anything model weights (default: "video_depth_anything_vitb.pth") |
| point_cloud_dir_for_metric | str | No | Directory for storing point clouds in metric mode (default: DATA_JUICER_ASSETS_CACHE) |
| max_res | int | No | Maximum resolution threshold for videos (default: 1280) |
| torch_dtype | str | No | Floating point type for model inference, "fp16" or "fp32" (default: "fp16") |
| if_save_visualization | bool | No | Whether to save visualization results (default: False) |
| save_visualization_dir | str | No | Directory for saving visualization results (default: DATA_JUICER_ASSETS_CACHE) |
| grayscale | bool | No | If True, the color palette is not applied (default: False) |
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples with depth estimation data and fps in metadata |
Usage Examples
process:
  - video_depth_estimation_mapper:
      video_depth_model_path: "video_depth_anything_vitb.pth"
      max_res: 1280
      torch_dtype: "fp16"
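The same recipe entry can also be assembled from Python, which is convenient when the parameters are computed rather than hand-written. The sketch below is only an illustration: build_depth_op_entry is a hypothetical helper, not a Data-Juicer API. It checks torch_dtype against the two values listed in the Inputs table and returns a dict with the same shape as the recipe entry above.

```python
import json

# The two precision values documented for torch_dtype.
ALLOWED_DTYPES = {"fp16", "fp32"}


def build_depth_op_entry(
    video_depth_model_path: str = "video_depth_anything_vitb.pth",
    max_res: int = 1280,
    torch_dtype: str = "fp16",
) -> dict:
    """Hypothetical helper: build one process entry for video_depth_estimation_mapper."""
    if torch_dtype not in ALLOWED_DTYPES:
        raise ValueError(f"torch_dtype must be one of {sorted(ALLOWED_DTYPES)}, got {torch_dtype!r}")
    return {
        "video_depth_estimation_mapper": {
            "video_depth_model_path": video_depth_model_path,
            "max_res": max_res,
            "torch_dtype": torch_dtype,
        }
    }


# Wrap the entry in a top-level process list, mirroring the recipe layout.
recipe = {"process": [build_depth_op_entry()]}
print(json.dumps(recipe, indent=2))
```

Validating the precision string before the recipe is written catches typos such as "float16" early, before any model weights are loaded.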