

Implementation:Datajuicer Data juicer VideoDepthEstimationMapper

From Leeroopedia
Revision as of 12:23, 16 February 2026 by Admin (auto-imported from implementations/Datajuicer_Data_juicer_VideoDepthEstimationMapper.md)
Knowledge Sources
Domains: Data_Processing, Mapping
Last Updated: 2026-02-14 16:00 GMT

Overview

A concrete Data-Juicer operator for estimating per-frame depth in videos with the Video-Depth-Anything model.

Description

VideoDepthEstimationMapper is a mapper operator that performs per-frame depth estimation on videos with the Video-Depth-Anything model. It produces depth maps and can optionally also produce point clouds (in metric depth mode) and visualizations. At runtime it clones the Video-Depth-Anything repository, loads the model weights (both relative and metric depth modes are supported), processes the video frames with a configurable resolution cap (max_res) and inference precision (fp16 or fp32), and writes the depth estimates to the sample's metadata.
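Point clouds are tied to metric mode (note the `point_cloud_dir_for_metric` parameter), because turning a depth map into 3D points requires depth in physical units. The standard way to do this is to back-project each pixel (u, v) with depth Z through a pinhole camera model: X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy. The sketch below shows that back-projection with NumPy. It is a minimal illustration, not the operator's code: the helper `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, and the operator may obtain camera parameters differently.

```python
import numpy as np


def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float,
                         cx: float, cy: float) -> np.ndarray:
    """Back-project an (H, W) metric depth map into an (H*W, 3) point cloud.

    Hypothetical helper: assumes a pinhole camera with known intrinsics.
    Points are returned in row-major pixel order.
    """
    h, w = depth.shape
    # Pixel grid: u runs along columns (x), v along rows (y); both are (H, W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)


# Usage: a flat 2x2 depth map 2 m away, principal point at the image origin.
points = depth_to_point_cloud(np.full((2, 2), 2.0), fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(points[1])  # pixel (u=1, v=0): [2. 0. 2.]
```

Returning a flat (N, 3) array keeps the result easy to save in common point-cloud formats; a real pipeline would usually also drop pixels with invalid (zero or infinite) depth.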

Usage

Use this operator when you need per-frame depth annotations for video data, for example when building datasets for 3D scene understanding, autonomous driving, or spatial reasoning.

Code Reference

Source Location

data_juicer/ops/mapper/video_depth_estimation_mapper.py

Signature

from data_juicer.ops.base_op import OPERATORS, Mapper
from data_juicer.utils.cache_utils import DATA_JUICER_ASSETS_CACHE


@OPERATORS.register_module("video_depth_estimation_mapper")
class VideoDepthEstimationMapper(Mapper):
    def __init__(
        self,
        video_depth_model_path: str = "video_depth_anything_vitb.pth",
        point_cloud_dir_for_metric: str = DATA_JUICER_ASSETS_CACHE,
        max_res: int = 1280,
        torch_dtype: str = "fp16",  # "fp16" or "fp32"
        if_save_visualization: bool = False,
        save_visualization_dir: str = DATA_JUICER_ASSETS_CACHE,
        grayscale: bool = False,
        *args,
        **kwargs,
    ):
        ...  # body omitted; see the source file
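The `max_res` parameter caps the resolution of the frames fed to the model. A minimal sketch, assuming the cap is applied by downscaling each frame so that its longer side is at most `max_res` while preserving the aspect ratio (the exact resizing and rounding the operator uses may differ). `capped_size` is a hypothetical helper, not part of Data-Juicer.

```python
def capped_size(height: int, width: int, max_res: int = 1280) -> tuple[int, int]:
    """Return (height, width) downscaled so the longer side is <= max_res.

    Hypothetical helper: assumes aspect-preserving downscaling with rounding.
    Frames already within the limit are returned unchanged (never upscaled).
    """
    longest = max(height, width)
    if longest <= max_res:
        return height, width
    scale = max_res / longest
    return round(height * scale), round(width * scale)


print(capped_size(2160, 3840))  # 4K UHD frame -> (720, 1280)
print(capped_size(720, 1280))   # already within the cap -> (720, 1280)
```

Lowering `max_res` trades depth-map detail for less GPU memory and faster inference, which matters most for long or high-resolution videos.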

Import

from data_juicer.ops.mapper.video_depth_estimation_mapper import VideoDepthEstimationMapper

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| video_depth_model_path | str | No | Path to the Video-Depth-Anything model weights (default: "video_depth_anything_vitb.pth") |
| point_cloud_dir_for_metric | str | No | Directory for storing point clouds in metric depth mode (default: DATA_JUICER_ASSETS_CACHE) |
| max_res | int | No | Maximum resolution threshold for video frames (default: 1280) |
| torch_dtype | str | No | Floating-point precision for model inference, "fp16" or "fp32" (default: "fp16") |
| if_save_visualization | bool | No | Whether to save depth visualizations (default: False) |
| save_visualization_dir | str | No | Directory for saving visualizations (default: DATA_JUICER_ASSETS_CACHE) |
| grayscale | bool | No | If True, visualizations are rendered in grayscale instead of with a color palette (default: False) |
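The `grayscale` flag concerns how depth is rendered for visualization. A common approach, assumed here, is to min-max normalize the depth map to 8-bit values and then either replicate that single intensity channel to RGB (grayscale) or look each value up in a 256-entry color palette. The sketch below illustrates this with NumPy; `visualize_depth` and its simple blue-to-red palette are hypothetical stand-ins, not the operator's actual colormap or normalization scheme (which may, for example, normalize across a whole video rather than per frame).

```python
import numpy as np


def visualize_depth(depth: np.ndarray, grayscale: bool = False) -> np.ndarray:
    """Render an (H, W) depth map as an (H, W, 3) uint8 RGB image.

    Hypothetical sketch: min-max normalizes to 0..255, then either replicates
    the intensity to three channels (grayscale) or maps it through a simple
    blue-to-red palette standing in for a real colormap.
    """
    d = depth.astype(np.float64)
    span = d.max() - d.min()
    # Guard against a constant depth map to avoid division by zero.
    norm = (d - d.min()) / span if span > 0 else np.zeros_like(d)
    idx = (norm * 255).round().astype(np.uint8)
    if grayscale:
        return np.repeat(idx[..., None], 3, axis=-1)
    # 256-entry palette: red rises and blue falls with normalized depth.
    ramp = np.arange(256, dtype=np.uint8)
    palette = np.stack([ramp, np.zeros(256, dtype=np.uint8), 255 - ramp], axis=-1)
    return palette[idx]


depth = np.array([[1.0, 2.0], [3.0, 5.0]])
gray = visualize_depth(depth, grayscale=True)
color = visualize_depth(depth)
print(gray[..., 0])  # intensities 0, 64, 128, 255
print(color[1, 1])   # the farthest pixel maps to pure red
```

Indexing the palette with the uint8 intensity array (`palette[idx]`) applies the colormap to every pixel in one vectorized lookup.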

Outputs

| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples, with the depth estimation results and the video fps stored in each sample's metadata |

Usage Examples

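# Data-Juicer recipe fragment; all values shown are the defaults
process:
  - video_depth_estimation_mapper:
      video_depth_model_path: "video_depth_anything_vitb.pth"  # Video-Depth-Anything weights
      max_res: 1280        # cap on frame resolution
      torch_dtype: "fp16"  # or "fp32" for full-precision inference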
