Implementation:Datajuicer Data juicer VideoRemoveWatermarkMapper

From Leeroopedia
Knowledge Sources
Domains Data_Processing, Mapping
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for detecting and removing watermarks from videos.

Description

VideoRemoveWatermarkMapper is a mapper operator that detects and removes watermarks from video frames within specified regions of interest (ROIs), producing cleaned videos for use as training data. It samples frames uniformly from the video and detects watermark pixels within configurable ROIs, specified either as pixel coordinates or as ratios, using either "pixel_value" or "pixel_diversity" analysis. Pixels detected as watermark in at least a minimum number of the sampled frames form the watermark mask, and the masked regions are then inpainted using OpenCV.
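
The sampling and mask-voting steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Data-Juicer's actual code: `uniform_frame_indices` and `vote_mask` are hypothetical helper names, the per-frame detections are assumed to be already available as 0/1 grids, and sampling at segment midpoints is only one of several reasonable ways to sample uniformly.

```python
from typing import List

Grid = List[List[int]]  # 2D grid of 0/1 flags, one entry per pixel


def uniform_frame_indices(total_frames: int, frame_num: int) -> List[int]:
    """Pick up to `frame_num` evenly spaced frame indices.

    ASSUMPTION: each index is the midpoint of one of `frame_num` equal segments.
    """
    if total_frames <= 0 or frame_num <= 0:
        return []
    n = min(frame_num, total_frames)
    step = total_frames / n
    return [int(step * i + step / 2) for i in range(n)]


def vote_mask(detections: List[Grid], min_frame_threshold: int) -> Grid:
    """Flag a pixel as watermark if it was detected in at least
    `min_frame_threshold` of the sampled frames."""
    if not detections:
        return []
    height, width = len(detections[0]), len(detections[0][0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            votes = sum(frame[y][x] for frame in detections)
            mask[y][x] = 1 if votes >= min_frame_threshold else 0
    return mask
```

A mask of this kind is what an inpainting routine such as OpenCV's `cv2.inpaint` takes as input, in the form of an 8-bit single-channel image whose non-zero pixels mark the area to fill.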

Usage

Use this operator to remove watermarks from video datasets in which source-platform watermarks would otherwise introduce unwanted artifacts into generated or analyzed content.

Code Reference

Source Location

Signature

@OPERATORS.register_module("video_remove_watermark_mapper")
class VideoRemoveWatermarkMapper(Mapper):
    def __init__(
        self,
        roi_strings: List[str] = ["0,0,0.1,0.1"],
        roi_type: str = "ratio",
        roi_key: Optional[str] = None,
        frame_num: PositiveInt = 10,
        min_frame_threshold: PositiveInt = 7,
        detection_method: str = "pixel_value",
        save_dir: str = None,
        *args,
        **kwargs,
    ):

Import

from data_juicer.ops.mapper.video_remove_watermark_mapper import VideoRemoveWatermarkMapper

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| roi_strings | List[str] | No | List of ROIs where watermarks are located (default: ["0,0,0.1,0.1"]) |
| roi_type | str | No | ROI string type: "pixel" for pixel coordinates or "ratio" for normalized values (default: "ratio") |
| roi_key | str | No | Field key in each sample that stores per-sample ROI strings (default: None) |
| frame_num | PositiveInt | No | Number of frames to extract for watermark detection (default: 10) |
| min_frame_threshold | PositiveInt | No | Minimum number of frames in which a pixel must be detected as watermark (default: 7) |
| detection_method | str | No | Detection method: "pixel_value" or "pixel_diversity" (default: "pixel_value") |
| save_dir | str | No | Directory for generated video files; if not specified, files are saved alongside the input files |
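
To illustrate how an ROI string and `roi_type` might map onto a concrete frame, the sketch below converts a string such as "0,0,0.1,0.1" into integer pixel coordinates. `parse_roi` is a hypothetical helper, and the coordinate order x1,y1,x2,y2 (top-left corner, then bottom-right corner) is an assumption; this page does not state the order the operator uses.

```python
from typing import Tuple


def parse_roi(roi_string: str, roi_type: str, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert an ROI string to pixel coordinates clamped to the frame.

    ASSUMPTION: values are ordered x1,y1,x2,y2 (top-left, bottom-right).
    """
    parts = [float(p) for p in roi_string.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated values, got {len(parts)}")
    x1, y1, x2, y2 = parts

    if roi_type == "ratio":
        # Ratios are scaled by the frame size to obtain pixel coordinates.
        x1, x2 = x1 * width, x2 * width
        y1, y2 = y1 * height, y2 * height
    elif roi_type != "pixel":
        raise ValueError(f"unknown roi_type: {roi_type!r}")

    def clamp(value: float, upper: int) -> int:
        # Round to the nearest pixel and keep the result inside [0, upper].
        return max(0, min(int(round(value)), upper))

    return clamp(x1, width), clamp(y1, height), clamp(x2, width), clamp(y2, height)
```

Under that assumption, the default ROI covers the top-left tenth of the frame in each dimension, which is a common position for platform logos.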

Outputs

| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples with watermark-removed video file paths |

Usage Examples

process:
  - video_remove_watermark_mapper:
      roi_strings:
        - "0,0,0.15,0.05"
      roi_type: "ratio"
      detection_method: "pixel_value"
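
Parameter combinations like the one above can be sanity-checked before a run. The helper below is a hypothetical sketch, not part of Data-Juicer, and the operator may perform its own validation. The allowed values and defaults are taken from the I/O Contract; the check that `min_frame_threshold` should not exceed `frame_num` is an inference from the description, since a pixel cannot be detected in more frames than are sampled.

```python
from typing import Any, Dict, List

ALLOWED_ROI_TYPES = {"pixel", "ratio"}
ALLOWED_METHODS = {"pixel_value", "pixel_diversity"}


def check_watermark_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the operator arguments (empty if none)."""
    problems: List[str] = []
    # Fall back to the documented defaults for omitted keys.
    roi_type = cfg.get("roi_type", "ratio")
    method = cfg.get("detection_method", "pixel_value")
    frame_num = cfg.get("frame_num", 10)
    threshold = cfg.get("min_frame_threshold", 7)

    if roi_type not in ALLOWED_ROI_TYPES:
        problems.append(f"roi_type must be one of {sorted(ALLOWED_ROI_TYPES)}")
    if method not in ALLOWED_METHODS:
        problems.append(f"detection_method must be one of {sorted(ALLOWED_METHODS)}")
    if threshold > frame_num:
        # No pixel could reach the threshold, so the mask would stay empty.
        problems.append("min_frame_threshold exceeds frame_num")
    return problems
```

For instance, lowering `frame_num` to 5 while leaving `min_frame_threshold` at its default of 7 is reported as a problem by this check.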