Implementation:Datajuicer Data juicer VideoRemoveWatermarkMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool, provided by Data-Juicer, for detecting and removing watermarks from videos.
Description
VideoRemoveWatermarkMapper is a mapper operator that detects and removes watermarks from video frames within specified regions of interest (ROIs), producing cleaned videos for training data. It works in four steps: it samples frames uniformly from the video; detects watermark pixels within the configured ROIs (given as pixel coordinates or as ratios of the frame size) using either the "pixel_value" or the "pixel_diversity" method; builds a watermark mask from the pixels flagged in at least a minimum number of the sampled frames; and inpaints the masked regions using OpenCV.
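The sampling and mask-voting steps described above can be sketched in plain Python. This is a minimal illustration, not the operator's actual code: the helper names `uniform_frame_indices` and `vote_watermark_mask` are hypothetical, sampling at segment midpoints is an assumption, and so is treating the threshold as inclusive (`>=`). Per-frame detection and OpenCV inpainting are left out.

```python
from typing import List

Mask = List[List[bool]]  # mask[y][x] is True where a pixel was flagged


def uniform_frame_indices(total_frames: int, frame_num: int) -> List[int]:
    """Pick up to frame_num indices spread evenly over the video.

    Hypothetical helper: the midpoint-of-segment rule is an assumption.
    """
    if total_frames <= 0 or frame_num <= 0:
        return []
    frame_num = min(frame_num, total_frames)
    step = total_frames / frame_num
    # Take the midpoint of each of the frame_num equal-length segments.
    return [int(step * i + step / 2) for i in range(frame_num)]


def vote_watermark_mask(per_frame_masks: List[Mask],
                        min_frame_threshold: int) -> Mask:
    """Keep a pixel only if enough sampled frames flagged it.

    Assumes an inclusive threshold (votes >= min_frame_threshold).
    """
    if not per_frame_masks:
        return []
    height = len(per_frame_masks[0])
    width = len(per_frame_masks[0][0])
    votes = [[0] * width for _ in range(height)]
    for mask in per_frame_masks:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    votes[y][x] += 1
    return [[votes[y][x] >= min_frame_threshold for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    print(uniform_frame_indices(100, 10))
    frames = [[[True, False]], [[True, True]], [[True, False]]]
    print(vote_watermark_mask(frames, min_frame_threshold=2))
```

Voting across frames is what separates a static overlay from moving scene content: a logo stays flagged at the same pixels in most sampled frames, while incidental bright or uniform regions rarely do.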
Usage
Use this operator to remove watermarks from video datasets in which source-platform watermarks would otherwise introduce unwanted artifacts into generated or analyzed content.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/video_remove_watermark_mapper.py
Signature
@OPERATORS.register_module("video_remove_watermark_mapper")
class VideoRemoveWatermarkMapper(Mapper):
    def __init__(
        self,
        roi_strings: List[str] = ["0,0,0.1,0.1"],
        roi_type: str = "ratio",
        roi_key: Optional[str] = None,
        frame_num: PositiveInt = 10,
        min_frame_threshold: PositiveInt = 7,
        detection_method: str = "pixel_value",
        save_dir: str = None,
        *args,
        **kwargs,
    ):
Import
from data_juicer.ops.mapper.video_remove_watermark_mapper import VideoRemoveWatermarkMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| roi_strings | List[str] | No | List of ROI regions where watermarks are located (default: ["0,0,0.1,0.1"]) |
| roi_type | str | No | ROI string type: "pixel" for absolute pixel coordinates or "ratio" for coordinates normalized to the frame size (default: "ratio") |
| roi_key | str | No | Key of the sample field that holds per-sample ROI strings (default: None) |
| frame_num | PositiveInt | No | Number of frames to extract for watermark detection (default: 10) |
| min_frame_threshold | PositiveInt | No | Minimum number of sampled frames in which a pixel must be detected as watermark to be included in the mask (default: 7) |
| detection_method | str | No | Detection method: "pixel_value" or "pixel_diversity" (default: "pixel_value") |
| save_dir | str | No | Directory for generated video files; if not specified, saves alongside input files |
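To make the two `roi_type` settings concrete, the sketch below converts one ROI string into integer pixel corners. It is illustrative only: `parse_roi` is a hypothetical helper, not part of Data-Juicer, and it assumes the four comma-separated values are ordered `x1,y1,x2,y2` (top-left corner, then bottom-right corner).

```python
from typing import Tuple


def parse_roi(roi_string: str, roi_type: str,
              width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert an ROI string into pixel corners clamped to the frame.

    Hypothetical helper; the x1,y1,x2,y2 ordering is an assumption.
    """
    parts = [float(p) for p in roi_string.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 values, got {len(parts)}: {roi_string!r}")
    x1, y1, x2, y2 = parts

    if roi_type == "ratio":
        # Ratios are fractions of the frame size, so scale them to pixels.
        x1, x2 = x1 * width, x2 * width
        y1, y2 = y1 * height, y2 * height
    elif roi_type != "pixel":
        raise ValueError(f"unknown roi_type: {roi_type!r}")

    def clamp(value: float, upper: int) -> int:
        # Round to the nearest pixel and keep the result inside the frame.
        return max(0, min(round(value), upper))

    return clamp(x1, width), clamp(y1, height), clamp(x2, width), clamp(y2, height)


if __name__ == "__main__":
    print(parse_roi("0,0,0.5,0.25", "ratio", 200, 100))
    print(parse_roi("10,20,30,40", "pixel", 200, 100))
```

Ratio ROIs are convenient when a dataset mixes resolutions, because the same string covers the same relative corner of every video; pixel ROIs suit datasets with a single known frame size.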
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples with watermark-removed video file paths |
Usage Examples
process:
  - video_remove_watermark_mapper:
      roi_strings:
        - "0,0,0.15,0.05"
      roi_type: "ratio"
      detection_method: "pixel_value"