Implementation:Datajuicer Data juicer Difference Area Generator Mapper
| Knowledge Sources | |
|---|---|
| Domains | Image Comparison, Object Detection, Change Detection |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Identifies and localizes regions of difference between two similar images by comparing their captions and bounding box contents, producing annotated bounding boxes highlighting the differing areas.
Description
Difference_Area_Generator_Mapper is the first stage of the ImgDiff pipeline. It processes image pairs to identify and filter regions with significant visual differences through an 8-step process:
- Similarity Filtering -- Filters out image pairs with large differences using an image pair similarity filter (CLIP-based)
- Caption Comparison -- Compares the two captions using difflib and NLTK lemmatization to identify differing nouns (via the compare_text_index helper)
- Image Segmentation -- Segments both images using FastSAM to identify potential object regions
- Cropping -- Crops sub-images from both images based on the segmentation bounding boxes
- Object Validation -- Uses BLIP image-text matching to determine if cropped sub-images contain the identified "valid objects"
- Difference Detection -- Applies a second round of similarity filtering on cropped region pairs to detect actual visual differences
- NMS Filtering -- Removes overlapping bounding boxes using IoU-based non-maximum suppression (via the iou_filter helper)
- Cache Cleanup -- Removes all temporary cropped images from the cache directory
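The caption comparison step above can be illustrated with a minimal sketch that uses only `difflib` from the standard library. It is not the operator's `compare_text_index` helper: the function name `differing_words` is hypothetical, and the NLTK lemmatization and noun check described above are deliberately left out, so every differing word is reported regardless of its part of speech.

```python
import difflib
from typing import List, Tuple


def differing_words(caption1: str, caption2: str) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return the words unique to each caption.

    Simplification: no lemmatization and no part-of-speech filtering,
    so non-nouns are reported as well.
    """
    words1 = caption1.lower().split()
    words2 = caption2.lower().split()

    # Align the two word sequences and collect the spans that do not match.
    matcher = difflib.SequenceMatcher(a=words1, b=words2, autojunk=False)
    only_in_1: List[str] = []
    only_in_2: List[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            only_in_1.extend(words1[i1:i2])
        if tag in ("replace", "insert"):
            only_in_2.extend(words2[j1:j2])
    return only_in_1, only_in_2


removed, added = differing_words(
    "A dog sits on the sofa",
    "A cat sits on the sofa",
)
print(removed, added)
```

In the operator itself, the differing words are additionally restricted to nouns, which then serve as the candidate "valid objects" for the later matching step.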
The operator uses three fused sub-operators:
- image_pair_similarity_filter (CLIP-based)
- image_segment_mapper (FastSAM-based)
- image_text_matching_filter (BLIP-based)
Helper functions include is_noun (POS tag check), compare_text_index (caption diff with lemmatization), and iou_filter (NMS-style bounding box deduplication).
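As a rough illustration of the NMS-style deduplication attributed to `iou_filter`, the following sketch keeps the highest-scoring box and drops any later box that overlaps a kept one too strongly. The function names, the `(x1, y1, x2, y2)` corner format, the per-box scores, and the 0.5 threshold are all assumptions made for the example; the actual helper may use a different box layout and ordering criterion.

```python
from typing import List, Sequence

# Assumed box layout for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_sketch(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Hypothetical greedy NMS: return indices of the boxes that are kept."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms_sketch(boxes, scores))
```

The greedy loop is quadratic in the number of boxes, which is usually acceptable for the modest number of regions that survive the earlier filtering steps.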
The operator requires CUDA acceleration and caches intermediate results, such as the temporary cropped images, under DATA_JUICER_ASSETS_CACHE.
Usage
Use this operator as the first stage of the ImgDiff pipeline to identify bounding box regions that differ between two similar images. It is typically followed by Difference_Caption_Generator_Mapper to generate textual descriptions of the detected differences.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/imgdiff_difference_area_generator_mapper.py
- Lines: 1-436
Signature
class Difference_Area_Generator_Mapper(Mapper):
    _accelerator = "cuda"

    def __init__(
        self,
        image_pair_similarity_filter_args: Optional[Dict] = {},
        image_segment_mapper_args: Optional[Dict] = {},
        image_text_matching_filter_args: Optional[Dict] = {},
        *args, **kwargs,
    ):
Import
from data_juicer.ops.mapper.imgdiff_difference_area_generator_mapper import Difference_Area_Generator_Mapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| image_pair_similarity_filter_args | Dict | No | Arguments for the image pair similarity filter. Defaults cover min_score_1, max_score_1, min_score_2, max_score_2, and hf_clip="openai/clip-vit-base-patch32" |
| image_segment_mapper_args | Dict | No | Arguments for image segmentation. Default: imgsz=1024, conf=0.05, iou=0.5, model_path="FastSAM-x.pt" |
| image_text_matching_filter_args | Dict | No | Arguments for image-text matching. Default: min_score=0.1, max_score=1.0, hf_blip="Salesforce/blip-itm-base-coco" |
Sample Fields
| Name | Type | Required | Description |
|---|---|---|---|
| image_path1 | str | Yes | Path to the first image |
| image_path2 | str | Yes | Path to the second image |
| caption1 | str | Yes | Caption for the first image |
| caption2 | str | Yes | Caption for the second image |
Outputs
| Name | Type | Description |
|---|---|---|
| sample[Fields.meta][MetaKeys.bbox_tag] | np.ndarray | Filtered bounding boxes (Nx4) for regions with detected differences. Contains zeros if no differences are found. |
Usage Examples
# Basic usage
mapper = Difference_Area_Generator_Mapper()

# With custom similarity thresholds
mapper = Difference_Area_Generator_Mapper(
    image_pair_similarity_filter_args={
        "min_score_1": 0.2,
        "max_score_1": 0.9,
        "min_score_2": 0.1,
        "max_score_2": 0.8,
    },
    image_segment_mapper_args={
        "imgsz": 512,
        "conf": 0.1,
    },
)

# Process a sample
sample = {
    "image_path1": "/path/to/image1.jpg",
    "image_path2": "/path/to/image2.jpg",
    "caption1": "A red car parked on the street",
    "caption2": "A blue car parked on the street",
}
result = mapper.process_single(sample, rank=0)