Implementation:Datajuicer Data juicer ImageDiffusionMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A concrete Data-Juicer operator that generates new images with Stable Diffusion for data augmentation.
Description
ImageDiffusionMapper is a mapper operator that generates new images from existing images and their captions using a HuggingFace diffusion model (default: Stable Diffusion v1.4). It performs image-to-image transformation with three main controls: strength (how far the output may deviate from the reference image), guidance scale (how closely the output follows the text prompt), and the number of augmented images per sample. If no caption key is configured, it can generate captions with a BLIP-2 model. The operator runs in batched mode with CUDA acceleration and requires approximately 8 GB of GPU memory.
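To make the strength parameter more concrete: in the image-to-image pipelines of the diffusers library, strength scales how many of the scheduled denoising steps are actually run, so a lower strength stays closer to the reference image. The sketch below reproduces only that arithmetic in plain Python. The helper name `effective_denoising_steps` is hypothetical, and the step count of 50 used in the example is an assumption for illustration; this page does not state how many inference steps the operator uses.

```python
def effective_denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Return how many denoising steps run for a given strength.

    Hypothetical helper: mirrors the step arithmetic used by diffusers
    image-to-image pipelines, not code from Data-Juicer itself.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Only the last `strength` fraction of the schedule is executed.
    return min(int(num_inference_steps * strength), num_inference_steps)


# Assumed schedule length of 50 steps, with the operator's default strength.
steps_at_default = effective_denoising_steps(50, 0.8)
```

With a strength of 0 no denoising is run and the output stays at the reference image, while a strength of 1 runs the full schedule and largely ignores it.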
Usage
Use this operator when you need to expand a training dataset with realistic new images generated by diffusion, keeping them semantically consistent with their captions while adding visual diversity.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/image_diffusion_mapper.py
Signature
@OPERATORS.register_module("image_diffusion_mapper")
class ImageDiffusionMapper(Mapper):
    def __init__(self,
                 hf_diffusion: str = "CompVis/stable-diffusion-v1-4",
                 trust_remote_code: bool = False,
                 torch_dtype: str = "fp32",
                 revision: str = "main",
                 strength: float = 0.8,
                 guidance_scale: float = 7.5,
                 aug_num: PositiveInt = 1,
                 keep_original_sample: bool = True,
                 caption_key: Optional[str] = None,
                 hf_img2seq: str = "Salesforce/blip2-opt-2.7b",
                 save_dir: str = None,
                 *args, **kwargs):
Import
from data_juicer.ops.mapper.image_diffusion_mapper import ImageDiffusionMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| hf_diffusion | str | No | Diffusion model name on HuggingFace, defaults to "CompVis/stable-diffusion-v1-4" |
| trust_remote_code | bool | No | Whether to trust remote code of HF models, defaults to False |
| torch_dtype | str | No | Floating point type for model: fp32, fp16, or bf16; defaults to "fp32" |
| revision | str | No | Specific model version (branch, tag, or commit id), defaults to "main" |
| strength | float | No | How far to transform the reference image, from 0 to 1 (higher values deviate more); defaults to 0.8 |
| guidance_scale | float | No | How closely generated images match text prompt, defaults to 7.5 |
| aug_num | PositiveInt | No | Number of augmented images per sample, defaults to 1 |
| keep_original_sample | bool | No | Whether to keep original sample, defaults to True |
| caption_key | Optional[str] | No | Key name in samples for captions; if None, captions are auto-generated |
| hf_img2seq | str | No | HuggingFace model for caption generation, defaults to "Salesforce/blip2-opt-2.7b" |
| save_dir | str | No | Directory in which to store generated images; if not specified, they are saved in the same directory as the input files |
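The constraints listed in the table can be checked before a long GPU job is launched. The following is a minimal sketch under stated assumptions: `check_params`, `DEFAULTS`, and `ALLOWED_DTYPES` are hypothetical names introduced here, not part of Data-Juicer, and the checks cover only the ranges this page documents (the dtype choices, the 0 to 1 range for strength, and a positive `aug_num`).

```python
from typing import Any, Dict

# Defaults copied from the signature documented on this page.
DEFAULTS: Dict[str, Any] = {
    "torch_dtype": "fp32",
    "strength": 0.8,
    "guidance_scale": 7.5,
    "aug_num": 1,
}
ALLOWED_DTYPES = ("fp32", "fp16", "bf16")


def check_params(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge overrides into the defaults and reject out-of-range values.

    Hypothetical pre-flight check; the operator performs its own validation.
    """
    params = {**DEFAULTS, **overrides}
    if params["torch_dtype"] not in ALLOWED_DTYPES:
        raise ValueError(f"torch_dtype must be one of {ALLOWED_DTYPES}")
    if not 0.0 <= params["strength"] <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    if not isinstance(params["aug_num"], int) or params["aug_num"] < 1:
        raise ValueError("aug_num must be a positive integer")
    return params


# Example: half precision and two augmented images per sample.
checked = check_params({"torch_dtype": "fp16", "aug_num": 2})
```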
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples with generated image paths and augmented entries |
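The interaction between `aug_num` and `keep_original_sample` determines how many entries come out for each input sample. The sketch below illustrates that bookkeeping only. It is not the operator's implementation: `expand_samples` and `fake_generate` are hypothetical names, the `images` and `text` field names are assumptions, the generated file names are made up, and the ordering of originals and augmented entries in the real output may differ.

```python
from typing import Callable, Dict, List

Sample = Dict[str, object]


def expand_samples(samples: List[Sample],
                   generate: Callable[[Sample, int], Sample],
                   aug_num: int = 1,
                   keep_original_sample: bool = True) -> List[Sample]:
    """Return the optional original plus aug_num generated entries per sample."""
    out: List[Sample] = []
    for sample in samples:
        if keep_original_sample:
            out.append(sample)
        for index in range(aug_num):
            out.append(generate(sample, index))
    return out


def fake_generate(sample: Sample, index: int) -> Sample:
    """Stand-in for the diffusion call: only rewrites the image paths."""
    new_sample = dict(sample)
    new_sample["images"] = [f"{path}.aug{index}.png" for path in sample["images"]]
    return new_sample


batch = [{"text": "a red bicycle", "images": ["bike.jpg"]}]
with_original = expand_samples(batch, fake_generate, aug_num=2)
without_original = expand_samples(batch, fake_generate, aug_num=2,
                                  keep_original_sample=False)
```

Under these assumptions, one input sample with `aug_num: 2` yields three entries when the original is kept and two when it is dropped.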
Usage Examples
process:
  - image_diffusion_mapper:
      hf_diffusion: "CompVis/stable-diffusion-v1-4"
      strength: 0.8
      guidance_scale: 7.5
      aug_num: 1
      keep_original_sample: true
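The same settings as the recipe above can also be passed to the constructor when the operator is used from Python. This is a sketch that assumes Data-Juicer is installed and that the constructor accepts the keyword arguments listed in the signature; `RECIPE_PARAMS` and `build_image_diffusion_mapper` are hypothetical names introduced for the example. The import is placed inside the function so that the snippet can be loaded on a machine without Data-Juicer or a GPU.

```python
from typing import Any, Dict

# Same values as the YAML recipe on this page.
RECIPE_PARAMS: Dict[str, Any] = {
    "hf_diffusion": "CompVis/stable-diffusion-v1-4",
    "strength": 0.8,
    "guidance_scale": 7.5,
    "aug_num": 1,
    "keep_original_sample": True,
}


def build_image_diffusion_mapper(**overrides: Any):
    """Instantiate the operator with the recipe values plus any overrides.

    Requires Data-Juicer to be installed; using the operator afterwards
    also needs the model weights and a suitable GPU.
    """
    # Lazy import: keeps this module importable without Data-Juicer.
    from data_juicer.ops.mapper.image_diffusion_mapper import ImageDiffusionMapper

    return ImageDiffusionMapper(**{**RECIPE_PARAMS, **overrides})
```

Calling, for example, `build_image_diffusion_mapper(aug_num=2)` would construct the operator with two augmented images per sample while leaving the other recipe values unchanged.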