Implementation:Datajuicer Data juicer ImageDiffusionMapper

Knowledge Sources	Datajuicer_Data_juicer
Domains	Data_Processing, Mapping
Last Updated	2026-02-14 16:00 GMT

Overview

A concrete Data-Juicer tool that generates new images with Stable Diffusion for data augmentation.

Description

ImageDiffusionMapper is a mapper operator that generates new images from existing images and their captions using a HuggingFace diffusion model (default: Stable Diffusion v1.4). It performs image-to-image transformation with a configurable strength (how far the output may deviate from the reference image), guidance scale (how closely the output follows the text prompt), and number of augmented images per sample. If no caption key is provided, it generates captions with a BLIP2 model. The operator runs in batched mode with CUDA acceleration and requires approximately 8GB of GPU memory.
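
To make the strength parameter more concrete, here is a minimal sketch. It assumes the operator delegates to a diffusers-style image-to-image pipeline, where strength scales how many of the scheduled denoising steps are actually run. The helper name effective_denoising_steps and the 50-step default are illustrative assumptions, not part of Data-Juicer.

```python
def effective_denoising_steps(strength: float, num_inference_steps: int = 50) -> int:
    """Approximate number of denoising steps an img2img run performs.

    Assumption: follows the convention of diffusers img2img pipelines,
    where only the last `strength` fraction of the schedule is executed.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


if __name__ == "__main__":
    for s in (0.0, 0.5, 0.8, 1.0):
        print(f"strength={s}: {effective_denoising_steps(s)} of 50 steps")
```

Under this assumption, a strength near 0 leaves the reference image almost unchanged, while a strength of 1 runs the full schedule and largely ignores the reference.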

Usage

Use this operator when you need to expand training datasets with realistic new images via diffusion-based generation, keeping the images semantically consistent with their captions while adding visual diversity.

Code Reference

Source Location

Repository: Datajuicer_Data_juicer
File: data_juicer/ops/mapper/image_diffusion_mapper.py

Signature

@OPERATORS.register_module("image_diffusion_mapper")
class ImageDiffusionMapper(Mapper):
    def __init__(self,
                 hf_diffusion: str = "CompVis/stable-diffusion-v1-4",
                 trust_remote_code: bool = False,
                 torch_dtype: str = "fp32",
                 revision: str = "main",
                 strength: float = 0.8,
                 guidance_scale: float = 7.5,
                 aug_num: PositiveInt = 1,
                 keep_original_sample: bool = True,
                 caption_key: Optional[str] = None,
                 hf_img2seq: str = "Salesforce/blip2-opt-2.7b",
                 save_dir: str = None,
                 *args, **kwargs):

Import

from data_juicer.ops.mapper.image_diffusion_mapper import ImageDiffusionMapper

I/O Contract

Inputs

Name	Type	Required	Description
hf_diffusion	str	No	Diffusion model name on HuggingFace, defaults to "CompVis/stable-diffusion-v1-4"
trust_remote_code	bool	No	Whether to trust remote code of HF models, defaults to False
torch_dtype	str	No	Floating point type for model: fp32, fp16, or bf16; defaults to "fp32"
revision	str	No	Specific model version (branch, tag, or commit id), defaults to "main"
strength	float	No	Extent to transform reference image (0 to 1), defaults to 0.8
guidance_scale	float	No	How closely generated images match text prompt, defaults to 7.5
aug_num	PositiveInt	No	Number of augmented images per sample, defaults to 1
keep_original_sample	bool	No	Whether to keep original sample, defaults to True
caption_key	Optional[str]	No	Key name in samples for captions; if None, captions are auto-generated
hf_img2seq	str	No	HuggingFace model for caption generation, defaults to "Salesforce/blip2-opt-2.7b"
save_dir	str	No	Directory to store generated images; if not specified, saves in same directory as input
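
The constraints stated in the table above can be checked before launching a GPU job. The following is a rough sketch: check_diffusion_args is a hypothetical helper, not a Data-Juicer function, and it only encodes the ranges documented here (strength within 0 to 1, the three torch_dtype names, a positive aug_num).

```python
ALLOWED_DTYPES = ("fp32", "fp16", "bf16")


def check_diffusion_args(strength=0.8, aug_num=1, torch_dtype="fp32"):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical pre-flight check based only on the documented ranges.
    """
    problems = []
    if not 0.0 <= strength <= 1.0:
        problems.append(f"strength must be within [0, 1], got {strength}")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(aug_num, bool) or not isinstance(aug_num, int) or aug_num < 1:
        problems.append(f"aug_num must be a positive integer, got {aug_num!r}")
    if torch_dtype not in ALLOWED_DTYPES:
        problems.append(f"torch_dtype must be one of {ALLOWED_DTYPES}, got {torch_dtype!r}")
    return problems


if __name__ == "__main__":
    for issue in check_diffusion_args(strength=1.2, aug_num=0, torch_dtype="fp64"):
        print(issue)
```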

Outputs

Name	Type	Description
samples	Dict	Transformed samples with generated image paths and augmented entries
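
As a rough guide to how the output grows, the sketch below estimates the number of samples after the operator runs. It assumes every input sample yields exactly aug_num generated samples and that the original is retained only when keep_original_sample is true; samples without images or failed generations would lower the real count. The function name expected_sample_count is illustrative.

```python
def expected_sample_count(num_inputs: int,
                          aug_num: int = 1,
                          keep_original_sample: bool = True) -> int:
    """Estimated number of output samples (an upper bound in practice)."""
    per_input = aug_num + (1 if keep_original_sample else 0)
    return num_inputs * per_input


if __name__ == "__main__":
    # With the defaults, the dataset is expected to double in size.
    print(expected_sample_count(1000))
    print(expected_sample_count(1000, aug_num=3, keep_original_sample=False))
```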

Usage Examples

process:
  - image_diffusion_mapper:
      hf_diffusion: "CompVis/stable-diffusion-v1-4"
      strength: 0.8
      guidance_scale: 7.5
      aug_num: 1
      keep_original_sample: true
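
When comparing several strength values, it can help to generate variants of the recipe above programmatically. This is only a sketch: make_process_entry is a hypothetical helper, the strength values are arbitrary, and how the resulting dictionaries are written to recipe files is left to the reader.

```python
import json


def make_process_entry(strength: float, aug_num: int = 1) -> dict:
    """Build one `process` entry mirroring the recipe shown above."""
    return {
        "image_diffusion_mapper": {
            "hf_diffusion": "CompVis/stable-diffusion-v1-4",
            "strength": strength,
            "guidance_scale": 7.5,
            "aug_num": aug_num,
            "keep_original_sample": True,
        }
    }


# One recipe per strength value; the values are illustrative.
recipes = [{"process": [make_process_entry(s)]} for s in (0.4, 0.6, 0.8)]

if __name__ == "__main__":
    print(json.dumps(recipes[0], indent=2))
```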

Related Pages

Environment:Datajuicer_Data_juicer_Python_Runtime_Environment
