Implementation:Datajuicer Data juicer SDXLPrompt2PromptMapper

Knowledge Sources	Datajuicer_Data_juicer
Domains	Data_Processing, Mapping
Last Updated	2026-02-14 16:00 GMT

Overview

A concrete Data-Juicer operator for generating paired images with Stable Diffusion XL (SDXL) diffusion models.

Description

SDXLPrompt2PromptMapper is a mapper operator that generates a pair of similar images from two text prompts using the Stable Diffusion XL (SDXL) model with a Prompt2Prompt pipeline. It reads the two prompts from the sample fields named by text_key and text_key_second and generates the corresponding image pair; num_inference_steps and guidance_scale control the generation. Each generated image is saved under a unique timestamped filename in a configurable output directory. The operator requires CUDA acceleration, and both text keys must be set before processing.

Usage

Use this operator when you need to generate paired image data, within the data pipeline, for training image-editing or style-transfer models.

Code Reference

Source Location

Repository: Datajuicer_Data_juicer
File: data_juicer/ops/mapper/sdxl_prompt2prompt_mapper.py

Signature

@OPERATORS.register_module("sdxl_prompt2prompt_mapper")
class SDXLPrompt2PromptMapper(Mapper):
    def __init__(
        self,
        hf_diffusion: str = "stabilityai/stable-diffusion-xl-base-1.0",
        trust_remote_code=False,
        torch_dtype: str = "fp32",
        num_inference_steps: float = 50,
        guidance_scale: float = 7.5,
        text_key=None,
        text_key_second=None,
        output_dir=DATA_JUICER_ASSETS_CACHE,
        *args,
        **kwargs,
    ):

Import

from data_juicer.ops.mapper.sdxl_prompt2prompt_mapper import SDXLPrompt2PromptMapper
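
For use from Python rather than a YAML recipe, a minimal sketch follows. The keyword arguments mirror the signature above; the build_mapper helper, the caption1/caption2 key names, and the output path are illustrative assumptions, not part of Data-Juicer. The import is deferred into the function because the operator needs CUDA, and constructing it may load the diffusion model.

```python
# Keyword arguments taken from the constructor signature documented above.
# The caption key names and output path are placeholders for illustration.
MAPPER_KWARGS = {
    "hf_diffusion": "stabilityai/stable-diffusion-xl-base-1.0",
    "torch_dtype": "fp32",
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "text_key": "caption1",
    "text_key_second": "caption2",
    "output_dir": "/path/to/output",
}


def build_mapper(**overrides):
    """Hypothetical helper: build the operator with optional overrides."""
    # Deferred import so this module can be loaded on machines where
    # Data-Juicer (or a GPU) is not available.
    from data_juicer.ops.mapper.sdxl_prompt2prompt_mapper import (
        SDXLPrompt2PromptMapper,
    )

    return SDXLPrompt2PromptMapper(**{**MAPPER_KWARGS, **overrides})
```

Calling build_mapper(num_inference_steps=30), for example, would trade some image quality for faster generation while keeping the other settings.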

I/O Contract

Inputs

Name	Type	Required	Description
hf_diffusion	str	No	Diffusion model name on HuggingFace (default: stabilityai/stable-diffusion-xl-base-1.0)
trust_remote_code	bool	No	Whether to trust remote code of HF models (default: False)
torch_dtype	str	No	Floating point type for loading the model (default: fp32)
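num_inference_steps	float	No	Number of denoising steps; higher values usually improve image quality at the cost of slower generation (default: 50)
guidance_scale	float	No	Classifier-free guidance scale; higher values make the images follow the prompts more closely (default: 7.5)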
text_key	str	Yes	Key name for the first caption in the pair
text_key_second	str	Yes	Key name for the second caption in the pair
output_dir	str	No	Storage location for generated images (default: DATA_JUICER_ASSETS_CACHE)

Outputs

Name	Type	Description
sample[image_path1]	str	Absolute path to the first generated image
sample[image_path2]	str	Absolute path to the second generated image
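
A downstream step can read the two output fields back from each processed sample. The sketch below assumes a sample is a plain dict carrying the image_path1 and image_path2 fields listed above; the helper names paired_image_paths and pair_exists are hypothetical.

```python
import os


def paired_image_paths(sample):
    """Hypothetical helper: return (first, second) image paths from a sample."""
    try:
        first = sample["image_path1"]
        second = sample["image_path2"]
    except KeyError as exc:
        # Surface which output field is absent, e.g. if the mapper was skipped.
        raise KeyError(f"sample is missing output field {exc}") from exc
    return first, second


def pair_exists(sample):
    """Return True only if both generated image files are present on disk."""
    return all(os.path.isfile(path) for path in paired_image_paths(sample))
```

Checking pair_exists before training is a cheap way to drop samples whose images were moved or never written.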

Usage Examples

process:
  - sdxl_prompt2prompt_mapper:
      hf_diffusion: 'stabilityai/stable-diffusion-xl-base-1.0'
      num_inference_steps: 50
      guidance_scale: 7.5
      text_key: 'caption1'
      text_key_second: 'caption2'
      output_dir: '/path/to/output'
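
Because both text keys default to None but are required for processing, it can help to check a config before launching a GPU job. The following is a minimal sketch of such a pre-flight check; check_p2p_config is a hypothetical helper, not a Data-Juicer API, and it only covers the constraints stated on this page.

```python
def check_p2p_config(cfg):
    """Hypothetical pre-flight check for an sdxl_prompt2prompt_mapper config.

    Returns a list of human-readable problems. An empty list means the
    config passed these checks, not that the operator is guaranteed to run.
    """
    problems = []
    # Both caption keys are listed as required in the I/O contract.
    for key in ("text_key", "text_key_second"):
        if not cfg.get(key):
            problems.append(f"{key} must be set")
    # Fall back to the documented default of 50 steps when unset.
    if cfg.get("num_inference_steps", 50) <= 0:
        problems.append("num_inference_steps must be positive")
    return problems
```

For the recipe above, check_p2p_config would receive the mapping under sdxl_prompt2prompt_mapper and return an empty list.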

Related Pages

Environment:Datajuicer_Data_juicer_Python_Runtime_Environment
