Implementation:Datajuicer Data juicer SDXLPrompt2PromptMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A Data-Juicer mapper operator that generates pairs of images from paired text prompts with an SDXL diffusion model.
Description
SDXLPrompt2PromptMapper is a mapper operator that generates pairs of similar images from two text prompts. It uses the Stable Diffusion XL (SDXL) model with a Prompt2Prompt pipeline. For each sample, it reads the first prompt from the field named by text_key and the second prompt from the field named by text_key_second, then generates the corresponding image pair. The num_inference_steps and guidance_scale parameters control the diffusion process. Each generated image is saved under a unique, timestamped filename in a configurable output directory. The operator requires a CUDA device, and both text_key and text_key_second must be set.
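The per-sample behaviour described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the operator's actual code. The generate_pair callable is a hypothetical stand-in for the SDXL Prompt2Prompt pipeline, and the filename scheme (timestamp plus random suffix) is assumed. Raising ValueError for a missing key is this sketch's own choice.

```python
import os
import time
import uuid
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for the SDXL Prompt2Prompt pipeline: takes two prompts
# and returns two encoded images as bytes. Not part of Data-Juicer.
GenerateFn = Callable[[str, str], Tuple[bytes, bytes]]


def map_sample(sample: Dict, text_key: str, text_key_second: str,
               output_dir: str, generate_pair: GenerateFn) -> Dict:
    """Sketch of the per-sample contract (assumed, not the real implementation)."""
    # Both caption keys must be configured before a sample can be processed.
    if text_key is None or text_key_second is None:
        raise ValueError("text_key and text_key_second must both be set")
    prompt1, prompt2 = sample[text_key], sample[text_key_second]
    img1, img2 = generate_pair(prompt1, prompt2)

    os.makedirs(output_dir, exist_ok=True)
    # Hypothetical naming scheme: a timestamp plus a random suffix keeps names unique.
    stamp = time.strftime("%Y%m%d%H%M%S")
    for idx, data in ((1, img1), (2, img2)):
        name = f"{stamp}_{uuid.uuid4().hex[:8]}_{idx}.png"
        path = os.path.abspath(os.path.join(output_dir, name))
        with open(path, "wb") as f:
            f.write(data)
        # Store the absolute path under the output keys listed in the Outputs table.
        sample[f"image_path{idx}"] = path
    return sample
```

Passing a fake generator such as lambda a, b: (a.encode(), b.encode()) exercises the file handling and output keys without a GPU.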
Usage
Use this operator when a pipeline needs paired image data, for example to train image-editing or style-transfer models, and the pairs should be produced by diffusion-based generation inside the Data-Juicer pipeline.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/sdxl_prompt2prompt_mapper.py
Signature
```python
@OPERATORS.register_module("sdxl_prompt2prompt_mapper")
class SDXLPrompt2PromptMapper(Mapper):
    def __init__(
        self,
        hf_diffusion: str = "stabilityai/stable-diffusion-xl-base-1.0",
        trust_remote_code=False,
        torch_dtype: str = "fp32",
        num_inference_steps: float = 50,
        guidance_scale: float = 7.5,
        text_key=None,
        text_key_second=None,
        output_dir=DATA_JUICER_ASSETS_CACHE,
        *args,
        **kwargs,
    ):
        ...  # body omitted
```
Import
```python
from data_juicer.ops.mapper.sdxl_prompt2prompt_mapper import SDXLPrompt2PromptMapper
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| hf_diffusion | str | No | Diffusion model name on HuggingFace (default: stabilityai/stable-diffusion-xl-base-1.0) |
| trust_remote_code | bool | No | Whether to trust remote code of HF models (default: False) |
| torch_dtype | str | No | Floating point type for loading the model (default: fp32) |
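| num_inference_steps | float | No | Number of denoising steps; higher values generally improve quality but increase generation time (default: 50). Annotated as float in the signature, but it is a step count, so a whole number such as 50 is expected |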
| guidance_scale | float | No | Classifier-free guidance scale; higher values make the images follow the prompts more closely (default: 7.5) |
| text_key | str | Yes | Key name for the first caption in the pair |
| text_key_second | str | Yes | Key name for the second caption in the pair |
| output_dir | str | No | Storage location for generated images (default: DATA_JUICER_ASSETS_CACHE) |
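The guidance_scale parameter is the weight used in classifier-free guidance. The sketch below shows the standard rule for combining an unconditional and a text-conditioned noise prediction, as used by diffusers-style Stable Diffusion pipelines. It operates on plain Python lists for illustration. Whether this operator's Prompt2Prompt pipeline applies the rule in exactly this form is an assumption.

```python
from typing import List


def classifier_free_guidance(noise_uncond: List[float], noise_text: List[float],
                             guidance_scale: float) -> List[float]:
    """Combine unconditional and text-conditioned noise predictions.

    eps = eps_uncond + guidance_scale * (eps_text - eps_uncond)
    """
    if len(noise_uncond) != len(noise_text):
        raise ValueError("predictions must have the same length")
    # Move away from the unconditional prediction, toward and past the text-conditioned one.
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]
```

For example, classifier_free_guidance([0.0, 1.0], [1.0, 1.0], 7.5) returns [7.5, 1.0]. Where the two predictions differ, the difference is amplified 7.5 times. With guidance_scale = 1.0, the result is simply the text-conditioned prediction.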
Outputs
| Name | Type | Description |
|---|---|---|
| sample[image_path1] | str | Absolute path to the first generated image |
| sample[image_path2] | str | Absolute path to the second generated image |
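A downstream step could verify this output contract with a small helper. The missing_outputs helper below is a hypothetical name, not part of Data-Juicer. It checks only the two keys listed in the table above.

```python
import os
from typing import Dict, List


def missing_outputs(sample: Dict) -> List[str]:
    """Return the output keys whose value is absent, not absolute, or not an existing file."""
    problems = []
    for key in ("image_path1", "image_path2"):
        path = sample.get(key)
        # The table above describes absolute paths to generated image files.
        if not path or not os.path.isabs(path) or not os.path.isfile(path):
            problems.append(key)
    return problems
```

An empty list means both images are in place. Any returned key points at a sample that was not processed or whose file has since been moved.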
Usage Examples
```yaml
process:
  - sdxl_prompt2prompt_mapper:
      hf_diffusion: 'stabilityai/stable-diffusion-xl-base-1.0'
      num_inference_steps: 50
      guidance_scale: 7.5
      text_key: 'caption1'
      text_key_second: 'caption2'
      output_dir: '/path/to/output'
```
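The config above reads the two prompts from caption1 and caption2, so every input record needs both fields. The minimal sketch below writes such records, assuming the dataset is stored as JSON Lines. The write_caption_pairs helper and the example captions are hypothetical. For Prompt2Prompt-style editing, the two captions typically differ by only a word or two.

```python
import json
from typing import Iterable, Tuple


def write_caption_pairs(path: str, pairs: Iterable[Tuple[str, str]],
                        text_key: str = "caption1",
                        text_key_second: str = "caption2") -> int:
    """Write one JSON object per line, with the two captions under the configured keys."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for first, second in pairs:
            record = {text_key: first, text_key_second: second}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Calling write_caption_pairs("pairs.jsonl", [("a photo of a cat", "a photo of a dog")]) writes the single line {"caption1": "a photo of a cat", "caption2": "a photo of a dog"}. The key names must match text_key and text_key_second in the config.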