
Implementation:AUTOMATIC1111 Stable diffusion webui PersonalizedBase for textual inversion

From Leeroopedia


Knowledge Sources
Domains: Textual Inversion, Dataset, Training Data, Stable Diffusion
Last Updated: 2026-02-08 00:00 GMT

Overview

Concrete tool for constructing a PyTorch Dataset that pre-encodes images to latent space, applies caption templates with placeholder tokens, and supports tag shuffling and dropout for textual inversion training, provided by the AUTOMATIC1111 stable-diffusion-webui repository.

Description

PersonalizedBase is a PyTorch Dataset subclass that handles the full pipeline from raw image files to training-ready DatasetEntry objects. During initialization, it:

  1. Reads a template file containing prompt patterns with [name] and [filewords] placeholders
  2. Iterates over all images in data_root, loading and resizing them to the specified width x height (unless varsize=True)
  3. Optionally extracts alpha channels for per-pixel loss weighting
  4. Pre-encodes each image through the VAE encoder to obtain latent representations, using the chosen latent_sampling_method ("once", "deterministic", or "random")
  5. Reads per-image caption text from companion .txt files or derives it from filenames
  6. Groups images by resolution for variable-size batching via GroupedBatchSampler
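
Step 5 can be illustrated with a minimal sketch. The helper name caption_for_image is hypothetical, and the whitespace stripping is a simplification for the example; the real class may apply additional filename processing that is not shown here.

```python
import os
import tempfile


def caption_for_image(image_path):
    """Return the caption for an image: companion .txt content if present, else the file stem."""
    stem, _ext = os.path.splitext(image_path)
    text_path = stem + ".txt"
    if os.path.exists(text_path):
        with open(text_path, "r", encoding="utf8") as f:
            return f.read().strip()  # stripping is an assumption made for this sketch
    # No companion file: fall back to the filename without its extension
    return os.path.basename(stem)


# Demo with a temporary directory; the image files themselves are not needed for the lookup
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "cat01.txt"), "w", encoding="utf8") as f:
        f.write("a cat, sitting, window")
    caption_a = caption_for_image(os.path.join(tmp, "cat01.png"))
    caption_b = caption_for_image(os.path.join(tmp, "red fox.png"))

print(caption_a)
print(caption_b)
```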

At access time (__getitem__), it applies tag shuffling, tag dropout, and random latent resampling as configured.
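
The caption augmentation described above can be sketched roughly as follows. This is an illustrative approximation, not the repository's code: the function name render_caption and the rng parameter are assumptions introduced for the example, and only the [name] and [filewords] placeholders come from the description.

```python
import random


def render_caption(template, filewords, name, shuffle_tags=False, tag_drop_out=0.0, rng=random):
    """Fill a prompt template, optionally dropping and shuffling comma-separated tags."""
    tags = [t.strip() for t in filewords.split(",")]
    if tag_drop_out > 0:
        # Each tag is kept independently with probability (1 - tag_drop_out)
        tags = [t for t in tags if rng.random() > tag_drop_out]
    if shuffle_tags:
        rng.shuffle(tags)
    text = template.replace("[filewords]", ", ".join(tags))
    return text.replace("[name]", name)


template = "a photo of [name], [filewords]"
plain = render_caption(template, "red, furry, outdoors", "my-concept")
augmented = render_caption(
    template, "red, furry, outdoors", "my-concept",
    shuffle_tags=True, tag_drop_out=0.3, rng=random.Random(0),
)
print(plain)
print(augmented)
```

Because the augmented caption depends on random draws, it is regenerated on every access rather than cached when either option is enabled.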

Usage

Use this dataset class when:

  • You are setting up the data pipeline for textual inversion embedding training
  • You need pre-encoded latents to reduce VRAM usage during training
  • You want to apply caption augmentation (tag shuffling, dropout) during training
  • You are working with variable-resolution images that must be grouped into same-resolution batches

Code Reference

Source Location

Signature

class PersonalizedBase(Dataset):
    def __init__(
        self,
        data_root,
        width,
        height,
        repeats,
        flip_p=0.5,
        placeholder_token="*",
        model=None,
        cond_model=None,
        device=None,
        template_file=None,
        include_cond=False,
        batch_size=1,
        gradient_step=1,
        shuffle_tags=False,
        tag_drop_out=0,
        latent_sampling_method='once',
        varsize=False,
        use_weight=False
    ):

Import

from modules.textual_inversion.dataset import PersonalizedBase

I/O Contract

Inputs

Name | Type | Required | Description
data_root | str | Yes | Path to the directory containing training images (and optional companion .txt caption files)
width | int | Yes | Target image width in pixels for resizing (ignored if varsize=True)
height | int | Yes | Target image height in pixels for resizing (ignored if varsize=True)
repeats | int | Yes | Number of times to repeat the dataset per epoch; the dataset only stores this value, and the training loop uses it to calculate epoch length
flip_p | float | No | Probability of random horizontal flip augmentation; defaults to 0.5
placeholder_token | str | No | The token name substituted for [name] in templates; defaults to "*"
model | object | No | The Stable Diffusion model, used for VAE encoding via encode_first_stage
cond_model | object | No | The CLIP conditioning model, used for pre-computing text conditions when include_cond=True
device | torch.device | No | Target device for tensor operations during encoding
template_file | str | No | Path to the prompt template text file containing one template per line; the signature defaults to None, but initialization reads this file, so a valid path is needed in practice
include_cond | bool | No | If True, pre-computes CLIP text embeddings during dataset construction; defaults to False
batch_size | int | No | Batch size for training; clamped to the dataset length; defaults to 1
gradient_step | int | No | Gradient accumulation steps; clamped based on the dataset length and batch size; defaults to 1
shuffle_tags | bool | No | If True, randomly shuffles comma-separated tags in captions at each access; defaults to False
tag_drop_out | float | No | Probability of dropping each individual tag from captions; 0 means no dropout; defaults to 0
latent_sampling_method | str | No | One of "once", "deterministic", or "random"; controls how latents are sampled from the VAE posterior; defaults to "once"
varsize | bool | No | If True, skips resizing so images keep their original aspect ratios, and groups them by resolution; defaults to False
use_weight | bool | No | If True, extracts alpha channels as per-pixel loss weights; defaults to False
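
The clamping of batch_size and gradient_step might look like the sketch below. The exact formula is an assumption based on the descriptions in the table (batch size capped at the dataset length, accumulation steps capped at the number of full batches), and clamp_batching is a hypothetical helper name.

```python
def clamp_batching(batch_size, gradient_step, dataset_length):
    """Clamp batch size and gradient accumulation steps to what the dataset can supply."""
    if dataset_length <= 0:
        raise ValueError("dataset must contain at least one image")
    # A batch cannot be larger than the dataset itself
    effective_batch = min(batch_size, dataset_length)
    # Assumed rule: accumulate over at most the number of full batches available
    effective_step = min(gradient_step, dataset_length // effective_batch)
    return effective_batch, effective_step


print(clamp_batching(4, 8, 10))
print(clamp_batching(16, 2, 5))
```

With 10 images, a requested batch size of 4 and 8 accumulation steps would be reduced to 2 steps under this rule, since only two full batches fit in the dataset.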

Outputs

Name | Type | Description
entry | DatasetEntry | A DatasetEntry with fields: filename, filename_text, latent_sample (or latent_dist when latent_sampling_method="random"), cond_text, weight
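
To illustrate why an entry holds either a fixed sample or a distribution, here is a toy sketch of the three sampling modes. Everything in it is a stand-in: GaussianPosterior and LatentEntry are hypothetical classes using plain Python lists, and the interpretation of the modes (sample once and cache, use the mean, or resample on every access) is an assumption about the intended semantics rather than the repository's code.

```python
import random


class GaussianPosterior:
    """Toy diagonal Gaussian standing in for a VAE posterior (hypothetical)."""

    def __init__(self, mean, std, seed=0):
        self.mean = list(mean)
        self.std = list(std)
        self._rng = random.Random(seed)

    def sample(self):
        # Reparameterized draw: mean + std * standard normal noise
        return [m + s * self._rng.gauss(0.0, 1.0) for m, s in zip(self.mean, self.std)]


class LatentEntry:
    """Holds a cached sample ("once", "deterministic") or the distribution ("random")."""

    def __init__(self, dist, method="once"):
        if method not in ("once", "deterministic", "random"):
            raise ValueError(f"unknown latent sampling method: {method}")
        self.method = method
        self.latent_dist = dist if method == "random" else None
        if method == "deterministic":
            self.latent_sample = list(dist.mean)  # zero-variance sample, i.e. the mean
        elif method == "once":
            self.latent_sample = dist.sample()    # drawn once, reused on every access
        else:
            self.latent_sample = None             # drawn lazily in get()

    def get(self):
        if self.method == "random":
            self.latent_sample = self.latent_dist.sample()  # fresh draw per access
        return self.latent_sample


mean, std = [0.0, 1.0], [0.5, 0.5]
once_entry = LatentEntry(GaussianPosterior(mean, std), "once")
det_entry = LatentEntry(GaussianPosterior(mean, std), "deterministic")
rand_entry = LatentEntry(GaussianPosterior(mean, std), "random")
print(once_entry.get() == once_entry.get())
print(det_entry.get())
print(rand_entry.get() != rand_entry.get())
```

The trade-off in this sketch is memory against variety: the "random" mode must keep the distribution around for each image, while the other two modes store a single tensor per image.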

Usage Examples

Basic Usage

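# Assumes this runs inside the webui process with a Stable Diffusion model already loaded
from modules import shared, devices
from modules.textual_inversion.dataset import PersonalizedBase, PersonalizedDataLoader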

ds = PersonalizedBase(
    data_root="/path/to/training/images",
    width=512,
    height=512,
    repeats=100,
    placeholder_token="my-concept",
    model=shared.sd_model,
    cond_model=shared.sd_model.cond_stage_model,
    device=devices.device,
    template_file="/path/to/template.txt",
    batch_size=2,
    gradient_step=1,
    shuffle_tags=True,
    tag_drop_out=0.1,
    latent_sampling_method="once"
)

dl = PersonalizedDataLoader(ds, latent_sampling_method="once", batch_size=ds.batch_size)

for batch in dl:
    print(batch.cond_text)       # list of caption strings
    print(batch.latent_sample)   # stacked latent tensors
    break

Variable-Size Bucketing

ds = PersonalizedBase(
    data_root="/path/to/variable_size_images",
    width=512,
    height=512,
    repeats=100,
    placeholder_token="my-style",
    model=shared.sd_model,
    cond_model=shared.sd_model.cond_stage_model,
    device=devices.device,
    template_file="/path/to/template.txt",
    varsize=True,  # preserve original aspect ratios
    batch_size=4
)
# Images are grouped into buckets by resolution; GroupedBatchSampler
# ensures each batch contains same-sized images
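
The bucketing idea can be sketched without any webui dependencies. This is a simplified illustration, not the behavior of GroupedBatchSampler itself: the helpers group_by_size and same_size_batches are hypothetical, and dropping each bucket's incomplete remainder is an assumption made to keep the example short.

```python
import random
from collections import defaultdict


def group_by_size(sizes):
    """Map each (width, height) to the list of dataset indices that have that size."""
    groups = defaultdict(list)
    for index, size in enumerate(sizes):
        groups[size].append(index)
    return dict(groups)


def same_size_batches(groups, batch_size, rng=random):
    """Build shuffled batches in which every index comes from a single size bucket."""
    batches = []
    for indices in groups.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Assumption for this sketch: drop the incomplete remainder of each bucket
        for start in range(0, len(shuffled) - batch_size + 1, batch_size):
            batches.append(shuffled[start:start + batch_size])
    rng.shuffle(batches)
    return batches


sizes = [(512, 512), (512, 768), (512, 512), (512, 768), (512, 512)]
groups = group_by_size(sizes)
batches = same_size_batches(groups, batch_size=2, rng=random.Random(0))
print(groups)
print(batches)
```

Keeping each batch within one bucket matters because latents of different spatial sizes cannot be stacked into a single tensor.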

Related Pages

Implements Principle
