Implementation:AUTOMATIC1111 Stable diffusion webui PersonalizedBase for hypernetwork
| Knowledge Sources | |
|---|---|
| Domains | Deep Learning, Dataset Preparation, Stable Diffusion |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
PersonalizedBase is the concrete dataset class that prepares training data for hypernetwork training in Stable Diffusion, provided by the AUTOMATIC1111 stable-diffusion-webui repository. It is a PyTorch Dataset subclass that loads images, pre-encodes them into latent space via the VAE, and generates text conditioning from template files.
Description
In the hypernetwork training context, PersonalizedBase is instantiated with the hypernetwork name as the placeholder_token, uses the hypernetwork.txt template file for prompt generation, and sets include_cond=True to pre-compute conditioning embeddings. The dataset loads images from a specified directory, encodes them through the model's first-stage encoder (VAE) into latent representations, and pairs each latent with a randomly templated text prompt derived from associated text files or filenames.
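The prompt templating step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function name `render_prompt`, the sample template lines, and the `[filewords]` placeholder for caption text are assumptions made for the example; only the `[name]` substitution is stated on this page.

```python
import random

def render_prompt(template_lines, placeholder_token, filename_text, rng=random):
    """Pick one template line at random and fill in its placeholders."""
    template = rng.choice(template_lines)
    # [name] is replaced by the placeholder token (the hypernetwork name here).
    # [filewords] is assumed to stand for the caption or filename text.
    return (template
            .replace("[filewords]", filename_text)
            .replace("[name]", placeholder_token))

# Hypothetical template lines; the real hypernetwork.txt may differ.
templates = [
    "a photo of a [filewords]",
    "a rendering of a [filewords], [name]",
]
prompt = render_prompt(templates, "my_hypernetwork", "red fox, forest",
                       rng=random.Random(0))
print(prompt)
```

Because the template line is drawn at random, the same image can be paired with different prompts across dataset builds.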
The dataset supports three latent sampling strategies (once, deterministic, random), optional horizontal flip augmentation, tag shuffling, tag dropout, variable-size bucketing, and alpha-channel-based per-pixel loss weighting.
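The difference between the three latent sampling strategies can be illustrated with a toy model. The sketch below assumes that "once" draws a single sample when the dataset is built, "deterministic" removes the spread of the distribution so the sample equals its mean, and "random" redraws on every access. `ToyGaussian` and `ToyEntry` are hypothetical stand-ins for the VAE posterior and a dataset entry, not classes from the repository.

```python
import random

class ToyGaussian:
    """Stand-in for the VAE posterior: a diagonal Gaussian over a few values."""
    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def sample(self, rng):
        return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(self.mean, self.std)]

class ToyEntry:
    """Hypothetical dataset entry that applies one of the three strategies."""
    def __init__(self, dist, method, rng):
        self.method, self.rng = method, rng
        if method == "deterministic":
            # Zero the spread so the "sample" is exactly the mean.
            dist = ToyGaussian(dist.mean, [0.0] * len(dist.std))
        self.dist = dist
        # "once" and "deterministic" draw a single sample up front and cache it.
        self.cached = None if method == "random" else dist.sample(rng)

    def latent(self):
        # "random" redraws from the stored distribution on every access.
        if self.method == "random":
            return self.dist.sample(self.rng)
        return self.cached

rng = random.Random(0)
dist = ToyGaussian(mean=[0.5, -1.0], std=[0.1, 0.2])
once = ToyEntry(dist, "once", rng)
det = ToyEntry(dist, "deterministic", rng)
rnd = ToyEntry(dist, "random", rng)
```

Under these assumptions, "random" trades extra work per access for latents that vary between epochs, while the other two keep a fixed latent per image.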
Usage
Import and instantiate PersonalizedBase when setting up the training data pipeline for hypernetwork training. It is instantiated within train_hypernetwork() to prepare the dataset before the training loop begins.
Code Reference
Source Location
- Repository: stable-diffusion-webui
- File: modules/textual_inversion/dataset.py
- Lines: L32-173
Signature
class PersonalizedBase(Dataset):
    def __init__(self, data_root, width, height, repeats, flip_p=0.5,
                 placeholder_token="*", model=None, cond_model=None,
                 device=None, template_file=None, include_cond=False,
                 batch_size=1, gradient_step=1, shuffle_tags=False,
                 tag_drop_out=0, latent_sampling_method='once',
                 varsize=False, use_weight=False):
Import
from modules.textual_inversion.dataset import PersonalizedBase
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data_root | str | Yes | Path to the directory containing training images |
| width | int | Yes | Target image width for resizing (e.g., 512) |
| height | int | Yes | Target image height for resizing (e.g., 512) |
| repeats | int | Yes | Number of times to repeat the dataset per epoch (from shared.opts.training_image_repeats_per_epoch) |
| flip_p | float | No | Probability of random horizontal flip augmentation (default: 0.5) |
| placeholder_token | str | No | Token for [name] substitution in templates; set to hypernetwork name during hypernetwork training (default: "*") |
| model | object | No | The Stable Diffusion model, used for VAE encoding via model.encode_first_stage() |
| cond_model | object | No | The conditioning model (CLIP), used for pre-computing text embeddings |
| device | torch.device | No | Device for tensor operations |
| template_file | str | No | Path to the prompt template file (e.g., hypernetwork.txt) |
| include_cond | bool | No | Whether to pre-compute conditioning embeddings; True for hypernetwork training (default: False) |
| batch_size | int | No | Batch size, clamped to dataset length (default: 1) |
| gradient_step | int | No | Gradient accumulation steps (default: 1) |
| shuffle_tags | bool | No | Whether to randomly shuffle comma-separated tags in text (default: False) |
| tag_drop_out | float | No | Probability of dropping individual tags (default: 0) |
| latent_sampling_method | str | No | Latent sampling strategy: "once", "deterministic", or "random" (default: "once") |
| varsize | bool | No | Whether to use variable image sizes with bucketing (default: False) |
| use_weight | bool | No | Whether to use alpha-channel-based per-pixel loss weighting (default: False) |
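The effect of shuffle_tags and tag_drop_out on a comma-separated caption can be sketched as below. This is an illustrative approximation under the assumption that each tag is dropped independently with probability tag_drop_out and that shuffling is applied afterwards; the helper name `process_tags` is hypothetical.

```python
import random

def process_tags(text, shuffle_tags=False, tag_drop_out=0.0, rng=random):
    """Apply optional tag dropout and shuffling to a comma-separated caption."""
    tags = text.split(",")
    if tag_drop_out != 0:
        # Keep each tag with probability 1 - tag_drop_out.
        tags = [t for t in tags if rng.random() > tag_drop_out]
    if shuffle_tags:
        # Randomize the order of the surviving tags in place.
        rng.shuffle(tags)
    return ",".join(tags)

unchanged = process_tags("fox,forest,snow")
shuffled = process_tags("fox,forest,snow", shuffle_tags=True,
                        rng=random.Random(1))
dropped_all = process_tags("fox,forest,snow", tag_drop_out=1.0)
```

With either option enabled, the caption differs between accesses, so the prompt has to be regenerated each time an entry is fetched rather than computed once up front.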
Outputs
| Name | Type | Description |
|---|---|---|
| __getitem__(i) | DatasetEntry | A single dataset entry containing latent_sample (Tensor), cond_text (str), cond (Tensor or None), and weight (Tensor or None) |
| __len__() | int | Total number of entries in the dataset |
| self.batch_size | int | Effective batch size (clamped to dataset length) |
| self.gradient_step | int | Effective gradient accumulation steps |
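The clamping of the two effective values can be sketched as follows. The batch size cap follows the table above; the cap on gradient accumulation steps (by the number of full batches the dataset can supply) is an assumption made for this illustration, and `effective_sizes` is a hypothetical helper rather than a function in the repository.

```python
def effective_sizes(batch_size, gradient_step, dataset_length):
    """Return (effective batch size, effective gradient accumulation steps)."""
    if dataset_length <= 0:
        raise ValueError("dataset must contain at least one image")
    # The batch can never be larger than the dataset itself.
    effective_batch = min(batch_size, dataset_length)
    # Assumption: accumulation steps are capped by the number of full batches.
    effective_steps = min(gradient_step, dataset_length // effective_batch)
    return effective_batch, effective_steps

print(effective_sizes(4, 1, 10))
print(effective_sizes(8, 4, 3))
```

Reading the values back from the dataset instance, rather than reusing the requested ones, keeps the data loader and training loop consistent with what the dataset can actually provide.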
Usage Examples
Basic Usage in Hypernetwork Training
import modules.textual_inversion.dataset as dataset
from modules import devices, shared

ds = dataset.PersonalizedBase(
    data_root="/path/to/training/images",
    width=512,
    height=512,
    repeats=shared.opts.training_image_repeats_per_epoch,
    placeholder_token="my_hypernetwork",
    model=shared.sd_model,
    cond_model=shared.sd_model.cond_stage_model,
    device=devices.device,
    template_file="/path/to/textual_inversion_templates/hypernetwork.txt",
    include_cond=True,
    batch_size=4,
    gradient_step=1,
    shuffle_tags=False,
    tag_drop_out=0,
    latent_sampling_method="once",
    varsize=False,
    use_weight=False,
)

# Wrap in PersonalizedDataLoader for batched iteration
dl = dataset.PersonalizedDataLoader(
    ds,
    latent_sampling_method=ds.latent_sampling_method,
    batch_size=ds.batch_size,
    pin_memory=True,
)

for batch in dl:
    latents = batch.latent_sample  # shape: (batch_size, 4, H/8, W/8)
    cond = batch.cond              # list of pre-computed conditioning tensors
    cond_text = batch.cond_text    # list of prompt strings
    break