Principle: AUTOMATIC1111 Stable Diffusion web UI txt2img Parameter Configuration
| Knowledge Sources | |
|---|---|
| Domains | Diffusion Models, Image Generation, Parameter Tuning |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Text-to-image parameter configuration is the discipline of selecting and combining the generation hyperparameters that govern how a latent diffusion model transforms a text prompt into a final image.
Description
Generating an image from text with a Stable Diffusion model requires specifying a set of interrelated parameters that control image quality, composition, reproducibility, and computational cost. The key parameters and their roles are:
- Prompt and Negative Prompt -- The positive prompt describes what to generate; the negative prompt describes what to avoid. Both are encoded into conditioning tensors that guide the denoising process in opposite directions.
- CFG Scale (Classifier-Free Guidance Scale) -- Controls how strongly the model follows the prompt versus generating freely. Higher values (7-15) produce images that match the prompt more closely but may introduce artifacts. Lower values (1-5) produce softer, more creative results. Typical default is 7.0.
- Sampling Steps -- The number of denoising iterations. More steps generally produce higher quality but take longer. Common ranges are 20-50 steps depending on the sampler.
- Width and Height -- The pixel dimensions of the output image. Must be multiples of 8 (due to the VAE downsampling factor of 8). Standard Stable Diffusion v1.x models are trained at 512x512; generating far outside this range without hires fix can cause compositional artifacts.
- Seed -- An integer that initializes the random noise tensor. The same seed with identical parameters produces the same image, enabling reproducibility. A value of -1 selects a random seed.
- Subseed and Subseed Strength -- Blend the noise of a second seed into the main seed's noise via spherical linear interpolation (slerp), allowing exploration of subtle variations around a base image.
- Sampler and Scheduler -- The sampling algorithm (Euler, DPM++, etc.) and noise schedule (Karras, exponential, uniform, Beta) that define how noise is removed at each step.
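The subseed mechanism above can be sketched with the standard slerp formula applied to flattened noise tensors. This is an illustrative implementation in NumPy, not the webui's internal (torch-based) code; the function and variable names are assumptions.

```python
import numpy as np

def slerp(t, a, b):
    """Spherical linear interpolation between two noise tensors.

    t: interpolation weight in [0, 1] (plays the role of subseed strength)
    a: noise generated from the main seed; b: noise from the subseed
    """
    a_flat, b_flat = a.ravel(), b.ravel()
    # Angle between the two noise vectors
    dot = np.dot(a_flat, b_flat) / (np.linalg.norm(a_flat) * np.linalg.norm(b_flat))
    omega = np.arccos(np.clip(dot, -1.0, 1.0))
    so = np.sin(omega)
    if so < 1e-6:  # vectors nearly parallel: fall back to linear interpolation
        return (1.0 - t) * a + t * b
    return (np.sin((1.0 - t) * omega) / so) * a + (np.sin(t * omega) / so) * b

# Two independent Gaussian noise tensors at 512x512 latent resolution (4, 64, 64)
noise_a = np.random.default_rng(42).standard_normal((4, 64, 64))    # main seed
noise_b = np.random.default_rng(1337).standard_normal((4, 64, 64))  # subseed
blended = slerp(0.3, noise_a, noise_b)  # subseed strength 0.3
```

At strength 0 the result is exactly the main seed's noise, at strength 1 exactly the subseed's; intermediate values sweep smoothly between the two compositions.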
Usage
Parameter configuration is used every time a text-to-image generation is initiated. Understanding parameter interactions is essential for:
- Achieving consistent, reproducible results across sessions
- Balancing generation quality against computational time
- Avoiding common artifacts (e.g., anatomical distortions at wrong aspect ratios)
- Enabling advanced workflows like hires fix, which requires additional parameters
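The parameters discussed above come together in a single generation request. As one hedged sketch: when the webui is launched with the `--api` flag it exposes a `/sdapi/v1/txt2img` endpoint; the field names below follow that API but should be verified against your installed version, and the prompt text and host/port are placeholders.

```python
import json
import urllib.request

# Assumed endpoint of a locally running AUTOMATIC1111 webui started with --api
payload = {
    "prompt": "a lighthouse at dusk, oil painting",
    "negative_prompt": "blurry, low quality",
    "steps": 30,             # denoising iterations
    "cfg_scale": 7.0,        # typical default guidance strength
    "width": 512,            # must be a multiple of 8 (VAE downsampling)
    "height": 512,
    "seed": 42,              # fixed seed for reproducibility; -1 = random
    "sampler_name": "DPM++ 2M",
}

req = urllib.request.Request(
    "http://127.0.0.1:7860/sdapi/v1/txt2img",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment with a running server
```

Keeping every field explicit (rather than relying on UI defaults) is what makes a result reproducible across sessions.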
Theoretical Basis
Classifier-Free Guidance
The CFG scale parameter implements the guidance formula from Ho & Salimans (2022):
noise_pred = noise_uncond + cfg_scale * (noise_cond - noise_uncond)
where noise_cond is the model's prediction conditioned on the prompt and noise_uncond is the unconditional (negative prompt) prediction. When cfg_scale = 1, the output equals the conditional prediction alone. As cfg_scale increases, the model more aggressively steers toward the prompt.
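The guidance formula can be demonstrated numerically. The arrays below are toy stand-ins for the U-Net's two noise predictions at one denoising step:

```python
import numpy as np

def apply_cfg(noise_uncond, noise_cond, cfg_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by cfg_scale."""
    return noise_uncond + cfg_scale * (noise_cond - noise_uncond)

# Toy stand-ins for the model's two predictions (latent-shaped tensors)
uncond = np.zeros((4, 64, 64))
cond = np.ones((4, 64, 64))

guided_1 = apply_cfg(uncond, cond, 1.0)  # equals the conditional prediction
guided_7 = apply_cfg(uncond, cond, 7.0)  # pushed 7x along the cond direction
```

With `cfg_scale = 1` the output collapses to `noise_cond` exactly, matching the statement above; larger scales amplify the difference between the two predictions, which is why very high values can overshoot into artifacts.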
Noise Schedules
The scheduler defines the sequence of noise levels (sigmas) across the sampling steps:
- Uniform -- evenly spaced steps drawn from the model's native training schedule
- Karras -- a schedule proposed by Karras et al. that concentrates steps at lower noise levels, often producing sharper results
- Exponential -- exponentially spaced noise levels
- Beta -- uses a beta distribution for scheduling, controlled by alpha and beta parameters
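The Karras schedule can be sketched as follows: interpolate linearly in sigma^(1/rho) space, which concentrates steps at low noise levels. The rho=7.0 and the sigma bounds below are commonly used values, but exact defaults vary by model and implementation.

```python
import numpy as np

def karras_sigmas(n_steps, sigma_min=0.1, sigma_max=10.0, rho=7.0):
    """Karras et al. (2022) noise schedule: linear ramp in sigma^(1/rho)
    space, raised back to the rho power."""
    ramp = np.linspace(0, 1, n_steps)
    inv_rho_max = sigma_max ** (1.0 / rho)
    inv_rho_min = sigma_min ** (1.0 / rho)
    return (inv_rho_max + ramp * (inv_rho_min - inv_rho_max)) ** rho

sigmas = karras_sigmas(20)
# First sigma is sigma_max, last is sigma_min, strictly decreasing in between;
# successive gaps shrink toward the low-noise end of the schedule.
```

Because the curve is convex, most of the 20 steps land at small sigmas, which is where fine detail is resolved; this is the intuition behind "often producing sharper results."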
Seed and Reproducibility
The seed initializes a pseudorandom number generator that deterministically produces the initial noise tensor of shape (C, H/8, W/8), where C=4 (latent channels) and H, W are the image dimensions in pixels. Given identical parameters and model weights, the same seed always produces the same latent noise and thus the same final image.
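A minimal sketch of this determinism, using NumPy's generator in place of the webui's actual torch-based RNG:

```python
import numpy as np

def initial_noise(seed, width=512, height=512, channels=4):
    """Build the latent noise tensor of shape (C, H/8, W/8) from a seed."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((channels, height // 8, width // 8))

a = initial_noise(42)
b = initial_noise(42)  # same seed -> bit-identical noise -> identical image
c = initial_noise(43)  # different seed -> different noise -> different image
```

For a 512x512 generation the latent tensor is (4, 64, 64); the full pixel image only appears after the denoised latent is decoded by the VAE.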