Principle:AUTOMATIC1111 Stable diffusion webui Learning rate scheduling

Knowledge Sources	Cyclical Learning Rates for Training Neural Networks; An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
Domains	Learning Rate, Training Optimization, Textual Inversion, Hyperparameter Scheduling
Last Updated	2026-02-08 00:00 GMT

Overview

Piecewise learning rate scheduling is a technique for adjusting the optimizer's learning rate at predefined training step boundaries, enabling coarse-to-fine optimization of embedding vectors during textual inversion.

Description

Learning rate scheduling is essential for training textual inversion embeddings because the optimization landscape changes as training progresses. Early in training, a higher learning rate allows the embedding to move quickly toward the correct region of CLIP embedding space. As training continues, a lower learning rate enables fine-grained refinement without overshooting.

A piecewise constant (step-based) schedule divides training into phases, each with a fixed learning rate. This is the simplest form of scheduling and is particularly well-suited to textual inversion because:

Training runs are typically short (a few thousand steps)
The practitioner can manually specify rates based on observed convergence behavior
It is easy to specify, parse, and reason about compared to smooth decay functions

The schedule is defined as a comma-separated string of rate:step pairs, where each pair specifies the learning rate to use until a given step number is reached. A special step value of -1 means "until the end of training." If only a single number is provided without a step boundary, it is used as a constant rate for the entire run.
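The string format described above can be parsed in a few lines. The following is a minimal sketch, not the webui's actual parser: the function name `parse_schedule` and the `max_steps` argument are assumptions introduced for illustration, and clamping every boundary to `max_steps` is one reasonable reading of "until the end of training."

```python
def parse_schedule(spec: str, max_steps: int) -> list[tuple[float, int]]:
    """Parse "rate:step, rate:step, ..." into (rate, end_step) pairs.

    Hypothetical helper: names and clamping behavior are assumptions.
    """
    phases = []
    for chunk in spec.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate stray commas
        if ":" in chunk:
            rate_text, step_text = chunk.split(":", 1)
            rate, end_step = float(rate_text), int(step_text)
            if end_step == -1:
                end_step = max_steps  # -1 means "until the end of training"
        else:
            # A bare number is a constant rate for the entire run.
            rate, end_step = float(chunk), max_steps
        phases.append((rate, min(end_step, max_steps)))
        if end_step >= max_steps:
            break  # later phases could never be reached
    if not phases:
        raise ValueError(f"empty learning rate schedule: {spec!r}")
    return phases


print(parse_schedule("0.005:100, 0.0001:1000, 1e-5:10000", max_steps=10000))
print(parse_schedule("0.005", max_steps=3000))
```

Returning plain `(rate, end_step)` tuples keeps the parsed schedule easy to inspect and lets the same parser serve any other value that shares the format, such as a gradient clipping threshold.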

Usage

Use piecewise learning rate scheduling when:

Training textual inversion embeddings and you want to reduce the learning rate as training progresses
You have empirical knowledge of good learning rate phases for your concept type
You want a simple, interpretable schedule without the complexity of cosine annealing or warm restarts
You need the same schedule format for both the learning rate and gradient clipping value

Theoretical Basis

Piecewise Constant Schedule

A piecewise constant learning rate schedule is defined as:

$\mathrm{lr}(t) = \mathrm{lr}_{i} \quad \text{for } t_{i-1} \le t < t_{i}$

where $(\mathrm{lr}_{1}, t_{1}), (\mathrm{lr}_{2}, t_{2}), \dots, (\mathrm{lr}_{n}, t_{n})$ are the rate-step pairs and $t_{0} = 0$.

For example, the schedule string "0.005:100, 0.0001:1000, 1e-5:10000" defines:

Phase	Learning Rate	Step Range
1	0.005	0 to 99
2	0.0001	100 to 999
3	1e-5	1000 to 9999
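The table can be reproduced by evaluating the piecewise definition directly. This is an illustrative sketch: `rate_at` is a hypothetical helper, and the phases are written out by hand as `(rate, end_step)` pairs rather than parsed from the string.

```python
from bisect import bisect_right


def rate_at(phases, step):
    """Return lr(step) for phases given as (rate, end_step), end_step increasing."""
    boundaries = [end for _, end in phases]
    # Index of the first phase whose end_step is strictly greater than step,
    # which matches the half-open interval t_{i-1} <= t < t_i.
    i = bisect_right(boundaries, step)
    if i == len(phases):
        raise ValueError(f"step {step} is past the end of the schedule")
    return phases[i][0]


# Hand-written equivalent of "0.005:100, 0.0001:1000, 1e-5:10000".
phases = [(0.005, 100), (0.0001, 1000), (1e-5, 10000)]

for step in (0, 99, 100, 999, 1000, 9999):
    print(step, rate_at(phases, step))
```

Steps 99 and 100 fall on opposite sides of the first boundary, so they return 0.005 and 0.0001 respectively, in line with the step ranges listed in the table.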

Warm-Up Strategy

A warm-up phase can be implemented by starting with a lower learning rate in the first segment:

"1e-5:10, 0.005:500, 0.001:2000, 1e-4:5000"

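This holds a very small rate of 1e-5 for the first 10 steps (warm-up), jumps to 0.005 until step 500, then decays through the subsequent phases. Because the schedule is piecewise constant, each transition is an abrupt step rather than a gradual ramp.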

Decay Strategy

A monotonically decreasing schedule provides a natural decay:

"0.005:200, 0.001:1000, 5e-4:3000, 1e-4:5000"

Each phase uses a progressively smaller rate, allowing coarse adjustment early and fine-tuning later.
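Whether a schedule is a pure decay or includes a warm-up can be checked mechanically. The sketch below uses a hypothetical helper, `is_non_increasing`, applied to hand-written `(rate, end_step)` versions of the two example strings from this page.

```python
def is_non_increasing(phases):
    """True if every phase's rate is no larger than the previous phase's rate."""
    rates = [rate for rate, _ in phases]
    return all(a >= b for a, b in zip(rates, rates[1:]))


# "0.005:200, 0.001:1000, 5e-4:3000, 1e-4:5000"
decay = [(0.005, 200), (0.001, 1000), (5e-4, 3000), (1e-4, 5000)]
# "1e-5:10, 0.005:500, 0.001:2000, 1e-4:5000"
warmup = [(1e-5, 10), (0.005, 500), (0.001, 2000), (1e-4, 5000)]

print(is_non_increasing(decay))   # True
print(is_non_increasing(warmup))  # False: the rate rises after the warm-up phase
```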

Resumption from Checkpoints

The schedule parser supports resuming from an arbitrary step by filtering out phases that have already completed. When cur_step is provided, only rate-step pairs where step > cur_step are retained. This allows seamless continuation of interrupted training runs.
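The filtering rule can be sketched as a single comprehension. `remaining_phases` is a hypothetical name used for illustration, and the phases are assumed to be already parsed into `(rate, end_step)` pairs.

```python
def remaining_phases(phases, cur_step):
    """Keep only phases whose end step lies beyond cur_step."""
    return [(rate, end) for rate, end in phases if end > cur_step]


phases = [(0.005, 100), (0.0001, 1000), (1e-5, 10000)]

# Resuming at step 250: the first phase (ending at step 100) is already complete.
print(remaining_phases(phases, cur_step=250))
# Resuming exactly on a boundary: step 100 belongs to the second phase.
print(remaining_phases(phases, cur_step=100))
```

Because the comparison is strict, a run that stopped exactly on a phase boundary resumes in the next phase, which agrees with the half-open step ranges of the piecewise definition.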

Related Pages

Implemented By

Implementation:AUTOMATIC1111_Stable_diffusion_webui_LearnRateScheduler
