Principle:AUTOMATIC1111 Stable diffusion webui Learning rate scheduling
| Knowledge Sources | |
|---|---|
| Domains | Learning Rate, Training Optimization, Textual Inversion, Hyperparameter Scheduling |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Piecewise learning rate scheduling is a technique for adjusting the optimizer's learning rate at predefined training step boundaries, enabling coarse-to-fine optimization of embedding vectors during textual inversion.
Description
Learning rate scheduling is essential for training textual inversion embeddings because the optimization landscape changes as training progresses. Early in training, a higher learning rate allows the embedding to move quickly toward the correct region of CLIP embedding space. As training continues, a lower learning rate enables fine-grained refinement without overshooting.
A piecewise constant (step-based) schedule divides training into phases, each with a fixed learning rate. This is the simplest form of scheduling and is particularly well-suited to textual inversion because:
- Training runs are typically short (a few thousand steps)
- The practitioner can manually specify rates based on observed convergence behavior
- It is easy to specify, parse, and reason about compared to smooth decay functions
The schedule is defined as a comma-separated string of rate:step pairs, where each pair specifies the learning rate to use until a given step number is reached. A special step value of -1 means "until the end of training." If only a single number is provided without a step boundary, it is used as a constant rate for the entire run.
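The string format described above can be illustrated with a small parser. This is a minimal sketch, not the webui's actual implementation: the function name `parse_schedule` and the `max_steps` parameter are hypothetical, and clamping boundaries to `max_steps` is an assumption made for the example.

```python
def parse_schedule(text, max_steps):
    """Parse "rate:step, rate:step, ..." into a list of (rate, end_step) tuples.

    Hypothetical helper for illustration; max_steps is an assumed parameter
    giving the total number of training steps.
    """
    phases = []
    for pair in text.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate stray or trailing commas
        parts = pair.split(":")
        if len(parts) == 1:
            # A bare number is a constant rate for the entire run.
            phases.append((float(parts[0]), max_steps))
            break
        if len(parts) != 2:
            raise ValueError(f"expected 'rate:step', got {pair!r}")
        rate, step = float(parts[0]), int(parts[1])
        if step == -1 or step >= max_steps:
            # -1 means "until the end of training"; later pairs can never apply.
            phases.append((rate, max_steps))
            break
        phases.append((rate, step))
    if not phases:
        raise ValueError("schedule string contains no rate")
    return phases


print(parse_schedule("0.005:100, 0.0001:1000, 1e-5:10000", max_steps=10000))
print(parse_schedule("0.005", max_steps=3000))
print(parse_schedule("0.005:100, 1e-4:-1", max_steps=2000))
```

Returning a flat list of (rate, end_step) tuples keeps the parsed schedule easy to inspect and to filter later, for example when resuming training.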
Usage
Use piecewise learning rate scheduling when:
- Training textual inversion embeddings and you want to reduce the learning rate as training progresses
- You have empirical knowledge of good learning rate phases for your concept type
- You want a simple, interpretable schedule without the complexity of cosine annealing or warm restarts
- You need the same schedule format for both the learning rate and gradient clipping value
Theoretical Basis
Piecewise Constant Schedule
A piecewise constant learning rate schedule is defined as:
$$\mathrm{lr}(t) = \mathrm{lr}_i \quad \text{for } t_{i-1} \le t < t_i$$
where $(\mathrm{lr}_i, t_i)$, $i = 1, \dots, n$, are the rate-step pairs, and $t_0 = 0$.
For example, the schedule string "0.005:100, 0.0001:1000, 1e-5:10000" defines:
| Phase | Learning Rate | Step Range |
|---|---|---|
| 1 | 0.005 | 0 to 99 |
| 2 | 0.0001 | 100 to 999 |
| 3 | 1e-5 | 1000 to 9999 |
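The table can be checked with a direct lookup over the phases. The sketch below assumes the schedule has already been parsed into (rate, end_step) tuples; the name `rate_at` is hypothetical and chosen only for this example.

```python
def rate_at(phases, t):
    """Return the learning rate in effect at step t.

    phases: list of (rate, end_step) tuples with increasing end_step.
    A phase covers the steps from the previous boundary up to, but not
    including, its own end_step.
    """
    for rate, end_step in phases:
        if t < end_step:
            return rate
    raise ValueError(f"step {t} is past the end of the schedule")


# The example schedule "0.005:100, 0.0001:1000, 1e-5:10000" in parsed form.
phases = [(0.005, 100), (0.0001, 1000), (1e-5, 10000)]

for t in (0, 99, 100, 999, 1000, 9999):
    print(t, rate_at(phases, t))
```

Steps 99 and 100 fall on either side of the first boundary, so they return different rates, matching the half-open ranges in the table.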
Warm-Up Strategy
A warm-up phase can be implemented by starting with a lower learning rate in the first segment:
"1e-5:10, 0.005:500, 0.001:2000, 1e-4:5000"
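This starts with a very small rate for the first 10 steps (warm-up), steps up to 0.005 until step 500, then decays through the subsequent phases. Because the schedule is piecewise constant, the warm-up is a single jump at step 10 rather than a gradual ramp.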
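To make the phase boundaries of the warm-up schedule explicit, the string can be expanded into inclusive step ranges. This is an illustrative sketch only: `phase_table` is a hypothetical helper, and it assumes every pair carries an explicit positive step boundary.

```python
def phase_table(text):
    """Expand "rate:step, ..." into (rate, first_step, last_step) rows.

    Hypothetical helper; assumes each pair has an explicit positive boundary
    (no bare rates and no -1 entries).
    """
    rows = []
    start = 0
    for pair in text.split(","):
        rate_str, step_str = pair.strip().split(":")
        end = int(step_str)
        rows.append((float(rate_str), start, end - 1))  # last step is end - 1
        start = end
    return rows


for rate, first, last in phase_table("1e-5:10, 0.005:500, 0.001:2000, 1e-4:5000"):
    print(f"rate {rate:g}: steps {first} to {last}")
```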
Decay Strategy
A monotonically decreasing schedule provides a natural decay:
"0.005:200, 0.001:1000, 5e-4:3000, 1e-4:5000"
Each phase uses a progressively smaller rate, allowing coarse adjustment early and fine-tuning later.
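Whether a given schedule string is a pure decay can be checked mechanically. The following sketch uses a hypothetical helper, `is_decaying`, that compares consecutive rates; it is an illustration and not part of the webui.

```python
def is_decaying(text):
    """Return True if every rate in the schedule is strictly smaller than the previous one."""
    rates = [float(pair.split(":")[0]) for pair in text.split(",") if pair.strip()]
    return all(earlier > later for earlier, later in zip(rates, rates[1:]))


decay = "0.005:200, 0.001:1000, 5e-4:3000, 1e-4:5000"
warm_up = "1e-5:10, 0.005:500, 0.001:2000, 1e-4:5000"

print(is_decaying(decay))    # strictly decreasing rates
print(is_decaying(warm_up))  # first phase is smaller than the second
```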
Resumption from Checkpoints
The schedule parser supports resuming from an arbitrary step by filtering out phases that have already completed. When cur_step is provided, only rate-step pairs where step > cur_step are retained, so training resumes at the rate of the phase that contains cur_step. This allows seamless continuation of interrupted training runs.
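The filtering rule can be sketched as follows. The helper name `remaining_phases` is hypothetical, and keeping a pair whose step is -1 is an assumption based on its meaning of "until the end of training".

```python
def remaining_phases(text, cur_step):
    """Keep only the rate-step pairs whose boundary lies after cur_step.

    Hypothetical helper; assumes every pair is written as "rate:step".
    A step of -1 is kept because it runs until the end of training (assumption).
    """
    kept = []
    for pair in text.split(","):
        rate_str, step_str = pair.strip().split(":")
        step = int(step_str)
        if step > cur_step or step == -1:
            kept.append((float(rate_str), step))
    return kept


schedule = "0.005:100, 0.0001:1000, 1e-5:10000"
print(remaining_phases(schedule, cur_step=0))    # nothing completed yet
print(remaining_phases(schedule, cur_step=250))  # first phase already finished
```

Resuming at step 250 drops the 0.005 phase, so training continues at 0.0001 until step 1000 and then at 1e-5.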