Principle:AUTOMATIC1111 Stable diffusion webui Learning rate scheduling

From Leeroopedia


Knowledge Sources
Domains: Learning Rate, Training Optimization, Textual Inversion, Hyperparameter Scheduling
Last Updated: 2026-02-08 00:00 GMT

Overview

Piecewise learning rate scheduling is a technique for adjusting the optimizer's learning rate at predefined training step boundaries, enabling coarse-to-fine optimization of embedding vectors during textual inversion.

Description

Learning rate scheduling matters when training textual inversion embeddings because the appropriate step size changes as training progresses. Early in training, a higher learning rate lets the embedding move quickly toward a useful region of CLIP embedding space. As training continues, a lower learning rate allows fine-grained refinement without overshooting.

A piecewise constant (step-based) schedule divides training into phases, each with a fixed learning rate. This is the simplest form of scheduling and is particularly well-suited to textual inversion because:

  • Training runs are typically short (a few thousand steps)
  • The practitioner can manually specify rates based on observed convergence behavior
  • It is easy to specify, parse, and reason about compared to smooth decay functions

The schedule is defined as a comma-separated string of rate:step pairs, where each pair specifies the learning rate to use until a given step number is reached. A special step value of -1 means "until the end of training." If only a single number is provided without a step boundary, it is used as a constant rate for the entire run.
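The format described above can be parsed with a few lines of string handling. The following is a minimal sketch, not the webui's actual parser: the function name `parse_schedule` and the `max_steps` argument are assumptions made for illustration, and the choice to clamp boundaries to `max_steps` is likewise an assumption.

```python
def parse_schedule(spec, max_steps):
    """Parse 'rate:step, rate:step, ...' into a list of (rate, end_step) tuples.

    Hypothetical helper for illustration; names and clamping are assumptions.
    """
    phases = []
    for chunk in spec.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate stray or trailing commas
        rate_text, sep, step_text = chunk.partition(":")
        rate = float(rate_text)
        if not sep:
            # A bare number means a constant rate for the entire run.
            phases.append((rate, max_steps))
            break
        step = int(step_text)
        if step == -1:
            # -1 means "until the end of training".
            phases.append((rate, max_steps))
            break
        phases.append((rate, min(step, max_steps)))
        if step >= max_steps:
            break  # later phases could never be reached
    if not phases:
        raise ValueError("empty learning rate schedule")
    return phases


print(parse_schedule("0.005:100, 0.0001:1000, 1e-5:10000", 10000))
print(parse_schedule("0.005", 3000))
print(parse_schedule("0.005:100, 1e-4:-1", 2000))
```

Each returned tuple holds a rate and the step at which that rate stops applying, so a bare number and a `-1` boundary both collapse to a single phase ending at `max_steps`.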

Usage

Use piecewise learning rate scheduling when:

  • Training textual inversion embeddings and you want to reduce the learning rate as training progresses
  • You have empirical knowledge of good learning rate phases for your concept type
  • You want a simple, interpretable schedule without the complexity of cosine annealing or warm restarts
  • You need the same schedule format for both the learning rate and gradient clipping value

Theoretical Basis

Piecewise Constant Schedule

A piecewise constant learning rate schedule is defined as:

lr(t) = lr_i   for t_{i-1} <= t < t_i

where (lr_1, t_1), (lr_2, t_2), ..., (lr_n, t_n) are the rate-step pairs, and t_0 = 0.

For example, the schedule string "0.005:100, 0.0001:1000, 1e-5:10000" defines:

| Phase | Learning Rate | Step Range   |
|-------|---------------|--------------|
| 1     | 0.005         | 0 to 99      |
| 2     | 0.0001        | 100 to 999   |
| 3     | 1e-5          | 1000 to 9999 |
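The mapping from step to rate in this example can be sketched as a lookup over the phase boundaries. This is an illustrative sketch under the half-open interval convention t_{i-1} <= t < t_i given above; `PHASES` and `lr_at` are hypothetical names, not identifiers from the webui code.

```python
from bisect import bisect_right

# (rate, end_step) pairs for "0.005:100, 0.0001:1000, 1e-5:10000"
PHASES = [(0.005, 100), (0.0001, 1000), (1e-5, 10000)]


def lr_at(step, phases=PHASES):
    """Return the rate of the phase whose half-open interval contains `step`."""
    boundaries = [end for _, end in phases]
    # bisect_right counts the boundaries that are <= step, which is
    # exactly the number of phases that have already finished.
    index = bisect_right(boundaries, step)
    if index == len(phases):
        raise ValueError("step is past the end of the schedule")
    return phases[index][0]


for step in (0, 99, 100, 999, 1000, 9999):
    print(step, lr_at(step))
```

Steps 99 and 100 fall on opposite sides of the first boundary, which matches the step ranges listed for phases 1 and 2.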

Warm-Up Strategy

A warm-up phase can be implemented by starting with a lower learning rate in the first segment:

"1e-5:10, 0.005:500, 0.001:2000, 1e-4:5000"

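This starts with a very small rate for the first 10 steps (warm-up), jumps to 0.005 until step 500, then decays to 0.001 until step 2000 and to 1e-4 until step 5000. Because the schedule is piecewise constant, the change at each boundary is an abrupt step rather than a gradual ramp.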

Decay Strategy

A monotonically decreasing schedule provides a natural decay:

"0.005:200, 0.001:1000, 5e-4:3000, 1e-4:5000"

Each phase uses a progressively smaller rate, allowing coarse adjustment early and fine-tuning later.
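One way to apply such a schedule during training is to rewrite the optimizer's rate only when a phase boundary is crossed. The sketch below is an assumption-laden illustration: `run_schedule` is a hypothetical helper, and the optimizer is a stand-in object exposing a `param_groups` list of dicts with an `"lr"` key, which mirrors how PyTorch optimizers expose their rate but is not taken from the webui code.

```python
from types import SimpleNamespace

# (rate, end_step) pairs for "0.005:200, 0.001:1000, 5e-4:3000, 1e-4:5000"
DECAY = [(0.005, 200), (0.001, 1000), (5e-4, 3000), (1e-4, 5000)]


def run_schedule(optimizer, phases, total_steps):
    """Step through training, updating the rate only at phase boundaries."""
    changes = []   # (step, new_rate) log, kept only for illustration
    phase = 0
    current = None
    for step in range(total_steps):
        # Advance past every phase whose end boundary has been reached.
        while phase < len(phases) and step >= phases[phase][1]:
            phase += 1
        if phase == len(phases):
            break  # schedule exhausted
        rate = phases[phase][0]
        if rate != current:
            for group in optimizer.param_groups:
                group["lr"] = rate
            current = rate
            changes.append((step, rate))
        # The forward pass, backward pass, and optimizer step would go here.
    return changes


# Stand-in for a real optimizer (assumption: it has a `param_groups` list).
optimizer = SimpleNamespace(param_groups=[{"lr": 0.0}])
changes = run_schedule(optimizer, DECAY, 5000)
print(changes)
```

The logged changes occur at steps 0, 200, 1000, and 3000, so the rate is touched four times over the whole run rather than on every step.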

Resumption from Checkpoints

The schedule parser supports resuming from an arbitrary step by filtering out phases that have already completed. When cur_step is provided, only rate-step pairs where step > cur_step are retained. This allows seamless continuation of interrupted training runs.
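The filtering rule can be sketched as a single comprehension. This is a minimal illustration of the `step > cur_step` condition described above, with `remaining_phases` as a hypothetical name; it does not reproduce the parser's actual code.

```python
def remaining_phases(phases, cur_step):
    """Keep only (rate, end_step) pairs whose end boundary is still ahead."""
    return [(rate, end) for rate, end in phases if end > cur_step]


# (rate, end_step) pairs for "0.005:200, 0.001:1000, 5e-4:3000, 1e-4:5000"
schedule = [(0.005, 200), (0.001, 1000), (5e-4, 3000), (1e-4, 5000)]

# Resuming at step 1200 drops the first two phases; training continues
# in the third phase, which still ends at step 3000.
print(remaining_phases(schedule, 1200))
```

Because boundaries are absolute step numbers, the retained phases need no renumbering: a run resumed at step 1200 still switches rates at step 3000.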
