
Principle:Bigscience workshop Petals Optimizer Setup

From Leeroopedia


Knowledge Sources
Domains Deep_Learning, Optimization, Training
Last Updated 2026-02-09 14:00 GMT

Overview

Configuring the optimizer and learning rate scheduler for training only the locally-trainable parameters (prompt embeddings, classification head) in a distributed Petals model.

Description

Optimizer Setup configures the gradient descent optimization for prompt tuning with distributed models. The key constraint is that only local parameters are optimized — the remote transformer blocks are frozen.

Trainable parameters:

  • prompt_embeddings.weight (if ptune/deep_ptune enabled)
  • intermediate_prompt_embeddings.weight (if deep_ptune)
  • score.weight and score.bias (classification head, for classification tasks)
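For illustration, selecting only these parameters by their requires_grad flag might look like the sketch below. The parameter dict is a mock standing in for model.named_parameters(); the frozen-block name is a hypothetical example, not taken from the real model.

```python
# Mock requires_grad flags for the parameter names listed above;
# real code would iterate over model.named_parameters() instead.
params = {
    "prompt_embeddings.weight": True,
    "intermediate_prompt_embeddings.weight": True,
    "score.weight": True,
    "score.bias": True,
    # hypothetical frozen remote transformer block
    "transformer.h.0.self_attention.weight": False,
}

# Keep only the locally-trainable parameters
trainable = [name for name, requires_grad in params.items() if requires_grad]
```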

Optimizer choice: AdamW is used as it provides decoupled weight decay, which is important for the small number of trainable parameters in prompt tuning.

Learning rate: Typically higher than full fine-tuning (1e-3 to 1e-2) since only a few parameters are being optimized and the gradient signal passes through many frozen layers.

Scheduler: A linear warmup-then-decay schedule helps stabilize early training when gradients may be noisy from the distributed forward/backward.
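As a sketch, the warmup-then-decay schedule is a multiplier on the base learning rate: it ramps from 0 to 1 over the warmup steps, then decays linearly back to 0. The step counts below are illustrative.

```python
def linear_warmup_decay(step, num_warmup_steps, num_training_steps):
    """LR multiplier: ramps 0 -> 1 over warmup, then decays linearly to 0."""
    if step < num_warmup_steps:
        return step / num_warmup_steps
    remaining = num_training_steps - step
    return max(0.0, remaining / (num_training_steps - num_warmup_steps))
```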

Usage

Use this principle after loading a distributed model with prompt tuning enabled and before starting the training loop. Only include parameters where requires_grad=True.

Theoretical Basis

AdamW update rule:

m_t = β₁ m_{t−1} + (1 − β₁) g_t
v_t = β₂ v_{t−1} + (1 − β₂) g_t²
m̂_t = m_t / (1 − β₁ᵗ),   v̂_t = v_t / (1 − β₂ᵗ)
θ_t = θ_{t−1} − α (m̂_t / (√v̂_t + ε) + λ θ_{t−1})

where λ is the weight decay coefficient, applied separately from the adaptive step.
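A minimal scalar sketch of this update in pure Python (hyperparameter defaults are illustrative, matching common AdamW settings):

```python
def adamw_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """One AdamW step on a scalar parameter; returns updated (theta, m, v)."""
    m = b1 * m + (1 - b1) * g       # first-moment EMA
    v = b2 * v + (1 - b2) * g * g   # second-moment EMA
    m_hat = m / (1 - b1 ** t)       # bias correction
    v_hat = v / (1 - b2 ** t)
    # weight decay term wd * theta is decoupled from the adaptive step
    theta = theta - lr * (m_hat / (v_hat ** 0.5 + eps) + wd * theta)
    return theta, m, v
```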

# Abstract optimizer setup for prompt tuning
from torch.optim import AdamW
from transformers import get_scheduler

# Only local parameters (prompt embeddings, classification head) have
# requires_grad=True; the remote transformer blocks stay frozen.
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = AdamW(trainable_params, lr=1e-3, weight_decay=0.0)
scheduler = get_scheduler("linear", optimizer=optimizer,
                          num_warmup_steps=100, num_training_steps=1000)

Related Pages

Implemented By
