Principle: WAInjectBench Supervised Training Loop
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Optimization |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A supervised training loop that fine-tunes model parameters using AdamW optimization, cross-entropy loss, mixed-precision training, gradient clipping, and cosine learning rate scheduling with warmup.
Description
The training loop iterates over epochs, processing batches of image-label pairs through the model to compute binary cross-entropy loss between the predicted Yes/No logits and ground-truth labels. Key components include:
- AdamW optimizer: Adam with decoupled weight decay regularization
- Mixed-precision training (AMP): Uses `torch.amp.autocast` and `GradScaler` for memory-efficient training in fp16/bf16
- Gradient clipping: Limits gradient norms to prevent instability
- Cosine schedule with warmup: Linear warmup followed by cosine annealing to zero
- NaN/Inf fallback: Automatic fallback to FP32 with learning rate backoff when numerical instability is detected
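The gradient-clipping component rescales the entire gradient vector whenever its global L2 norm exceeds a threshold. A minimal pure-Python sketch (the helper name is hypothetical; PyTorch's `clip_grad_norm_` performs the same operation in place on parameter gradients):

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale `grads` so their global L2 norm is at most `max_norm`.

    Returns the (possibly rescaled) gradients and the pre-clip norm,
    mirroring what torch.nn.utils.clip_grad_norm_ does in place.
    """
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads, total
```

For example, gradients `[3.0, 4.0]` have norm 5.0; with `max_norm=2.5` every component is halved, giving `[1.5, 2.0]`, so the update direction is preserved while its magnitude is capped.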
Usage
Use this for fine-tuning the LLaVA model after LoRA injection and device placement. It is the core optimization step that updates the LoRA adapter weights.
Theoretical Basis
AdamW update rule:
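For parameters $\theta$, gradient $g_t = \nabla_\theta \mathcal{L}(\theta_{t-1})$, moment estimates $m_t, v_t$, learning rate $\eta_t$, and decoupled weight decay $\lambda$, the standard AdamW update (Loshchilov & Hutter) is:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, &
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \\
\hat m_t &= \frac{m_t}{1-\beta_1^t}, &
\hat v_t &= \frac{v_t}{1-\beta_2^t}, \\
\theta_t &= \theta_{t-1} - \eta_t \left( \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon} + \lambda\, \theta_{t-1} \right).
\end{aligned}
$$

Unlike L2 regularization folded into the gradient of plain Adam, the decay term $\lambda\,\theta_{t-1}$ bypasses the adaptive scaling, which is what "decoupled" refers to in the Description.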
Cosine schedule with warmup:
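A common formulation matching the "linear warmup followed by cosine annealing to zero" described above, with warmup length $T_{\text{warmup}}$ and total steps $T$:

$$
\eta_t = \eta_{\max} \cdot
\begin{cases}
\dfrac{t}{T_{\text{warmup}}}, & t < T_{\text{warmup}}, \\[1ex]
\dfrac{1}{2}\left(1 + \cos\left(\pi \cdot \dfrac{t - T_{\text{warmup}}}{T - T_{\text{warmup}}}\right)\right), & T_{\text{warmup}} \le t \le T.
\end{cases}
$$

The learning rate ramps linearly from 0 to $\eta_{\max}$, then follows a half cosine down to exactly 0 at step $T$.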
Mixed-precision training:
```python
import torch
from torch.nn.utils import clip_grad_norm_

# Forward pass in reduced precision; backward pass on the scaled loss
with torch.amp.autocast(device_type="cuda", dtype=amp_dtype):
    loss = criterion(model(inputs), labels)
scaler.scale(loss).backward()
scaler.unscale_(optimizer)                      # unscale grads before clipping
clip_grad_norm_(model.parameters(), max_norm)
scaler.step(optimizer)                          # skipped if grads contain inf/NaN
scaler.update()
optimizer.zero_grad(set_to_none=True)
```
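The NaN/Inf fallback described in the Description can be reduced to a small pure-Python policy. This is a sketch only: the function name, backoff factor, and LR floor are illustrative assumptions, not the project's actual defaults.

```python
import math

def nan_inf_fallback(loss_value, lr, backoff=0.5, min_lr=1e-7):
    """If the loss is NaN/Inf, signal a retry in FP32 and back off the LR.

    Returns (use_fp32_retry, new_lr). The caller would re-run the step
    with autocast disabled and apply the reduced learning rate.
    """
    if not math.isfinite(loss_value):
        return True, max(lr * backoff, min_lr)
    return False, lr
```

In the loop above, this check would run right after the loss is computed: a finite loss proceeds through the scaled backward pass unchanged, while a non-finite loss triggers an FP32 retry at a smaller learning rate.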