Workflow:Microsoft LoRA NLU GLUE Finetuning
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Natural_Language_Understanding, Fine_Tuning, Parameter_Efficient_Fine_Tuning |
| Last Updated | 2026-02-10 05:30 GMT |
Overview
End-to-end process for fine-tuning RoBERTa and DeBERTa V2 models with LoRA on the GLUE benchmark tasks for natural language understanding.
Description
This workflow covers the complete pipeline for adapting encoder-based transformer models (RoBERTa-base/large, DeBERTa V2 XXLarge) to GLUE benchmark tasks using Low-Rank Adaptation (LoRA). It relies on a modified fork of HuggingFace Transformers v4.4.2 in which LoRA layers are injected into the self-attention query and value projections, and it uses the HuggingFace Trainer API for training and evaluation. LoRA trains only 0.3M-4.7M parameters, compared to 125M-1.5B for full fine-tuning, while achieving comparable or superior results across the 8 GLUE tasks covered (MNLI, SST-2, MRPC, CoLA, QNLI, QQP, RTE, STS-B).
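The trainable-parameter range quoted above can be checked with a little arithmetic. The sketch below assumes that only the query and value projections of each layer are adapted, that each adapted d x d projection gets one r x d matrix and one d x r matrix, and that the ranks are 8 for RoBERTa and 16 for DeBERTa V2 XXLarge (the values listed in Step 3). The layer counts and hidden sizes are the published ones for these model sizes; the function name is hypothetical, and any task-specific classification head is not counted.

```python
def lora_param_count(num_layers, hidden_size, rank, adapted_per_layer=2):
    """Count LoRA parameters when `adapted_per_layer` square projections
    (here: query and value) are adapted in every transformer layer."""
    # Each adapted projection adds A (rank x hidden) and B (hidden x rank).
    per_projection = rank * hidden_size + hidden_size * rank
    return num_layers * adapted_per_layer * per_projection


roberta_base = lora_param_count(num_layers=12, hidden_size=768, rank=8)
roberta_large = lora_param_count(num_layers=24, hidden_size=1024, rank=8)
deberta_v2_xxlarge = lora_param_count(num_layers=48, hidden_size=1536, rank=16)

print(roberta_base)        # 294912  (about 0.3M)
print(roberta_large)       # 786432  (about 0.8M)
print(deberta_v2_xxlarge)  # 4718592 (about 4.7M)
```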
Usage
Execute this workflow when you need to adapt a pretrained encoder model (RoBERTa or DeBERTa) for text classification, natural language inference, semantic similarity, or other sentence-level understanding tasks on the GLUE benchmark, while keeping the number of trainable parameters and the per-task storage minimal. This is the reference implementation for reproducing the LoRA paper results on NLU tasks.
Execution Steps
Step 1: Environment Setup
Create a conda environment with all required dependencies using the provided environment.yml specification. Install both the loralib package and the modified HuggingFace Transformers fork that includes LoRA support in RoBERTa and DeBERTa models.
Key considerations:
- Uses conda for reproducible environment management (environment.yml)
- The loralib package is installed in editable mode from the parent directory
- The modified Transformers fork is installed in editable mode from the NLU directory
- Requires NVIDIA GPUs (experiments run on 4x Tesla V100)
- CUBLAS_WORKSPACE_CONFIG and PYTHONHASHSEED are set for deterministic reproducibility
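The two environment variables in the last bullet have to be in place before the training process starts (PYTHONHASHSEED in particular is read at interpreter startup), so one option is to prepare an environment mapping and pass it to the child process. This is a minimal sketch: the helper name is hypothetical, and the values ":16:8" and "0" are assumptions chosen for illustration (":16:8" is one of the workspace settings NVIDIA documents for deterministic cuBLAS behavior), not values confirmed by this page.

```python
import os


def deterministic_env(base_env=None, cublas_workspace=":16:8", hash_seed="0"):
    """Return a copy of the environment with the determinism variables set.

    The default values are illustrative assumptions; use whatever the
    training scripts in your checkout actually export.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUBLAS_WORKSPACE_CONFIG"] = cublas_workspace
    env["PYTHONHASHSEED"] = hash_seed
    return env


env = deterministic_env(base_env={"PATH": "/usr/bin"})
print(env["CUBLAS_WORKSPACE_CONFIG"], env["PYTHONHASHSEED"])
```

The returned mapping can be handed to `subprocess.run(command, env=env)` when launching training.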
Step 2: Download GLUE Data
Download the GLUE benchmark datasets using the provided utility script. The 8 tasks used in this workflow span single-sentence classification, sentence-pair classification, and sentence-pair regression.
Key considerations:
- The download_glue_data.py utility fetches all 8 GLUE task datasets
- Tasks include: MNLI (multi-genre NLI), SST-2 (sentiment), MRPC (paraphrase), CoLA (acceptability), QNLI (question NLI), QQP (question pairs), RTE (textual entailment), STS-B (semantic similarity)
- Some tasks (MRPC, RTE, STS-B) benefit from initializing the LoRA weights from a LoRA-adapted MNLI checkpoint rather than from a fresh LoRA initialization
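The task grouping described above can be written down as a small lookup table, which is handy when scripting runs over several tasks. The constant and function names are hypothetical; the grouping follows the standard GLUE task definitions and the MNLI warm-start note in the list above.

```python
GLUE_TASK_TYPES = {
    "single_sentence_classification": ["CoLA", "SST-2"],
    "sentence_pair_classification": ["MNLI", "MRPC", "QNLI", "QQP", "RTE"],
    "sentence_pair_regression": ["STS-B"],
}

# Tasks this workflow suggests initializing from a LoRA-adapted MNLI checkpoint.
MNLI_WARM_START_TASKS = {"MRPC", "RTE", "STS-B"}


def task_type(task):
    """Return the task family for a GLUE task name."""
    for kind, tasks in GLUE_TASK_TYPES.items():
        if task in tasks:
            return kind
    raise KeyError(f"unknown GLUE task: {task}")


def benefits_from_mnli_init(task):
    """True if the task is one of those listed as benefiting from MNLI init."""
    return task in MNLI_WARM_START_TASKS


print(task_type("STS-B"), benefits_from_mnli_init("RTE"))
```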
Step 3: Configure LoRA Injection
The modified Transformers fork has LoRA layers pre-injected into the model architectures. Configuration is done via command-line flags that control whether LoRA is applied and its hyperparameters (rank, alpha, bias mode).
Key considerations:
- The --apply_lora flag activates LoRA adapter injection in the model
- --lora_r sets the rank (8 for RoBERTa, 16 for DeBERTa V2 XXLarge)
- --lora_alpha sets the scaling factor (16 for RoBERTa, 32 for DeBERTa V2)
- LoRA is injected into the query and value projections of self-attention layers
- For RoBERTa: modifies modeling_roberta.py attention layers
- For DeBERTa V2: modifies modeling_deberta_v2.py attention layers
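To make the role of the rank and scaling flags concrete, here is a dependency-free sketch of what a LoRA-adapted projection computes: the frozen weight W is left untouched, and a low-rank update B A, scaled by lora_alpha / r, is added to its output. This is not the fork's code. The class name is hypothetical, plain Python lists stand in for tensors, and A is drawn from a small Gaussian purely for illustration; B starts at zero so that the adapted layer initially matches the pretrained one.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinearSketch:
    """Computes h = W x + (lora_alpha / r) * B (A x) with W frozen."""

    def __init__(self, weight, r, lora_alpha, seed=0):
        rng = random.Random(seed)
        out_features, in_features = len(weight), len(weight[0])
        self.weight = weight                  # frozen pretrained weight
        self.scaling = lora_alpha / r         # e.g. 16 / 8 = 2.0
        # A: r x in_features, small random values (illustrative init).
        self.lora_A = [[rng.gauss(0.0, 0.02) for _ in range(in_features)]
                       for _ in range(r)]
        # B: out_features x r, zeros so the initial update is exactly zero.
        self.lora_B = [[0.0] * r for _ in range(out_features)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.lora_B, matvec(self.lora_A, x))
        return [b + self.scaling * u for b, u in zip(base, update)]


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinearSketch(identity, r=2, lora_alpha=4)
print(layer.forward([1.0, 2.0, 3.0, 4.0]))  # equals the input while B is zero
```

Only `lora_A` and `lora_B` would receive gradients during training, which is why the trainable parameter count scales with the rank rather than with the size of W.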
Step 4: Distributed Training with HuggingFace Trainer
Launch distributed training using torch.distributed.launch with the run_glue.py script. The HuggingFace Trainer handles the training loop, evaluation, checkpointing, and logging.
Key considerations:
- Uses 8 GPUs via torch.distributed.launch for DeBERTa experiments
- RoBERTa-base: lr=5e-4, batch_size=16/gpu, 30 epochs, warmup_ratio=0.06, weight_decay=0.1
- DeBERTa V2 XXLarge: lr=1e-4, batch_size=8/gpu, 5 epochs, warmup_steps=1000, fp16 enabled
- Evaluation strategy varies: per-epoch for RoBERTa, every 500 steps for DeBERTa
- The --use_deterministic_algorithms flag ensures reproducible results for DeBERTa
- Optional extras for MNLI: Cutoff data augmentation (--apply_cutoff) and R-Drop regularization (--apply_rdrop)
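The DeBERTa V2 XXLarge settings listed above can be assembled into a launch command as follows. This is a sketch under stated assumptions rather than a copy of the repository's scripts: the helper name, the script path in `RUN_GLUE_SCRIPT`, and the output directory are hypothetical, while the Trainer flags used (`--per_device_train_batch_size`, `--warmup_steps`, `--evaluation_strategy`, `--eval_steps`, `--fp16`) are standard HuggingFace `TrainingArguments` options. The command is only built and printed here, not executed.

```python
import shlex

# Assumed location of the script inside the modified Transformers fork.
RUN_GLUE_SCRIPT = "examples/text-classification/run_glue.py"


def build_launch_command(model_name, task_name, output_dir, num_gpus,
                         learning_rate, batch_size_per_gpu, num_epochs,
                         lora_r, lora_alpha, extra_args=()):
    """Build the argument list for a torch.distributed.launch training run."""
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        RUN_GLUE_SCRIPT,
        "--model_name_or_path", model_name,
        "--task_name", task_name,
        "--do_train", "--do_eval",
        "--per_device_train_batch_size", str(batch_size_per_gpu),
        "--learning_rate", str(learning_rate),
        "--num_train_epochs", str(num_epochs),
        "--output_dir", output_dir,
        "--apply_lora",
        "--lora_r", str(lora_r),
        "--lora_alpha", str(lora_alpha),
        *extra_args,
    ]


deberta_cmd = build_launch_command(
    model_name="microsoft/deberta-v2-xxlarge",
    task_name="mnli",
    output_dir="./mnli/model",          # hypothetical path
    num_gpus=8,
    learning_rate="1e-4",
    batch_size_per_gpu=8,
    num_epochs=5,
    lora_r=16,
    lora_alpha=32,
    extra_args=["--warmup_steps", "1000", "--fp16",
                "--evaluation_strategy", "steps", "--eval_steps", "500",
                "--use_deterministic_algorithms"],
)
print(shlex.join(deberta_cmd))
```

For RoBERTa-base, the same helper would take lr 5e-4, batch size 16, 30 epochs, and `extra_args` carrying `--warmup_ratio 0.06`, `--weight_decay 0.1`, and `--evaluation_strategy epoch`.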
Step 5: Evaluate LoRA Checkpoint
Evaluate a saved LoRA checkpoint on the validation or test set by loading the base model and applying the LoRA adapter weights. The evaluation uses the same run_glue.py script with the --do_eval flag.
Key considerations:
- Load the base pretrained model (e.g., microsoft/deberta-v2-xxlarge) from HuggingFace Hub
- Specify the LoRA checkpoint path via --lora_path
- The evaluation script applies LoRA weights on top of the frozen base model
- Metrics vary by task: accuracy for classification, Matthews correlation for CoLA, Pearson/Spearman correlation for STS-B
- Results are reported as median over 5 runs with confidence intervals
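For reference, the two simplest metrics named above and the median-over-runs aggregation can be sketched in plain Python. This is illustrative only: the evaluation script computes its metrics through its own code path, the function names here are hypothetical, the inputs are toy values rather than benchmark results, and the STS-B correlations are omitted for brevity.

```python
import math
from statistics import median


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def matthews_corrcoef(preds, labels):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


toy_labels = [1, 0, 1, 0]
print(accuracy([1, 0, 0, 0], toy_labels))           # 0.75
print(matthews_corrcoef([1, 0, 1, 0], toy_labels))  # 1.0

# Aggregating one metric over 5 runs (toy values, not real results).
toy_scores = [0.1, 0.5, 0.3, 0.4, 0.2]
print(median(toy_scores))                           # 0.3
```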
Step 6: Extract LoRA Weights
Optionally extract only the LoRA-specific weights from a full training checkpoint for minimal storage and distribution. The split_lora.py utility filters parameters to retain only those with "lora" in their names.
Key considerations:
- The resulting LoRA checkpoint is typically 3-27 MB depending on the model size
- Multiple task-specific LoRA checkpoints can be served from a single base model
- The utils/convert.py utility can convert between LoRA weight naming conventions
- The original pretrained model checkpoint from HuggingFace is still required for inference
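The name-based filtering described in this step amounts to a dictionary comprehension over the checkpoint's state dict. The sketch below is not the split_lora.py utility itself: the function name is hypothetical, and strings stand in for tensors so the example runs without PyTorch. In practice the mapping would come from `torch.load` on the full checkpoint and the result would be written with `torch.save`.

```python
def extract_lora_weights(state_dict, marker="lora"):
    """Keep only the entries whose parameter name contains `marker`."""
    return {name: value for name, value in state_dict.items()
            if marker in name}


# Toy state dict; the key names are illustrative placeholders.
full_checkpoint = {
    "encoder.layer.0.attention.self.query.weight": "frozen W_q",
    "encoder.layer.0.attention.self.query.lora_A": "A_q",
    "encoder.layer.0.attention.self.query.lora_B": "B_q",
    "encoder.layer.0.attention.self.value.lora_A": "A_v",
    "encoder.layer.0.attention.self.value.lora_B": "B_v",
    "classifier.dense.weight": "task head",
}

lora_only = extract_lora_weights(full_checkpoint)
print(sorted(lora_only))
```

A filter on "lora" alone drops every other parameter, including any task-specific head, so whether such weights must be stored separately depends on the training setup. The loralib package also provides `lora_state_dict(model)` for collecting LoRA parameters directly from a model.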