Workflow:Microsoft LoRA NLU GLUE Finetuning

Knowledge Sources	Microsoft LoRA; LoRA: Low-Rank Adaptation of Large Language Models; GLUE Benchmark; HuggingFace Transformers
Domains	LLMs, Natural_Language_Understanding, Fine_Tuning, Parameter_Efficient_Fine_Tuning
Last Updated	2026-02-10 05:30 GMT

Overview

End-to-end process for fine-tuning RoBERTa and DeBERTa V2 models with LoRA on the GLUE benchmark tasks for natural language understanding.

Description

This workflow covers the complete pipeline for adapting encoder-based transformer models (RoBERTa-base/large, DeBERTa V2 XXLarge) to GLUE benchmark tasks using Low-Rank Adaptation. It relies on a modified fork of HuggingFace Transformers v4.4.2 in which LoRA layers are injected into the query and value projections of self-attention. Training and evaluation use the HuggingFace Trainer API. LoRA trains only 0.3M-4.7M parameters, compared with 125M-1.5B for full fine-tuning, while matching or exceeding full fine-tuning on the 8 GLUE tasks it is evaluated on (MNLI, SST-2, MRPC, CoLA, QNLI, QQP, RTE, STS-B).
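The 0.3M-4.7M range can be checked with a back-of-the-envelope count. The sketch below assumes each adapted projection is a square hidden x hidden weight that gains an r x hidden matrix A and a hidden x r matrix B, with two adapted projections (query and value) per layer; it ignores the classification head and biases. The hidden sizes and layer counts are the public configurations of these checkpoints, and the function name is illustrative.

```python
def lora_param_count(hidden: int, layers: int, r: int, projections: int = 2) -> int:
    """Trainable LoRA parameters for `projections` square (hidden x hidden) weights
    per layer; each adapter adds A (r x hidden) and B (hidden x r)."""
    per_adapter = r * hidden + hidden * r
    return per_adapter * projections * layers


# (hidden size, number of layers, LoRA rank) per model
configs = {
    "roberta-base": (768, 12, 8),
    "roberta-large": (1024, 24, 8),
    "deberta-v2-xxlarge": (1536, 48, 16),
}

for name, (hidden, layers, r) in configs.items():
    print(f"{name}: {lora_param_count(hidden, layers, r):,} trainable LoRA parameters")
```

This gives 294,912 for RoBERTa-base, 786,432 for RoBERTa-large and 4,718,592 for DeBERTa V2 XXLarge, i.e. roughly 0.3M, 0.8M and 4.7M.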

Usage

Execute this workflow when you need to adapt a pretrained encoder model (RoBERTa or DeBERTa) for text classification, natural language inference, semantic similarity, or other sentence-level understanding tasks on the GLUE benchmark, while keeping parameter count and storage minimal. This is the reference implementation for reproducing the LoRA paper results on NLU tasks.

Execution Steps

Step 1: Environment Setup

Create a conda environment with all required dependencies using the provided environment.yml specification. Install both the loralib package and the modified HuggingFace Transformers fork that includes LoRA support in RoBERTa and DeBERTa models.

Key considerations:

Uses conda for reproducible environment management (environment.yml)
The loralib package is installed in editable mode from the parent directory
The modified Transformers fork is installed in editable mode from the NLU directory
Requires NVIDIA GPUs (experiments run on 4x Tesla V100)
CUBLAS_WORKSPACE_CONFIG and PYTHONHASHSEED are set for deterministic reproducibility
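A minimal sketch of preparing those two variables for a training run. Because PYTHONHASHSEED only takes effect in interpreters started after it is set, the sketch builds an environment to hand to the training subprocess. The value ":4096:8" is one of the settings PyTorch's reproducibility notes list for CUBLAS_WORKSPACE_CONFIG; the exact value and seed the repository uses are assumptions here, as is the helper name.

```python
import os
from typing import Dict, Mapping, Optional


def deterministic_env(seed: int, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the reproducibility variables set.

    Pass the result to the training subprocess (e.g. subprocess.run(cmd, env=env));
    setting PYTHONHASHSEED in the already-running interpreter has no effect on it.
    """
    env = dict(os.environ if base is None else base)
    env["PYTHONHASHSEED"] = str(seed)
    # Assumed value: one of the cuBLAS workspace settings PyTorch documents
    # as required for deterministic cuBLAS behavior.
    env["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    return env


env = deterministic_env(42)
print(env["PYTHONHASHSEED"], env["CUBLAS_WORKSPACE_CONFIG"])
```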

Step 2: Download GLUE Data

Download the GLUE benchmark datasets using the provided utility script. This workflow uses 8 GLUE tasks (WNLI is not among them), spanning single-sentence classification, sentence-pair classification, and sentence-pair regression.

Key considerations:

The download_glue_data.py utility fetches the GLUE task datasets, including the 8 used here
Tasks include: MNLI (multi-genre NLI), SST-2 (sentiment), MRPC (paraphrase), CoLA (acceptability), QNLI (question NLI), QQP (question pairs), RTE (textual entailment), STS-B (semantic similarity)
Some tasks (MRPC, RTE, STS-B) benefit from initializing with a LoRA-adapted MNLI checkpoint rather than starting from the pretrained weights alone
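The task list above can be captured as a small lookup table that drives head size, metric and initialization. This is a sketch: the lowercase task identifiers follow the HuggingFace GLUE naming, while the table layout, the helper init_checkpoint and the checkpoint path are hypothetical.

```python
from typing import NamedTuple, Optional


class GlueTask(NamedTuple):
    kind: str        # "single", "pair", or "pair-regression"
    num_labels: int  # 1 means a regression head
    metric: str


GLUE_TASKS = {
    "cola": GlueTask("single", 2, "matthews_correlation"),
    "sst2": GlueTask("single", 2, "accuracy"),
    "mrpc": GlueTask("pair", 2, "accuracy"),
    "qqp": GlueTask("pair", 2, "accuracy"),
    "stsb": GlueTask("pair-regression", 1, "pearson"),
    "mnli": GlueTask("pair", 3, "accuracy"),
    "qnli": GlueTask("pair", 2, "accuracy"),
    "rte": GlueTask("pair", 2, "accuracy"),
}

# Tasks that start from the MNLI-adapted LoRA checkpoint, per the note above
MNLI_INIT_TASKS = {"mrpc", "rte", "stsb"}


def init_checkpoint(task: str, mnli_lora_path: str) -> Optional[str]:
    """Return the LoRA checkpoint to start from, or None for the plain pretrained model."""
    if task not in GLUE_TASKS:
        raise ValueError(f"unknown GLUE task: {task}")
    return mnli_lora_path if task in MNLI_INIT_TASKS else None


print(init_checkpoint("rte", "mnli_lora.bin"))   # mnli_lora.bin
print(init_checkpoint("sst2", "mnli_lora.bin"))  # None
```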

Step 3: Configure LoRA Injection

The modified Transformers fork has LoRA layers pre-injected into the model architectures. Configuration is done via command-line flags that control whether LoRA is applied and its hyperparameters (rank, alpha, bias mode).

Key considerations:

The --apply_lora flag activates LoRA adapter injection in the model
--lora_r sets the rank (8 for RoBERTa, 16 for DeBERTa V2 XXLarge)
--lora_alpha sets the scaling factor (16 for RoBERTa, 32 for DeBERTa V2)
LoRA is injected into the query and value projections of self-attention layers
For RoBERTa: modifies modeling_roberta.py attention layers
For DeBERTa V2: modifies modeling_deberta_v2.py attention layers
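A framework-free sketch of what an injected layer computes: the frozen projection W0 plus a scaled low-rank update (alpha / r) * B A. B starts at zero, so training begins from exactly the pretrained behavior. The actual fork uses PyTorch modules from loralib; here plain lists stand in for tensors, the class and helper names are hypothetical, and the Gaussian standard deviation for A is an arbitrary choice.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix (list of rows) times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two list-of-rows matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


class LoRALinear:
    """Frozen weight W0 (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, w0: Matrix, r: int, alpha: float, seed: int = 0):
        d_out, d_in = len(w0), len(w0[0])
        rng = random.Random(seed)
        self.w0 = w0  # pretrained weight, never updated
        # A (r x d_in) gets small random values; B (d_out x r) starts at zero,
        # so B @ A = 0 and the layer initially reproduces the pretrained output.
        self.a = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r  # e.g. 16 / 8 = 2.0 with the RoBERTa settings

    def forward(self, x: Vector) -> Vector:
        """h = W0 x + (alpha / r) * B (A x)."""
        base = matvec(self.w0, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [h + self.scale * d for h, d in zip(base, delta)]

    def merged_weight(self) -> Matrix:
        """W0 + (alpha / r) * B @ A, which can replace W0 at inference time."""
        ba = matmul(self.b, self.a)
        return [[w + self.scale * u for w, u in zip(w_row, u_row)]
                for w_row, u_row in zip(self.w0, ba)]


w0 = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]
layer = LoRALinear(w0, r=2, alpha=4)
x = [1.0, 1.0, 1.0]
print(layer.forward(x))  # [3.0, 0.0]: identical to W0 x while B is zero
```

Only A and B would receive gradients. Because the update can be folded into W0 via merged_weight, an adapted model need not add inference latency.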

Step 4: Distributed Training with HuggingFace Trainer

Launch distributed training using torch.distributed.launch with the run_glue.py script. The HuggingFace Trainer handles the training loop, evaluation, checkpointing, and logging.

Key considerations:

Uses 8 GPUs via torch.distributed.launch for DeBERTa experiments
RoBERTa-base (MNLI settings; epoch counts and learning rates differ for other tasks): lr=5e-4, batch_size=16/gpu, 30 epochs, warmup_ratio=0.06, weight_decay=0.1
DeBERTa V2 XXLarge: lr=1e-4, batch_size=8/gpu, 5 epochs, warmup_steps=1000, fp16 enabled
Evaluation strategy varies: per-epoch for RoBERTa, every 500 steps for DeBERTa
The --use_deterministic_algorithms flag ensures reproducible results for DeBERTa
Optional regularization for MNLI: Cutoff data augmentation (--apply_cutoff) and R-Drop consistency regularization (--apply_rdrop)
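The settings above can be assembled into launch commands programmatically. This is a sketch under assumptions: the script path inside the fork is assumed, and apart from --apply_lora, --lora_r, --lora_alpha and --use_deterministic_algorithms (named in this workflow) the flags are standard run_glue.py / TrainingArguments options that should be checked against the fork. The helper build_launch_cmd is hypothetical.

```python
import shlex
from typing import Dict, List, Sequence

# Assumed location of the GLUE script inside the modified Transformers fork
RUN_GLUE = "examples/text-classification/run_glue.py"

# RoBERTa-base on MNLI, using the values listed in this step
ROBERTA_BASE_MNLI = {
    "model_name_or_path": "roberta-base",
    "task_name": "mnli",
    "learning_rate": "5e-4",
    "per_device_train_batch_size": "16",
    "num_train_epochs": "30",
    "warmup_ratio": "0.06",
    "weight_decay": "0.1",
    "evaluation_strategy": "epoch",
    "lora_r": "8",
    "lora_alpha": "16",
}

# DeBERTa V2 XXLarge on MNLI, evaluated every 500 steps
DEBERTA_V2_XXLARGE_MNLI = {
    "model_name_or_path": "microsoft/deberta-v2-xxlarge",
    "task_name": "mnli",
    "learning_rate": "1e-4",
    "per_device_train_batch_size": "8",
    "num_train_epochs": "5",
    "warmup_steps": "1000",
    "evaluation_strategy": "steps",
    "eval_steps": "500",
    "lora_r": "16",
    "lora_alpha": "32",
}


def build_launch_cmd(args: Dict[str, str], nproc: int, output_dir: str,
                     extra_flags: Sequence[str] = ()) -> List[str]:
    """Assemble a torch.distributed.launch command line for run_glue.py with LoRA enabled."""
    cmd = ["python", "-m", "torch.distributed.launch", f"--nproc_per_node={nproc}", RUN_GLUE]
    for key, value in args.items():
        cmd += [f"--{key}", value]
    cmd += ["--output_dir", output_dir, "--do_train", "--do_eval", "--apply_lora"]
    cmd += list(extra_flags)
    return cmd


deberta_cmd = build_launch_cmd(DEBERTA_V2_XXLARGE_MNLI, nproc=8,
                               output_dir="./output/deberta_mnli",
                               extra_flags=("--fp16", "--use_deterministic_algorithms"))
print(shlex.join(deberta_cmd))
```

With data-parallel launch and no gradient accumulation, the global batch size is the per-GPU value times --nproc_per_node, e.g. 8 x 8 = 64 for the DeBERTa run above.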

Step 5: Evaluate LoRA Checkpoint

Evaluate a saved LoRA checkpoint on the validation or test set by loading the base model and applying the LoRA adapter weights. The evaluation uses the same run_glue.py script with --do_eval flag.

Key considerations:

Load the base pretrained model (e.g., microsoft/deberta-v2-xxlarge) from HuggingFace Hub
Specify the LoRA checkpoint path via --lora_path
The evaluation script applies LoRA weights on top of the frozen base model
Metrics vary by task: accuracy for classification, Matthews correlation for CoLA, Pearson/Spearman correlation for STS-B
Results are reported as median over 5 runs with confidence intervals

Step 6: Extract LoRA Weights

Optionally extract only the LoRA-specific weights from a full training checkpoint for minimal storage and distribution. The split_lora.py utility filters parameters to retain only those with "lora" in their names.

Key considerations:

The resulting LoRA checkpoint is typically 3-27 MB depending on the model size
Multiple task-specific LoRA checkpoints can be served from a single base model
The utils/convert.py utility can convert between LoRA weight naming conventions
The original pretrained model checkpoint from HuggingFace is still required for inference
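The filtering rule above and the "one base model, many task adapters" pattern can be sketched on plain dictionaries. With PyTorch the same functions would operate on a state dict loaded with torch.load(path, map_location="cpu") and saved with torch.save; when loading adapters into a model, load_state_dict(lora_state, strict=False) plays the role of attach_lora. The parameter names, file names and helper names below are illustrative, not taken from split_lora.py.

```python
from typing import Dict, Mapping, TypeVar

T = TypeVar("T")


def extract_lora(state_dict: Mapping[str, T]) -> Dict[str, T]:
    """Keep only parameters whose name contains "lora"."""
    return {name: value for name, value in state_dict.items() if "lora" in name}


def attach_lora(base_state: Mapping[str, T], lora_state: Mapping[str, T]) -> Dict[str, T]:
    """Overlay one task's LoRA weights on a copy of the shared base weights."""
    merged = dict(base_state)
    merged.update(lora_state)
    return merged


# With PyTorch this would wrap torch.load / torch.save, e.g.
#   full = torch.load("full_checkpoint.bin", map_location="cpu")
#   torch.save(extract_lora(full), "lora_only.bin")
full = {
    "roberta.encoder.layer.0.attention.self.query.weight": "W_q",
    "roberta.encoder.layer.0.attention.self.query.lora_A": "A_q",
    "roberta.encoder.layer.0.attention.self.query.lora_B": "B_q",
    "classifier.dense.weight": "W_cls",
}
print(sorted(extract_lora(full)))
```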
