Principle:Microsoft LoRA NLU Environment Setup
Overview
NLU Environment Setup describes the process of preparing a reproducible conda environment for running LoRA-based fine-tuning experiments on GLUE benchmark tasks. The environment is built around a modified fork of HuggingFace Transformers v4.4.2 that injects LoRA support directly into the RoBERTa and DeBERTa V2 model architectures.
The LoRA approach (Low-Rank Adaptation of Large Language Models, Hu et al., 2021; arXiv:2106.09685) requires architectural modifications to the self-attention layers of pretrained transformer models. Rather than relying on an external adapter library at runtime, the microsoft/LoRA repository ships a forked copy of Transformers in which `loralib.Linear` layers are patched directly into the query and value projections of each self-attention layer.
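The low-rank idea behind these modifications can be sketched numerically: the pretrained weight `W0` stays frozen while a down-projection `A` and up-projection `B` of rank `r` learn the update, scaled by `lora_alpha / r`. The following is a minimal numpy illustration (variable names and shapes are chosen for exposition; this is not loralib's implementation):

```python
import numpy as np

# Frozen pretrained projection: d_out x d_in
d_in, d_out, r, lora_alpha = 768, 768, 8, 16
rng = np.random.default_rng(0)
W0 = rng.standard_normal((d_out, d_in)) * 0.02

# LoRA factors: A projects down to rank r, B projects back up.
# A is randomly initialized, B starts at zero, so the initial
# update is zero and training begins from the pretrained weights.
A = rng.standard_normal((r, d_in)) * 0.02
B = np.zeros((d_out, r))
scaling = lora_alpha / r

def lora_forward(x):
    # x: (batch, d_in) -> (batch, d_out)
    return x @ W0.T + (x @ A.T @ B.T) * scaling

x = rng.standard_normal((4, d_in))
y = lora_forward(x)
# With B = 0 the LoRA branch contributes nothing yet.
assert np.allclose(y, x @ W0.T)
```

Because only `A` and `B` are trained, the number of trainable parameters per adapted projection drops from `d_out * d_in` to `r * (d_in + d_out)`.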
Why a Modified Fork
Standard HuggingFace Transformers does not natively support LoRA. The microsoft/LoRA NLU example addresses this by:
- Forking the entire HuggingFace Transformers v4.4.2 source tree into `examples/NLU/`
- Modifying `modeling_roberta.py` and `modeling_deberta_v2.py` to conditionally replace `nn.Linear` with `lora.Linear` in the self-attention query and value projections
- Adding LoRA-specific configuration flags (`apply_lora`, `lora_r`, `lora_alpha`) to the model config classes
- Installing the fork in editable mode so that `import transformers` resolves to the modified code
This approach ensures that LoRA integration is transparent to the rest of the HuggingFace training infrastructure (Trainer, data collators, evaluation loops).
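The flag-driven construction can be illustrated with a small stand-in. The `SimpleConfig` dataclass and `make_projection` factory below are hypothetical names for exposition; in the fork, the equivalent branching lives inside each model's `__init__`:

```python
from dataclasses import dataclass

@dataclass
class SimpleConfig:
    # Field names mirror the flags the fork adds to the model configs.
    hidden_size: int = 768
    apply_lora: bool = False
    lora_r: int = 8
    lora_alpha: int = 16

def make_projection(config, out_features):
    """Describe which layer class the fork would construct."""
    if config.apply_lora:
        return ("lora.Linear", config.hidden_size, out_features,
                config.lora_r, config.lora_alpha)
    return ("nn.Linear", config.hidden_size, out_features)

plain = make_projection(SimpleConfig(), 768)
adapted = make_projection(SimpleConfig(apply_lora=True), 768)
assert plain[0] == "nn.Linear"
assert adapted[0] == "lora.Linear" and adapted[3] == 8
```

Because the flags live on the config object, the same checkpoint-loading and training code paths work unchanged whether or not LoRA is enabled.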
Conda Environment
The environment is specified in examples/NLU/environment.yml (lines 1-107). Key dependencies include:
- Python 3.7.10 -- pinned for reproducibility
- PyTorch 1.9.0 with CUDA 11.1 and cuDNN 8.0.5
- loralib 0.1.1 -- the core LoRA linear layer implementation
- datasets 1.9.0 -- HuggingFace Datasets for GLUE task loading
- tokenizers 0.10.3 -- fast tokenizer backend
- scikit-learn 0.24.2 -- for evaluation metrics
- deepspeed 0.5.0 -- optional distributed training backend
- accelerate 0.3.0 -- HuggingFace distributed training utilities
- tensorboardx 1.8 -- logging and visualization
The environment is configured to install into the conda prefix /opt/conda/envs/transformers.
Modified Transformers Package
The examples/NLU/setup.py (lines 1-309) defines the modified Transformers package. It is based on Transformers v4.4.2 and retains all original dependencies:
- `filelock` -- filesystem locks for parallel downloads
- `numpy >= 1.17`
- `regex` -- for the OpenAI GPT tokenizer
- `requests` -- for downloading pretrained models
- `sacremoses` -- for the XLM tokenizer
- `tokenizers >= 0.10.1, < 0.11`
- `tqdm >= 4.27` -- progress bars
The package is installed in editable mode (`pip install -e .`) so that local modifications to model files take effect immediately without reinstallation.
LoRA Injection Points
The modified fork patches two model architectures:
RoBERTa
In `src/transformers/models/roberta/modeling_roberta.py`, the `RobertaSelfAttention` class conditionally uses `lora.Linear`:

```python
import loralib as lora

# In RobertaSelfAttention.__init__:
if config.apply_lora:
    self.query = lora.Linear(config.hidden_size, self.all_head_size,
                             config.lora_r, lora_alpha=config.lora_alpha)
else:
    self.query = nn.Linear(config.hidden_size, self.all_head_size)

if config.apply_lora:
    self.value = lora.Linear(config.hidden_size, self.all_head_size,
                             config.lora_r, lora_alpha=config.lora_alpha)
else:
    self.value = nn.Linear(config.hidden_size, self.all_head_size)
```
DeBERTa V2
In `src/transformers/models/deberta_v2/modeling_deberta_v2.py`, the `DisentangledSelfAttention` class applies the same pattern to `query_proj` and `value_proj`, with the additional flag `merge_weights=False` to keep LoRA weights separate during training.
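The effect of `merge_weights` can be shown with a small numpy sketch (illustrative, not loralib's code): with merging, the low-rank product is folded into the frozen weight once for inference; with `merge_weights=False` the two branches stay separate, which is what the fork wants during training so that LoRA parameters remain independently updatable:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, lora_alpha = 64, 4, 8
scaling = lora_alpha / r
W0 = rng.standard_normal((d, d))
A = rng.standard_normal((r, d))
B = rng.standard_normal((d, r))

x = rng.standard_normal((2, d))

# merge_weights=False: keep branches separate (training-time view).
y_separate = x @ W0.T + (x @ A.T @ B.T) * scaling

# merge_weights=True: fold B @ A into W0 once (inference-time view).
W_merged = W0 + (B @ A) * scaling
y_merged = x @ W_merged.T

# Both views compute the same function.
assert np.allclose(y_separate, y_merged)
```

Merging removes the extra matrix multiplications at inference time, so a merged LoRA model runs at exactly the same cost as the original dense model.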
Metadata
| Field | Value |
|---|---|
| Source | Repo (microsoft/LoRA), Doc (HuggingFace Transformers v4.4.2) |
| Domains | Setup, NLU |
| Related | Implementation:Microsoft_LoRA_NLU_Environment_Setup_Script |