# Environment: Microsoft LoRA NLU Conda Environment

| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, NLU |
| Last Updated | 2026-02-10 05:30 GMT |
## Overview
Conda environment with Python 3.7, PyTorch 1.9, CUDA 11.1, and a modified HuggingFace Transformers fork for LoRA-based NLU fine-tuning on GLUE tasks.
## Description
The NLU example uses a Conda environment (defined in `environment.yml`) with a modified fork of HuggingFace Transformers (>= 4.4.0). The fork adds LoRA support directly into RoBERTa and DeBERTa v2 model architectures via `loralib.MergedLinear` layers in their attention modules. The environment includes DeepSpeed for distributed training, the HuggingFace Datasets library for GLUE data loading, and Accelerate for multi-GPU orchestration. The Transformers fork is installed in editable mode (`pip install -e .`).
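The low-rank adaptation these `MergedLinear` layers implement can be illustrated with a minimal numpy sketch (illustrative only, with names following the LoRA paper rather than the fork's code): a frozen weight `W` is adapted by a low-rank product `B @ A` scaled by `alpha / r`, and the update can be folded ("merged") into `W` for inference.

```python
import numpy as np

# Illustrative LoRA update: frozen W plus a scaled low-rank correction.
d_in, d_out, r, alpha = 64, 64, 8, 16
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))   # frozen pre-trained weight
A = rng.standard_normal((r, d_in))       # trainable down-projection
B = rng.standard_normal((d_out, r))      # trainable up-projection (zero-init in
                                         # practice; nonzero here so the merge
                                         # check below is meaningful)
x = rng.standard_normal(d_in)

# Unmerged forward pass: base path plus scaled low-rank path.
y_unmerged = W @ x + (alpha / r) * (B @ (A @ x))

# Merged forward pass: fold the update into W, as weight merging does at inference.
W_merged = W + (alpha / r) * (B @ A)
y_merged = W_merged @ x

assert np.allclose(y_unmerged, y_merged)
```

Because the two paths are mathematically identical, merging removes the extra adapter computation at inference time with no change in output.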
## Usage

Use this environment for the NLU GLUE fine-tuning workflow. It is required to run `run_glue.py`, which fine-tunes RoBERTa-base, RoBERTa-large, or DeBERTa-v2-XXLarge on GLUE benchmark tasks with LoRA adaptation.
## System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | Required for CUDA, NCCL, and DeepSpeed |
| Hardware | NVIDIA GPU with CUDA 11.1 support | cudatoolkit=11.1.74 specified in environment.yml |
| Hardware | 8 GPUs recommended | Training scripts default to `num_gpus=8` |
| Python | 3.7.10 | Pinned in environment.yml |
| Disk | ~10GB | For conda environment, models, and GLUE data |
## Dependencies
### Conda Packages
- `python` = 3.7.10
- `pytorch` = 1.9.0 (py3.7_cuda11.1_cudnn8.0.5_0)
- `cudatoolkit` = 11.1.74
- `torchvision` = 0.10.0
- `torchaudio` = 0.9.0
- `numpy` = 1.20.2
### Pip Packages
- `loralib` == 0.1.1
- `accelerate` == 0.3.0
- `datasets` == 1.9.0
- `deepspeed` == 0.5.0
- `scikit-learn` == 0.24.2
- `scipy` == 1.7.0
- `sentencepiece` == 0.1.96
- `tokenizers` == 0.10.3
- `triton` == 0.4.2
- `azureml-core` == 1.32.0
### Modified Transformers
- HuggingFace Transformers >= 4.4.0 (forked, installed via `pip install -e .`)
## Credentials
No credentials required for local training. The GLUE data is downloaded via public URLs. Pre-trained model checkpoints (RoBERTa, DeBERTa) are downloaded from the public Hugging Face model hub.
## Quick Install
```bash
# Create the conda environment from specification
cd examples/NLU
conda env create -f environment.yml

# Activate the environment
conda activate NLU

# Install the modified Transformers fork in editable mode
pip install -e .

# Download GLUE data
python utils/download_glue_data.py --data_dir glue_data --tasks all
```
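After installation, a quick sanity check from Python can confirm that the key packages are importable (a minimal sketch; `missing_packages` is a helper written here for illustration, not part of the repo):

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the subset of importable module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]

# Module names for the key pip dependencies listed above.
required = ["loralib", "datasets", "deepspeed", "accelerate", "transformers"]
print(missing_packages(required))  # empty list in a correctly built environment
```

Any name printed here indicates a package that failed to install, commonly because `pip install -e .` was skipped or run outside the activated `NLU` environment.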
## Code Evidence
Minimum Transformers version check from `examples/NLU/examples/text-classification/run_glue.py:49`:
```python
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.4.0")
```
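For illustration, a simplified sketch of what such a version gate does (the real `check_min_version` in the Transformers codebase is more thorough; the `installed` parameter is hypothetical, added here so the sketch is self-contained):

```python
def check_min_version(min_version, installed="4.4.0"):
    """Simplified sketch of a minimum-version gate (not the real implementation)."""
    def as_tuple(version):
        # "4.4.0" -> (4, 4, 0), so versions compare component-wise
        return tuple(int(part) for part in version.split("."))
    if as_tuple(installed) < as_tuple(min_version):
        raise ImportError(
            f"transformers>={min_version} is required, but found {installed}"
        )

check_min_version("4.4.0")  # passes: 4.4.0 meets the minimum
```

This is why installing a stock Transformers release in place of the fork can still pass the version check yet fail later: the check only guards the version number, not the LoRA-specific model changes.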
CUBLAS reproducibility environment variable from `examples/NLU/roberta_base_mnli.sh:2`:
```bash
export CUBLAS_WORKSPACE_CONFIG=":16:8" # https://docs.nvidia.com/cuda/cublas/index.html#cublasApi_reproducibility
export PYTHONHASHSEED=0
```
Conda environment header from `examples/NLU/environment.yml:1-6`:
```yaml
name: NLU
channels:
  - pytorch
  - nvidia
  - defaults
dependencies:
```
`loralib` pin from `examples/NLU/environment.yml:106`:

```yaml
  - loralib==0.1.1
```
## Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: transformers >= 4.4.0 required` | Transformers fork not installed or wrong version | Run `pip install -e .` from `examples/NLU/` directory |
| `ModuleNotFoundError: No module named 'loralib'` | loralib not installed in conda env | `pip install loralib==0.1.1` |
| `CUDA error: no kernel image is available` | CUDA toolkit version mismatch with GPU | Ensure cudatoolkit=11.1 matches your GPU driver |
## Compatibility Notes
- Conda Required: Unlike the NLG example (pip-only), the NLU example requires a Conda environment due to complex dependency pinning.
- Modified Transformers: The NLU example uses a forked HuggingFace Transformers with LoRA injected into RoBERTa (`modeling_roberta.py`) and DeBERTa v2 (`modeling_deberta_v2.py`). Standard transformers will not work.
- Deterministic Training: Scripts set `CUBLAS_WORKSPACE_CONFIG`, `PYTHONHASHSEED=0`, and `--use_deterministic_algorithms` for reproducibility.
- DeepSpeed: `ds_config.json` provides ZeRO Stage 2 configuration for memory-efficient distributed training.
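For reference, a ZeRO Stage 2 configuration in the shape DeepSpeed expects can look like the following (field values here are illustrative examples, not a copy of the repo's `ds_config.json`):

```python
import json

# Example DeepSpeed config with ZeRO Stage 2. Stage 2 partitions optimizer
# state and gradients across data-parallel ranks, reducing per-GPU memory.
ds_config = {
    "train_batch_size": 64,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "reduce_scatter": True,
        "overlap_comm": True,
    },
}

print(json.dumps(ds_config, indent=2))
```

Consult the repo's `ds_config.json` for the settings the NLU scripts actually pass to DeepSpeed.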