# Environment: Microsoft LoRA NLU Conda Environment

| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, NLU |
| Last Updated | 2026-02-10 05:30 GMT |
## Overview
Conda environment with Python 3.7, PyTorch 1.9, CUDA 11.1, and a modified HuggingFace Transformers fork for LoRA-based NLU fine-tuning on GLUE tasks.
## Description
The NLU example uses a Conda environment (defined in `environment.yml`) with a modified fork of HuggingFace Transformers (>= 4.4.0). The fork adds LoRA support directly into RoBERTa and DeBERTa v2 model architectures via `loralib.MergedLinear` layers in their attention modules. The environment includes DeepSpeed for distributed training, the HuggingFace Datasets library for GLUE data loading, and Accelerate for multi-GPU orchestration. The Transformers fork is installed in editable mode (`pip install -e .`).
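The low-rank adaptation these `MergedLinear` layers implement can be illustrated with a minimal numpy sketch (illustrative only, with names following the LoRA paper rather than the fork's code): a frozen weight `W` is adapted by a low-rank product `B @ A` scaled by `alpha / r`, and the update can be folded ("merged") into `W` for inference.

```python
import numpy as np

# Illustrative LoRA update: frozen W plus a scaled low-rank correction.
d_in, d_out, r, alpha = 64, 64, 8, 16
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))   # frozen pre-trained weight
A = rng.standard_normal((r, d_in))       # trainable down-projection
B = rng.standard_normal((d_out, r))      # trainable up-projection (zero-init in
                                         # practice; nonzero here so the merge
                                         # check below is meaningful)
x = rng.standard_normal(d_in)

# Unmerged forward pass: base path plus scaled low-rank path.
y_unmerged = W @ x + (alpha / r) * (B @ (A @ x))

# Merged forward pass: fold the update into W, as weight merging does at inference.
W_merged = W + (alpha / r) * (B @ A)
y_merged = W_merged @ x

assert np.allclose(y_unmerged, y_merged)
```

Because the two paths are mathematically identical, merging removes the extra adapter computation at inference time with no change in output.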
## Usage

Use this environment for the NLU GLUE fine-tuning workflow. It is required to run `run_glue.py`, which fine-tunes RoBERTa-base, RoBERTa-large, or DeBERTa-v2-XXLarge on GLUE benchmark tasks with LoRA adaptation.
## System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | Required for CUDA, NCCL, and DeepSpeed |
| Hardware | NVIDIA GPU with CUDA 11.1 support | cudatoolkit=11.1.74 specified in environment.yml |
| Hardware | 8 GPUs recommended | Training scripts default to `num_gpus=8` |
| Python | 3.7.10 | Pinned in environment.yml |
| Disk | ~10GB | For conda environment, models, and GLUE data |
## Dependencies
### Conda Packages
- `python` = 3.7.10
- `pytorch` = 1.9.0 (py3.7_cuda11.1_cudnn8.0.5_0)
- `cudatoolkit` = 11.1.74
- `torchvision` = 0.10.0
- `torchaudio` = 0.9.0
- `numpy` = 1.20.2
### Pip Packages
- `loralib` == 0.1.1
- `accelerate` == 0.3.0
- `datasets` == 1.9.0
- `deepspeed` == 0.5.0
- `scikit-learn` == 0.24.2
- `scipy` == 1.7.0
- `sentencepiece` == 0.1.96
- `tokenizers` == 0.10.3
- `triton` == 0.4.2
- `azureml-core` == 1.32.0
### Modified Transformers
- HuggingFace Transformers >= 4.4.0 (forked, installed via `pip install -e .`)
## Credentials
No credentials required for local training. The GLUE data is downloaded via public URLs. Pre-trained model checkpoints (RoBERTa, DeBERTa) are downloaded from the public Hugging Face model hub.
## Quick Install
```bash
# Create the conda environment from specification
cd examples/NLU
conda env create -f environment.yml

# Activate the environment
conda activate NLU

# Install the modified Transformers fork in editable mode
pip install -e .

# Download GLUE data
python utils/download_glue_data.py --data_dir glue_data --tasks all
```
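After installation, a quick sanity check from Python can confirm that the key packages are importable (a minimal sketch; `missing_packages` is a helper written here for illustration, not part of the repo):

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the subset of importable module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]

# Module names for the key pip dependencies listed above.
required = ["loralib", "datasets", "deepspeed", "accelerate", "transformers"]
print(missing_packages(required))  # empty list in a correctly built environment
```

Any name printed here indicates a package that failed to install, commonly because `pip install -e .` was skipped or run outside the activated `NLU` environment.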
## Code Evidence
Minimum Transformers version check from `examples/NLU/examples/text-classification/run_glue.py:49`:
```python
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.4.0")
```
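For illustration, a simplified sketch of what such a version gate does (the real `check_min_version` in the Transformers codebase is more thorough; the `installed` parameter is hypothetical, added here so the sketch is self-contained):

```python
def check_min_version(min_version, installed="4.4.0"):
    """Simplified sketch of a minimum-version gate (not the real implementation)."""
    def as_tuple(version):
        # "4.4.0" -> (4, 4, 0), so versions compare component-wise
        return tuple(int(part) for part in version.split("."))
    if as_tuple(installed) < as_tuple(min_version):
        raise ImportError(
            f"transformers>={min_version} is required, but found {installed}"
        )

check_min_version("4.4.0")  # passes: 4.4.0 meets the minimum
```

This is why installing a stock Transformers release in place of the fork can still pass the version check yet fail later: the check only guards the version number, not the LoRA-specific model changes.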
CUBLAS reproducibility environment variable from `examples/NLU/roberta_base_mnli.sh:2`:
```bash
export CUBLAS_WORKSPACE_CONFIG=":16:8" # https://docs.nvidia.com/cuda/cublas/index.html#cublasApi_reproducibility
export PYTHONHASHSEED=0
```
Conda environment header from `examples/NLU/environment.yml:1-6`:
```yaml
name: NLU
channels:
  - pytorch
  - nvidia
  - defaults
dependencies:
```
`loralib` pin from `examples/NLU/environment.yml:106`:

```yaml
  - loralib==0.1.1
```
## Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: transformers >= 4.4.0 required` | Transformers fork not installed or wrong version | Run `pip install -e .` from `examples/NLU/` directory |
| `ModuleNotFoundError: No module named 'loralib'` | loralib not installed in conda env | `pip install loralib==0.1.1` |
| `CUDA error: no kernel image is available` | CUDA toolkit version mismatch with GPU | Ensure cudatoolkit=11.1 matches your GPU driver |
## Compatibility Notes
- Conda Required: Unlike the NLG example (pip-only), the NLU example requires a Conda environment due to complex dependency pinning.
- Modified Transformers: The NLU example uses a forked HuggingFace Transformers with LoRA injected into RoBERTa (`modeling_roberta.py`) and DeBERTa v2 (`modeling_deberta_v2.py`). Standard transformers will not work.
- Deterministic Training: Scripts set `CUBLAS_WORKSPACE_CONFIG`, `PYTHONHASHSEED=0`, and `--use_deterministic_algorithms` for reproducibility.
- DeepSpeed: `ds_config.json` provides ZeRO Stage 2 configuration for memory-efficient distributed training.
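For reference, a ZeRO Stage 2 configuration in the shape DeepSpeed expects can look like the following (field values here are illustrative examples, not a copy of the repo's `ds_config.json`):

```python
import json

# Example DeepSpeed config with ZeRO Stage 2. Stage 2 partitions optimizer
# state and gradients across data-parallel ranks, reducing per-GPU memory.
ds_config = {
    "train_batch_size": 64,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "reduce_scatter": True,
        "overlap_comm": True,
    },
}

print(json.dumps(ds_config, indent=2))
```

Consult the repo's `ds_config.json` for the settings the NLU scripts actually pass to DeepSpeed.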