
Environment:Microsoft LoRA NLU Conda Environment

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, NLU
Last Updated: 2026-02-10 05:30 GMT

Overview

Conda environment with Python 3.7, PyTorch 1.9, CUDA 11.1, and a modified HuggingFace Transformers fork for LoRA-based NLU fine-tuning on GLUE tasks.

Description

The NLU example uses a Conda environment (defined in `environment.yml`) with a modified fork of HuggingFace Transformers (>= 4.4.0). The fork adds LoRA support directly into RoBERTa and DeBERTa v2 model architectures via `loralib.MergedLinear` layers in their attention modules. The environment includes DeepSpeed for distributed training, the HuggingFace Datasets library for GLUE data loading, and Accelerate for multi-GPU orchestration. The Transformers fork is installed in editable mode (`pip install -e .`).
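The core of the `loralib.MergedLinear` adaptation is a low-rank additive update to a frozen weight: the effective weight is W' = W + (alpha / r) * (B @ A), where only A and B are trained. The following is a minimal sketch of that math in pure Python (all names are illustrative, not the fork's actual code):

```python
# Sketch of the low-rank update at the heart of LoRA (pure Python,
# no dependencies). loralib.MergedLinear applies this idea inside the
# attention projections; only the core arithmetic is shown here.

def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

def lora_effective_weight(w, lora_a, lora_b, alpha, r):
    """Return W' = W + (alpha / r) * (B @ A).

    w:      frozen pretrained weight, shape (out, in)
    lora_a: trainable down-projection, shape (r, in)
    lora_b: trainable up-projection, shape (out, r)
    """
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # shape (out, in)
    return [[w[i][j] + scale * delta[i][j]
             for j in range(len(w[0]))] for i in range(len(w))]

# Tiny example: out=2, in=2, rank r=1
w = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]        # (1, 2)
lora_b = [[0.5], [0.25]]     # (2, 1)
w_eff = lora_effective_weight(w, lora_a, lora_b, alpha=2, r=1)
```

Because the update is purely additive, at inference time the product B @ A can be merged into W once, so the adapted model adds no latency over the base model.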

Usage

Use this environment for the NLU GLUE Finetuning workflow. It is required to run `run_glue.py`, which fine-tunes RoBERTa-base, RoBERTa-large, or DeBERTa V2 XXLarge on GLUE benchmark tasks with LoRA adaptation.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | Required for CUDA, NCCL, and DeepSpeed |
| Hardware | NVIDIA GPU with CUDA 11.1 support | `cudatoolkit=11.1.74` specified in environment.yml |
| Hardware | 8 GPUs recommended | Training scripts default to `num_gpus=8` |
| Python | 3.7.10 | Pinned in environment.yml |
| Disk | ~10 GB | For conda environment, models, and GLUE data |

Dependencies

Conda Packages

  • `python` = 3.7.10
  • `pytorch` = 1.9.0 (py3.7_cuda11.1_cudnn8.0.5_0)
  • `cudatoolkit` = 11.1.74
  • `torchvision` = 0.10.0
  • `torchaudio` = 0.9.0
  • `numpy` = 1.20.2

Pip Packages

  • `loralib` == 0.1.1
  • `accelerate` == 0.3.0
  • `datasets` == 1.9.0
  • `deepspeed` == 0.5.0
  • `scikit-learn` == 0.24.2
  • `scipy` == 1.7.0
  • `sentencepiece` == 0.1.96
  • `tokenizers` == 0.10.3
  • `triton` == 0.4.2
  • `azureml-core` == 1.32.0
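Since these pins are exact (`==`), a quick way to audit an environment is to compare installed versions against them. A hypothetical checker, using the stdlib `importlib.metadata` (Python 3.8+; the environment's pinned Python 3.7 would need the `importlib_metadata` backport instead):

```python
# Hypothetical pin checker for the versions listed above.
from importlib.metadata import version, PackageNotFoundError

PINS = {
    "loralib": "0.1.1",
    "accelerate": "0.3.0",
    "datasets": "1.9.0",
    "deepspeed": "0.5.0",
}

def installed_version(name):
    """Installed version string, or None if the package is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None

def mismatches(pins):
    """Packages whose installed version differs from the pin."""
    return {name: installed_version(name)
            for name, want in pins.items()
            if installed_version(name) != want}
```

An empty dict from `mismatches(PINS)` means the environment matches the specification; entries map a package to its actual version (or `None` if missing).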

Modified Transformers

  • HuggingFace Transformers >= 4.4.0 (forked, installed via `pip install -e .`)
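The fork enforces this floor at runtime via `check_min_version("4.4.0")` in `run_glue.py`. A minimal, hypothetical reimplementation of such a guard (not the fork's actual code) shows how the comparison works:

```python
# Hypothetical sketch of a check_min_version-style guard.

def parse_version(v):
    """'4.4.0' -> (4, 4, 0); non-numeric suffixes are stripped."""
    parts = []
    for piece in v.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)

def check_min_version(installed, minimum="4.4.0"):
    """Raise ImportError if installed < minimum (tuple comparison)."""
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            f"transformers >= {minimum} required, found {installed}")
```

Tuple comparison handles multi-digit components correctly (e.g. 4.12 > 4.4), which naive string comparison would get wrong.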

Credentials

No credentials required for local training. The GLUE data is downloaded via public URLs. Pre-trained model checkpoints (RoBERTa, DeBERTa) are downloaded from the public Hugging Face model hub.

Quick Install

# Create the conda environment from specification
cd examples/NLU
conda env create -f environment.yml

# Activate the environment
conda activate NLU

# Install the modified Transformers fork in editable mode
pip install -e .

# Download GLUE data
python utils/download_glue_data.py --data_dir glue_data --tasks all
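After installing, a quick smoke check can confirm that every required package resolves, including the editable `transformers` install. This hypothetical probe uses `importlib.util.find_spec`, which locates a module without importing it:

```python
# Hypothetical post-install smoke check for the NLU environment.
import importlib.util

def importable(module_name):
    """True if the module resolves on the current interpreter."""
    return importlib.util.find_spec(module_name) is not None

def missing(modules):
    """Return the subset of modules that cannot be found."""
    return [m for m in modules if not importable(m)]

# Inside the activated NLU env, this list should come back empty.
REQUIRED = ["torch", "loralib", "datasets", "deepspeed", "transformers"]
```

If `transformers` appears in `missing(REQUIRED)`, the editable install was skipped; re-run `pip install -e .` from `examples/NLU/`.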

Code Evidence

Minimum Transformers version check from `examples/NLU/examples/text-classification/run_glue.py:49`:

# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.4.0")

CUBLAS reproducibility environment variable from `examples/NLU/roberta_base_mnli.sh:2`:

export CUBLAS_WORKSPACE_CONFIG=":16:8" # https://docs.nvidia.com/cuda/cublas/index.html#cublasApi_reproducibility
export PYTHONHASHSEED=0

Conda environment header from `examples/NLU/environment.yml:1-6`:

name: NLU
channels:
  - pytorch
  - nvidia
  - defaults
dependencies:

loralib pinned in environment.yml `examples/NLU/environment.yml:106`:

    - loralib==0.1.1

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: transformers >= 4.4.0 required` | Transformers fork not installed or wrong version | Run `pip install -e .` from `examples/NLU/` directory |
| `ModuleNotFoundError: No module named 'loralib'` | loralib not installed in conda env | `pip install loralib==0.1.1` |
| `CUDA error: no kernel image is available` | CUDA toolkit version mismatch with GPU | Ensure cudatoolkit=11.1 matches your GPU driver |

Compatibility Notes

  • Conda Required: Unlike the NLG example (pip-only), the NLU example requires a Conda environment due to complex dependency pinning.
  • Modified Transformers: The NLU example uses a forked HuggingFace Transformers with LoRA injected into RoBERTa (`modeling_roberta.py`) and DeBERTa v2 (`modeling_deberta_v2.py`). Standard transformers will not work.
  • Deterministic Training: Scripts set `CUBLAS_WORKSPACE_CONFIG`, `PYTHONHASHSEED=0`, and `--use_deterministic_algorithms` for reproducibility.
  • DeepSpeed: `ds_config.json` provides ZeRO Stage 2 configuration for memory-efficient distributed training.
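The deterministic-training settings above can be sketched in Python. Note that `PYTHONHASHSEED` only affects string hashing if it is set before the interpreter starts, which is why the shipped scripts export it in the shell; the torch calls are guarded here since torch may be absent (a sketch, not the scripts' actual code):

```python
# Sketch of the reproducibility setup the training scripts perform.
import os
import random

def make_deterministic(seed=0):
    # Required by cuBLAS for reproducible GEMM results on CUDA >= 10.2
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"
    # Only effective for hashing if exported before Python starts;
    # set here so child processes inherit it.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:  # torch is only touched if it is installed
        import torch
        torch.manual_seed(seed)
        torch.use_deterministic_algorithms(True)
    except ImportError:
        pass
```

`torch.use_deterministic_algorithms(True)` makes PyTorch raise an error on any op that has no deterministic implementation, rather than silently producing run-to-run variation.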
