# Environment: Microsoft BIPIA Python CUDA GPU Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, LLM_Security |
| Last Updated | 2026-02-14 15:00 GMT |
## Overview
Linux (Ubuntu 20.04) environment with CUDA-capable NVIDIA GPUs, Python 3.8+, PyTorch 2.0+, and HuggingFace Transformers 4.34+ for LLM inference and evaluation.
## Description
This environment provides the core runtime for running BIPIA benchmark evaluations on open-source LLMs. It includes PyTorch with CUDA support for GPU-accelerated inference, HuggingFace Transformers for model loading, vLLM for high-throughput inference on LLAMA-family models, and the HuggingFace Accelerate library for distributed inference. Models are loaded in float16 precision by default, with automatic bfloat16 selection on Ampere+ GPUs (compute capability >= 8.0). The codebase also supports 8-bit quantized model loading via the Transformers library.
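The precision policy described above can be sketched as a small selector. This is an illustrative helper, not code from the repository (the actual detection logic lives in `bipia/model/utils.py`):

```python
def select_dtype(compute_capability: float, load_8bit: bool = False) -> str:
    """Pick a load precision following the policy above (illustrative only).

    float16 is the default; bfloat16 is selected on Ampere+ GPUs
    (compute capability >= 8.0); 8-bit quantization overrides both.
    """
    if load_8bit:
        return "int8"
    if compute_capability >= 8.0:
        return "bfloat16"
    return "float16"
```

For example, a V100 (compute capability 7.0) falls back to float16, while an A100 (8.0) gets bfloat16.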
## Usage
Use this environment for any LLM inference, evaluation, or fine-tuning workflow involving open-source models in the BIPIA benchmark. This is the mandatory prerequisite for running the Inference_Pipeline, AutoLLM, VicunaWithSpecialToken, and HF_Trainer_For_Defense implementations. API-based models (GPT-3.5/GPT-4) do not require GPU hardware but still need this Python environment for the benchmark framework.
## System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Ubuntu 20.04 LTS | Tested and verified; other Linux distributions may work |
| Hardware (<=13B models) | 2x NVIDIA V100 GPUs | Minimum for open-source models up to 13B parameters |
| Hardware (>13B models) | 4-8x NVIDIA V100 GPUs | Required for models larger than 13B; A100/H100 also supported |
| Hardware (fine-tuning) | 8x NVIDIA V100 GPUs | Required for white-box defense fine-tuning with DeepSpeed |
| Python | >= 3.8 | Specified in pyproject.toml |
| CUDA | CUDA toolkit compatible with PyTorch >= 2.0.1 | Required for torch.cuda operations |
## Dependencies
### System Packages
- NVIDIA CUDA toolkit (compatible with PyTorch 2.0+)
- `git` (for cloning the repository)
### Python Packages
- `torch` >= 2.0.1
- `transformers` >= 4.34.0
- `accelerate` >= 0.15.0
- `deepspeed` >= 0.9.5
- `vllm` >= 0.2.0
- `fschat` >= 0.2.35
- `peft` (any version)
- `datasets` >= 2.8.0
- `numpy` (any version)
- `pandas` (any version)
- `tqdm` (any version)
- `jsonlines` (any version)
- `py-cpuinfo` (any version)
- `evaluate` (any version)
- `rouge-score` (any version)
- `langdetect` (any version)
- `thefuzz` (any version)
- `emoji` (any version)
- `wandb` (any version)
- `nltk` (any version, for PunktSentenceTokenizer)
- `setuptools` >= 61.0
## Credentials
The following credentials may be required depending on the models used:
- `auth_token`: HuggingFace authentication token for gated models (e.g., Llama 2). Configured in YAML config files per model.
- `WANDB_PROJECT`: Weights & Biases project name for experiment tracking (optional).
- `WANDB_RUN`: Weights & Biases run name for experiment tracking (optional).
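The two Weights & Biases variables are plain environment variables; a typical shell setup looks like the following (the project and run names here are placeholders, not values from the repository):

```shell
# Optional Weights & Biases experiment tracking (names are examples)
export WANDB_PROJECT="bipia-eval"
export WANDB_RUN="llama2-13b-baseline"
```

The `auth_token`, by contrast, is not an environment variable; it is set in the per-model YAML config file.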
## Quick Install
```shell
# Clone the repository
git clone git@github.com:microsoft/BIPIA.git
cd BIPIA

# Install bipia and all dependencies
pip install .
```
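After installation, a quick sanity check can confirm that the key packages from the dependency list resolve. This is a generic sketch using only the standard library, not a script shipped with BIPIA:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of package names that cannot be resolved."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Core packages from the dependency list above
required = ["torch", "transformers", "accelerate", "deepspeed", "vllm", "peft"]
print("missing:", missing_packages(required))
```

An empty list means the core dependencies are importable; any names printed need a `pip install`.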
## Code Evidence
CUDA availability check from `bipia/model/utils.py:33-35`:
```python
def get_compute_capability():
    if not torch.cuda.is_available():
        raise ValueError("CUDA is not available on this device!")
```
bf16 support detection via GPU compute capability from `bipia/model/utils.py:41-45`:
```python
def check_bf16_support():
    capability = get_compute_capability()
    if capability >= 8.0:
        return True
    return False
```
Model loading with float16 and device_map auto from `bipia/model/llm_worker.py:66-73`:
```python
self.model = AutoModelForCausalLM.from_pretrained(
    self.config["model_name"],
    load_in_8bit=self.config["load_8bit"],
    device_map="auto",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
```
Python version requirement from `pyproject.toml:10`:
```toml
requires-python = ">=3.8"
```
## Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ValueError: CUDA is not available on this device!` | No NVIDIA GPU detected or CUDA drivers not installed | Install CUDA toolkit and verify with `nvidia-smi` |
| `torch.cuda.OutOfMemoryError` | Insufficient GPU VRAM for the selected model | Spread the model across more GPUs with tensor parallelism, enable 8-bit loading, or use a smaller model |
| `ImportError: No module named 'vllm'` | vLLM not installed | `pip install vllm>=0.2.0` |
| `OSError: ... is a gated model` | HuggingFace model requires authentication | Set `auth_token` in the model YAML config file |
## Compatibility Notes
- Linux only: The package has been tested and verified on Ubuntu 20.04.6. Windows is not supported for torch.compile (the code explicitly checks `sys.platform != "win32"`).
- GPU tiers: V100 (16/32GB) is the minimum tested GPU. A100 and H100 GPUs are also supported and will automatically use bfloat16 precision.
- 8-bit loading: Supported via Transformers `load_in_8bit` parameter to reduce VRAM usage on smaller GPUs.
- vLLM models: LLAMA-family and several other models (Dolly, StableLM, MPT, Mistral) use vLLM for inference, which requires separate CUDA memory management.
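The platform gate mentioned in the first note can be reproduced with a one-line check. This sketch only mirrors the guard condition; in the repository it wraps the `torch.compile` call:

```python
import sys

def torch_compile_supported() -> bool:
    # Mirrors the repo's guard: torch.compile is skipped on Windows
    return sys.platform != "win32"
```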