# Environment: Microsoft BIPIA Python CUDA GPU Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, LLM_Security |
| Last Updated | 2026-02-14 15:00 GMT |
## Overview
Linux (Ubuntu 20.04) environment with CUDA-capable NVIDIA GPUs, Python 3.8+, PyTorch 2.0+, and HuggingFace Transformers 4.34+ for LLM inference and evaluation.
## Description
This environment provides the core runtime for running BIPIA benchmark evaluations on open-source LLMs. It includes PyTorch with CUDA support for GPU-accelerated inference, HuggingFace Transformers for model loading, vLLM for high-throughput inference on LLAMA-family models, and the HuggingFace Accelerate library for distributed inference. Models are loaded in float16 precision by default, with automatic bfloat16 selection on Ampere+ GPUs (compute capability >= 8.0). The codebase also supports 8-bit quantized model loading via the Transformers library.
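The precision policy described above can be sketched as a small selector. This is an illustrative helper, not code from the repository (the actual detection logic lives in `bipia/model/utils.py`):

```python
def select_dtype(compute_capability: float, load_8bit: bool = False) -> str:
    """Pick a load precision following the policy above (illustrative only).

    float16 is the default; bfloat16 is selected on Ampere+ GPUs
    (compute capability >= 8.0); 8-bit quantization overrides both.
    """
    if load_8bit:
        return "int8"
    if compute_capability >= 8.0:
        return "bfloat16"
    return "float16"
```

For example, a V100 (compute capability 7.0) falls back to float16, while an A100 (8.0) gets bfloat16.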
## Usage
Use this environment for any LLM inference, evaluation, or fine-tuning workflow involving open-source models in the BIPIA benchmark. This is the mandatory prerequisite for running the Inference_Pipeline, AutoLLM, VicunaWithSpecialToken, and HF_Trainer_For_Defense implementations. API-based models (GPT-3.5/GPT-4) do not require GPU hardware but still need this Python environment for the benchmark framework.
## System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Ubuntu 20.04 LTS | Tested and verified; other Linux distributions may work |
| Hardware (<=13B models) | 2x NVIDIA V100 GPUs | Minimum for open-source models up to 13B parameters |
| Hardware (>13B models) | 4-8x NVIDIA V100 GPUs | Required for models larger than 13B; A100/H100 also supported |
| Hardware (fine-tuning) | 8x NVIDIA V100 GPUs | Required for white-box defense fine-tuning with DeepSpeed |
| Python | >= 3.8 | Specified in pyproject.toml |
| CUDA | CUDA toolkit compatible with PyTorch >= 2.0.1 | Required for torch.cuda operations |
## Dependencies
### System Packages
- NVIDIA CUDA toolkit (compatible with PyTorch 2.0+)
- `git` (for cloning the repository)
### Python Packages
- `torch` >= 2.0.1
- `transformers` >= 4.34.0
- `accelerate` >= 0.15.0
- `deepspeed` >= 0.9.5
- `vllm` >= 0.2.0
- `fschat` >= 0.2.35
- `peft` (any version)
- `datasets` >= 2.8.0
- `numpy` (any version)
- `pandas` (any version)
- `tqdm` (any version)
- `jsonlines` (any version)
- `py-cpuinfo` (any version)
- `evaluate` (any version)
- `rouge-score` (any version)
- `langdetect` (any version)
- `thefuzz` (any version)
- `emoji` (any version)
- `wandb` (any version)
- `nltk` (any version, for PunktSentenceTokenizer)
- `setuptools` >= 61.0
## Credentials
The following credentials may be required depending on the models used:
- `auth_token`: HuggingFace authentication token for gated models (e.g., Llama 2). Configured in YAML config files per model.
- `WANDB_PROJECT`: Weights & Biases project name for experiment tracking (optional).
- `WANDB_RUN`: Weights & Biases run name for experiment tracking (optional).
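The two Weights & Biases variables are plain environment variables; a typical shell setup looks like the following (the project and run names here are placeholders, not values from the repository):

```shell
# Optional Weights & Biases experiment tracking (names are examples)
export WANDB_PROJECT="bipia-eval"
export WANDB_RUN="llama2-13b-baseline"
```

The `auth_token`, by contrast, is not an environment variable; it is set in the per-model YAML config file.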
## Quick Install
```shell
# Clone the repository
git clone git@github.com:microsoft/BIPIA.git
cd BIPIA

# Install bipia and all dependencies
pip install .
```
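After installation, a quick sanity check can confirm that the key packages from the dependency list resolve. This is a generic sketch using only the standard library, not a script shipped with BIPIA:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of package names that cannot be resolved."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Core packages from the dependency list above
required = ["torch", "transformers", "accelerate", "deepspeed", "vllm", "peft"]
print("missing:", missing_packages(required))
```

An empty list means the core dependencies are importable; any names printed need a `pip install`.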
## Code Evidence
CUDA availability check from `bipia/model/utils.py:33-35`:
```python
def get_compute_capability():
    if not torch.cuda.is_available():
        raise ValueError("CUDA is not available on this device!")
```
bf16 support detection via GPU compute capability from `bipia/model/utils.py:41-45`:
```python
def check_bf16_support():
    capability = get_compute_capability()
    if capability >= 8.0:
        return True
    return False
```
Model loading with float16 and device_map auto from `bipia/model/llm_worker.py:66-73`:
```python
self.model = AutoModelForCausalLM.from_pretrained(
    self.config["model_name"],
    load_in_8bit=self.config["load_8bit"],
    device_map="auto",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
```
Python version requirement from `pyproject.toml:10`:
```toml
requires-python = ">=3.8"
```
## Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ValueError: CUDA is not available on this device!` | No NVIDIA GPU detected or CUDA drivers not installed | Install CUDA toolkit and verify with `nvidia-smi` |
| `torch.cuda.OutOfMemoryError` | Insufficient GPU VRAM for the selected model | Spread the model across more GPUs with tensor parallelism, enable 8-bit loading, or use a smaller model |
| `ImportError: No module named 'vllm'` | vLLM not installed | `pip install vllm>=0.2.0` |
| `OSError: ... is a gated model` | HuggingFace model requires authentication | Set `auth_token` in the model YAML config file |
## Compatibility Notes
- Linux only: The package has been tested and verified on Ubuntu 20.04.6. Windows is not supported for torch.compile (the code explicitly checks `sys.platform != "win32"`).
- GPU tiers: V100 (16/32GB) is the minimum tested GPU. A100 and H100 GPUs are also supported and will automatically use bfloat16 precision.
- 8-bit loading: Supported via Transformers `load_in_8bit` parameter to reduce VRAM usage on smaller GPUs.
- vLLM models: LLAMA-family and several other models (Dolly, StableLM, MPT, Mistral) use vLLM for inference, which requires separate CUDA memory management.
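The platform gate mentioned in the first note can be reproduced with a one-line check. This sketch only mirrors the guard condition; in the repository it wraps the `torch.compile` call:

```python
import sys

def torch_compile_supported() -> bool:
    # Mirrors the repo's guard: torch.compile is skipped on Windows
    return sys.platform != "win32"
```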