

Environment:Microsoft BIPIA Python CUDA GPU Environment

From Leeroopedia
Domains Infrastructure, LLM_Security
Last Updated 2026-02-14 15:00 GMT

Overview

Linux (Ubuntu 20.04) environment with CUDA-capable NVIDIA GPUs, Python 3.8+, PyTorch 2.0+, and HuggingFace Transformers 4.34+ for LLM inference and evaluation.

Description

This environment provides the core runtime for running BIPIA benchmark evaluations on open-source LLMs. It includes PyTorch with CUDA support for GPU-accelerated inference, HuggingFace Transformers for model loading, vLLM for high-throughput inference on LLAMA-family models, and the HuggingFace Accelerate library for distributed inference. Models are loaded in float16 precision by default, with automatic bfloat16 selection on Ampere+ GPUs (compute capability >= 8.0). The codebase also supports 8-bit quantized model loading via the Transformers library.
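The precision rule above (float16 by default, bfloat16 on compute capability >= 8.0) can be sketched as pure logic. This is a minimal illustration of the selection rule only; the function name `select_dtype` is hypothetical and not part of the BIPIA codebase.

```python
def select_dtype(compute_capability: float) -> str:
    """Pick a torch dtype name from a GPU's CUDA compute capability.

    Mirrors the documented behavior: bfloat16 on Ampere and newer
    (capability >= 8.0), float16 otherwise.
    """
    if compute_capability >= 8.0:
        return "bfloat16"  # natively supported on Ampere+ (A100, H100, ...)
    return "float16"
```

For example, a V100 (capability 7.0) would get `float16`, while an A100 (8.0) or H100 (9.0) would get `bfloat16`.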

Usage

Use this environment for any LLM inference, evaluation, or fine-tuning workflow involving open-source models in the BIPIA benchmark. This is the mandatory prerequisite for running the Inference_Pipeline, AutoLLM, VicunaWithSpecialToken, and HF_Trainer_For_Defense implementations. API-based models (GPT-3.5/GPT-4) do not require GPU hardware but still need this Python environment for the benchmark framework.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Ubuntu 20.04 LTS | Tested and verified; other Linux distributions may work |
| Hardware (<=13B models) | 2x NVIDIA V100 GPUs | Minimum for open-source models up to 13B parameters |
| Hardware (>13B models) | 4-8x NVIDIA V100 GPUs | Required for models larger than 13B; A100/H100 also supported |
| Hardware (fine-tuning) | 8x NVIDIA V100 GPUs | Required for white-box defense fine-tuning with DeepSpeed |
| Python | >= 3.8 | Specified in `pyproject.toml` |
| CUDA | Toolkit compatible with PyTorch >= 2.0.1 | Required for `torch.cuda` operations |
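The hardware tiers in the table can be encoded as a small planning helper. This is a hypothetical aid for sizing a run, not part of the BIPIA codebase; it returns the minimum V100 count from the table.

```python
def recommended_gpu_count(model_params_b: float, fine_tuning: bool = False) -> int:
    """Minimum V100-class GPU count for a model size given in billions of parameters."""
    if fine_tuning:
        return 8      # white-box defense fine-tuning with DeepSpeed needs 8x V100
    if model_params_b <= 13:
        return 2      # open-source models up to 13B parameters
    return 4          # >13B requires 4-8 GPUs; return the lower bound
```

For instance, inference on a 7B model needs 2 GPUs, a 70B model at least 4, and any fine-tuning run 8.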

Dependencies

System Packages

  • NVIDIA CUDA toolkit (compatible with PyTorch 2.0+)
  • `git` (for cloning the repository)

Python Packages

  • `torch` >= 2.0.1
  • `transformers` >= 4.34.0
  • `accelerate` >= 0.15.0
  • `deepspeed` >= 0.9.5
  • `vllm` >= 0.2.0
  • `fschat` >= 0.2.35
  • `peft` (any version)
  • `datasets` >= 2.8.0
  • `numpy` (any version)
  • `pandas` (any version)
  • `tqdm` (any version)
  • `jsonlines` (any version)
  • `py-cpuinfo` (any version)
  • `evaluate` (any version)
  • `rouge-score` (any version)
  • `langdetect` (any version)
  • `thefuzz` (any version)
  • `emoji` (any version)
  • `wandb` (any version)
  • `nltk` (any version, for PunktSentenceTokenizer)
  • `setuptools` >= 61.0
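The version floors above can be checked against an installed environment with the standard library. The helper below is an illustrative sketch (not part of BIPIA) using `importlib.metadata`; its version comparison is a crude numeric one with no pre-release or local-version handling.

```python
from importlib.metadata import version, PackageNotFoundError

# A few of the pinned minimums from the list above (illustrative subset).
MINIMUMS = {
    "torch": "2.0.1",
    "transformers": "4.34.0",
    "accelerate": "0.15.0",
    "vllm": "0.2.0",
}

def meets_minimum(installed, minimum):
    """Compare dotted version strings numerically; ignores non-numeric parts."""
    def as_tuple(v):
        return tuple(int(p) for p in v.split(".") if p.isdigit())
    return as_tuple(installed) >= as_tuple(minimum)

def check_requirements(requirements):
    """Return human-readable problems for missing or outdated packages."""
    problems = []
    for pkg, minimum in requirements.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            problems.append(pkg + ": not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(pkg + ": " + installed + " < " + minimum)
    return problems
```

Running `check_requirements(MINIMUMS)` on a fresh environment lists every package that still needs `pip install`.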

Credentials

The following credentials may be required depending on the models used:

  • `auth_token`: HuggingFace authentication token for gated models (e.g., Llama 2). Configured in YAML config files per model.
  • `WANDB_PROJECT`: Weights & Biases project name for experiment tracking (optional).
  • `WANDB_RUN`: Weights & Biases run name for experiment tracking (optional).
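A minimal sketch of a per-model YAML config follows, assuming the `model_name` and `load_8bit` keys read by `llm_worker.py` (see Code Evidence below) plus the `auth_token` credential described above. The values are placeholders, and any other keys a real config needs are omitted.

```yaml
# Illustrative model config fragment; values are placeholders.
model_name: meta-llama/Llama-2-7b-chat-hf   # gated model on HuggingFace
load_8bit: false                            # set true to reduce VRAM usage
auth_token: hf_xxxxxxxxxxxxxxxx             # HuggingFace token for gated models
```

The optional `WANDB_PROJECT` and `WANDB_RUN` values are set as environment variables rather than config keys.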

Quick Install

```shell
# Clone the repository
git clone git@github.com:microsoft/BIPIA.git
cd BIPIA

# Install bipia and all dependencies
pip install .
```

Code Evidence

CUDA availability check from `bipia/model/utils.py:33-35`:

```python
def get_compute_capability():
    if not torch.cuda.is_available():
        raise ValueError("CUDA is not available on this device!")
```

bf16 support detection via GPU compute capability from `bipia/model/utils.py:41-45`:

```python
def check_bf16_support():
    capability = get_compute_capability()
    if capability >= 8.0:
        return True
    return False
```

Model loading with float16 and device_map auto from `bipia/model/llm_worker.py:66-73`:

```python
self.model = AutoModelForCausalLM.from_pretrained(
    self.config["model_name"],
    load_in_8bit=self.config["load_8bit"],
    device_map="auto",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
```

Python version requirement from `pyproject.toml:10`:

```toml
requires-python = ">=3.8"
```

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `ValueError: CUDA is not available on this device!` | No NVIDIA GPU detected or CUDA drivers not installed | Install the CUDA toolkit and drivers, then verify with `nvidia-smi` |
| `torch.cuda.OutOfMemoryError` | Insufficient GPU VRAM for the selected model | Spread the model across more GPUs with tensor parallelism, enable 8-bit loading, or use a smaller model |
| `ImportError: No module named 'vllm'` | vLLM not installed | `pip install "vllm>=0.2.0"` |
| `OSError: ... is a gated model` | HuggingFace model requires authentication | Set `auth_token` in the model's YAML config file |

Compatibility Notes

  • Linux only: The package has been tested and verified on Ubuntu 20.04.6. Windows is not supported for torch.compile (the code explicitly checks `sys.platform != "win32"`).
  • GPU tiers: V100 (16/32GB) is the minimum tested GPU. A100 and H100 GPUs are also supported and will automatically use bfloat16 precision.
  • 8-bit loading: Supported via Transformers `load_in_8bit` parameter to reduce VRAM usage on smaller GPUs.
  • vLLM models: LLAMA-family and several other models (Dolly, StableLM, MPT, Mistral) use vLLM for inference; vLLM manages its own CUDA memory pool (including a preallocated KV cache) separately from standard PyTorch allocation, so budget GPU memory accordingly.
