Environment:Liu00222 Open Prompt Injection CUDA Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, NLP, Security |
| Last Updated | 2026-02-14 15:30 GMT |
Overview
Linux environment with NVIDIA CUDA 12.1 GPU, Python 3.9, PyTorch 2.3+, and bitsandbytes for 4-bit quantized model inference.
Description
This environment provides GPU-accelerated inference for local language models used in prompt injection detection (DataSentinel), localization (PromptLocate), perplexity-based defense, and general local model wrappers (Vicuna, Llama, DeepSeek, Flan, InternLM). Models are loaded with 4-bit NF4 quantization via bitsandbytes and LoRA adapters via PEFT. The GPT-2 helper model for causal influence analysis also requires CUDA. All tensor operations explicitly call `.to("cuda")` throughout the codebase.
Usage
Use this environment for any workflow that relies on local model inference, including the DataSentinel detection workflow, the PromptLocate localization workflow, and the prompt injection experiment workflow when it is configured with local models (Vicuna, Llama, Llama3, DeepSeek, Mistral/QLoRA, Flan, InternLM). It is not required when using only API-based models (GPT via the OpenAI API, PaLM2 via the Google API).
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu recommended) | Conda environment spec targets linux-64 |
| Hardware | NVIDIA GPU with CUDA support | Minimum 16GB VRAM for 7B models in 4-bit; 20GiB explicitly configured for DeepSeek R1 distill models |
| Disk | 50GB+ SSD | Model weights (7B models ~4GB quantized) plus datasets and checkpoints |
| Runtime | CUDA 12.1 | nvidia-cuda-runtime-cu12==12.1.105 pinned in environment.yml |
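The 16GB VRAM minimum in the table can be sanity-checked with back-of-envelope arithmetic. The sketch below is illustrative (the ~4.5 bits/param figure for NF4 plus quantization constants is a common rule of thumb, not a value from this repository):

```python
def estimate_weight_vram_gb(n_params_billion: float, bits_per_param: float = 4.5) -> float:
    """Rough VRAM needed for quantized weights alone (no KV cache/activations).

    4-bit NF4 stores ~4 bits per weight plus per-block quantization
    constants; ~4.5 bits/param is a common back-of-envelope figure.
    """
    bytes_total = n_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1024**3

# A 7B model in 4-bit needs roughly 3.7 GiB for weights alone; the 16GB
# minimum above leaves headroom for the KV cache, activations, and CUDA
# context overhead.
print(round(estimate_weight_vram_gb(7), 1))
```

This is why the table's ~4GB figure for quantized 7B weights still translates into a 16GB VRAM recommendation in practice.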
Dependencies
System Packages
- `nvidia-cublas-cu12` = 12.1.3.1
- `nvidia-cuda-cupti-cu12` = 12.1.105
- `nvidia-cuda-nvrtc-cu12` = 12.1.105
- `nvidia-cuda-runtime-cu12` = 12.1.105
- `nvidia-cudnn-cu12` = 8.9.2.26
- `nvidia-cufft-cu12` = 11.0.2.54
- `nvidia-curand-cu12` = 10.3.2.106
- `nvidia-cusolver-cu12` = 11.4.5.107
- `nvidia-cusparse-cu12` = 12.1.0.106
- `nvidia-nccl-cu12` = 2.20.5
- `nvidia-nvjitlink-cu12` = 12.5.40
- `nvidia-nvtx-cu12` = 12.1.105
Python Packages (GPU-specific)
- `torch` == 2.3.1
- `triton` == 2.3.1
- `bitsandbytes` == 0.43.1
- `accelerate` == 0.32.0
- `peft` == 0.11.1
- `transformers` == 4.42.0
Credentials
No GPU-specific credentials are required. However, local model weights must be accessible:
- HuggingFace model weights: Models like `mistralai/Mistral-7B-v0.1` are downloaded via `AutoModelForCausalLM.from_pretrained()`. A HuggingFace account or token may be required for gated models.
- Fine-tuned checkpoints: DataSentinel and PromptLocate require downloaded LoRA adapter checkpoints specified via the `ft_path` config parameter.
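Putting the two requirements together, a hedged configuration sketch (not the repository's exact code) for loading a gated base model in 4-bit NF4 and attaching a LoRA adapter from the checkpoint directory that `ft_path` points to might look like:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_name = "mistralai/Mistral-7B-v0.1"
ft_path = "/path/to/checkpoint"  # placeholder; set via the model config JSON

# 4-bit NF4 quantization config, matching the snippet in Code Evidence
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_name)  # may need an HF token for gated repos
base = AutoModelForCausalLM.from_pretrained(
    base_name, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base, ft_path)  # raises if the adapter checkpoint is missing
```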
Quick Install
```bash
# Create conda environment from the provided spec
conda env create -f environment.yml --name my_custom_env
conda activate my_custom_env

# Or install GPU-critical packages manually
pip install torch==2.3.1 transformers==4.42.0 bitsandbytes==0.43.1 accelerate==0.32.0 peft==0.11.1 triton==2.3.1
```
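After installation, a quick sanity check confirms the GPU stack is wired up before running any workflow (assumes the environment is active):

```shell
# Driver installed and GPU visible?
nvidia-smi
# Torch build, bundled CUDA version, and device availability
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# bitsandbytes importable against the installed CUDA libraries
python -c "import bitsandbytes; print(bitsandbytes.__version__)"
```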
Code Evidence
CUDA device placement in `models/QLoraModel.py:78`:
```python
input_ids = self.tokenizer(processed_eval_prompt, return_tensors="pt").to("cuda")
```
4-bit NF4 quantization config in `models/QLoraModel.py:17-22`:
```python
self.bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
```
GPT-2 helper model moved to CUDA in `apps/PromptLocate.py:254`:
```python
self.helper_model = AutoModelForCausalLM.from_pretrained(helper_model_name, output_attentions=True)
self.helper_model.to('cuda')
```
DeepSeek R1 explicit GPU memory limit in `models/DeepSeek.py:50-51`:
```python
device_map={"": 0},       # means: put all layers on cuda:0
max_memory={0: "20GiB"},  # or however much GPU 0 has
```
PPL defense model loaded to CUDA in `apps/utils.py:11`:
```python
self.model = model.cuda()
```
Vicuna model generation on CUDA in `models/Vicuna.py:49`:
```python
output_ids = self.model.generate(
    torch.as_tensor(input_ids).cuda(),
    ...
)
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `RuntimeError: CUDA out of memory` | Insufficient VRAM for 7B model even in 4-bit | Use a GPU with at least 16GB VRAM, or reduce `max_gpu_memory` in config |
| `ValueError: Bad ft path: ...` | LoRA adapter checkpoint not found at specified path | Download the fine-tuned checkpoint and set `ft_path` correctly in the model config JSON |
| `ImportError: bitsandbytes` | bitsandbytes not installed or CUDA version mismatch | Install with `pip install bitsandbytes==0.43.1`; ensure CUDA 12.1 libraries are available |
| `torch.cuda.is_available()` returns `False` | No NVIDIA GPU detected or drivers not installed | Install NVIDIA drivers and the CUDA toolkit; verify with `nvidia-smi` |
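The table's decision logic can be sketched as a small triage helper. This is purely illustrative: the substrings and advice mirror the table above, not any code in the repository.

```python
def triage_cuda_error(message: str) -> str:
    """Map a raised error message to the remediation from the table above."""
    rules = [
        ("CUDA out of memory",
         "Use a GPU with at least 16GB VRAM, or reduce max_gpu_memory in the config."),
        ("Bad ft path",
         "Download the fine-tuned LoRA checkpoint and set ft_path correctly in the model config JSON."),
        ("bitsandbytes",
         "pip install bitsandbytes==0.43.1 and ensure the CUDA 12.1 libraries are available."),
    ]
    for needle, advice in rules:
        if needle in message:
            return advice
    # Fallback: the most common root cause is a missing or misconfigured GPU.
    return "Check torch.cuda.is_available(); if False, install NVIDIA drivers and verify with nvidia-smi."

print(triage_cuda_error("RuntimeError: CUDA out of memory"))
```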
Compatibility Notes
- API-only models (GPT, PaLM2): Do not require CUDA. These use HTTP API calls and can run on CPU-only machines.
- PPL defense: Loads Vicuna-7B-v1.3 as a surrogate model via fastchat, requiring CUDA and 8-bit loading with ~9GiB VRAM allocation.
- DeepSeek R1 distill models: Explicitly pin all layers to `cuda:0` with a 20GiB memory limit. Requires a single GPU with at least 20GB VRAM.
- torch.compile(): Used by DeepSeek R1 distill wrappers for inference optimization; requires PyTorch 2.0+.
- bfloat16: QLoRA models use `bnb_4bit_compute_dtype=torch.bfloat16`, which requires an Ampere (e.g., A100) or newer GPU architecture for native support.
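The Ampere requirement corresponds to CUDA compute capability 8.0 (sm_80) or higher, where bfloat16 tensor-core support first landed. A minimal check, sketched as a pure function (at runtime you would pass `torch.cuda.get_device_capability()`):

```python
def has_native_bf16(compute_capability: tuple) -> bool:
    """Native bfloat16 tensor-core support landed with Ampere (sm_80)."""
    major, _minor = compute_capability
    return major >= 8

# (8, 0) = A100 (Ampere); (9, 0) = H100 (Hopper); (7, 5) = T4 (Turing)
print(has_native_bf16((8, 0)), has_native_bf16((7, 5)))
```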
Related Pages
- Implementation:Liu00222_Open_Prompt_Injection_QLoraModel_init
- Implementation:Liu00222_Open_Prompt_Injection_DataSentinelDetector_detect
- Implementation:Liu00222_Open_Prompt_Injection_PromptLocate_locate_and_recover
- Implementation:Liu00222_Open_Prompt_Injection_causal_influence
- Implementation:Liu00222_Open_Prompt_Injection_compute_conditional_probability