Environment:Liu00222 Open Prompt Injection CUDA Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, NLP, Security |
| Last Updated | 2026-02-14 15:30 GMT |
Overview
Linux environment with NVIDIA CUDA 12.1 GPU, Python 3.9, PyTorch 2.3+, and bitsandbytes for 4-bit quantized model inference.
Description
This environment provides GPU-accelerated inference for local language models used in prompt injection detection (DataSentinel), localization (PromptLocate), perplexity-based defense, and general local model wrappers (Vicuna, Llama, DeepSeek, Flan, InternLM). Models are loaded with 4-bit NF4 quantization via bitsandbytes and LoRA adapters via PEFT. The GPT-2 helper model for causal influence analysis also requires CUDA. All tensor operations explicitly call `.to("cuda")` throughout the codebase.
Usage
Use this environment for any workflow that relies on local model inference, including the DataSentinel detection workflow, the PromptLocate localization workflow, and the prompt injection experiment workflow when it is configured with local models (Vicuna, Llama, Llama3, DeepSeek, Mistral/QLoRA, Flan, InternLM). It is not required when using only API-based models (GPT via the OpenAI API, PaLM2 via the Google API).
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu recommended) | Conda environment spec targets linux-64 |
| Hardware | NVIDIA GPU with CUDA support | Minimum 16GB VRAM for 7B models in 4-bit; 20GiB explicitly configured for DeepSeek R1 distill models |
| Disk | 50GB+ SSD | Model weights (7B models ~4GB quantized) plus datasets and checkpoints |
| Runtime | CUDA 12.1 | nvidia-cuda-runtime-cu12==12.1.105 pinned in environment.yml |
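The 16GB VRAM minimum in the table can be sanity-checked with back-of-envelope arithmetic. The sketch below is illustrative (the ~4.5 bits/param figure for NF4 plus quantization constants is a common rule of thumb, not a value from this repository):

```python
def estimate_weight_vram_gb(n_params_billion: float, bits_per_param: float = 4.5) -> float:
    """Rough VRAM needed for quantized weights alone (no KV cache/activations).

    4-bit NF4 stores ~4 bits per weight plus per-block quantization
    constants; ~4.5 bits/param is a common back-of-envelope figure.
    """
    bytes_total = n_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1024**3

# A 7B model in 4-bit needs roughly 3.7 GiB for weights alone; the 16GB
# minimum above leaves headroom for the KV cache, activations, and CUDA
# context overhead.
print(round(estimate_weight_vram_gb(7), 1))
```

This is why the table's ~4GB figure for quantized 7B weights still translates into a 16GB VRAM recommendation in practice.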
Dependencies
System Packages
- `nvidia-cublas-cu12` = 12.1.3.1
- `nvidia-cuda-cupti-cu12` = 12.1.105
- `nvidia-cuda-nvrtc-cu12` = 12.1.105
- `nvidia-cuda-runtime-cu12` = 12.1.105
- `nvidia-cudnn-cu12` = 8.9.2.26
- `nvidia-cufft-cu12` = 11.0.2.54
- `nvidia-curand-cu12` = 10.3.2.106
- `nvidia-cusolver-cu12` = 11.4.5.107
- `nvidia-cusparse-cu12` = 12.1.0.106
- `nvidia-nccl-cu12` = 2.20.5
- `nvidia-nvjitlink-cu12` = 12.5.40
- `nvidia-nvtx-cu12` = 12.1.105
Python Packages (GPU-specific)
- `torch` == 2.3.1
- `triton` == 2.3.1
- `bitsandbytes` == 0.43.1
- `accelerate` == 0.32.0
- `peft` == 0.11.1
- `transformers` == 4.42.0
Credentials
No GPU-specific credentials are required. However, local model weights must be accessible:
- HuggingFace model weights: Models like `mistralai/Mistral-7B-v0.1` are downloaded via `AutoModelForCausalLM.from_pretrained()`. A HuggingFace account or token may be required for gated models.
- Fine-tuned checkpoints: DataSentinel and PromptLocate require downloaded LoRA adapter checkpoints specified via the `ft_path` config parameter.
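Putting the two requirements together, a hedged configuration sketch (not the repository's exact code) for loading a gated base model in 4-bit NF4 and attaching a LoRA adapter from the checkpoint directory that `ft_path` points to might look like:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_name = "mistralai/Mistral-7B-v0.1"
ft_path = "/path/to/checkpoint"  # placeholder; set via the model config JSON

# 4-bit NF4 quantization config, matching the snippet in Code Evidence
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_name)  # may need an HF token for gated repos
base = AutoModelForCausalLM.from_pretrained(
    base_name, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base, ft_path)  # raises if the adapter checkpoint is missing
```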
Quick Install
```bash
# Create conda environment from the provided spec
conda env create -f environment.yml --name my_custom_env
conda activate my_custom_env

# Or install GPU-critical packages manually
pip install torch==2.3.1 transformers==4.42.0 bitsandbytes==0.43.1 accelerate==0.32.0 peft==0.11.1 triton==2.3.1
```
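After installation, a quick sanity check confirms the GPU stack is wired up before running any workflow (assumes the environment is active):

```shell
# Driver installed and GPU visible?
nvidia-smi
# Torch build, bundled CUDA version, and device availability
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
# bitsandbytes importable against the installed CUDA libraries
python -c "import bitsandbytes; print(bitsandbytes.__version__)"
```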
Code Evidence
CUDA device placement in `models/QLoraModel.py:78`:
```python
input_ids = self.tokenizer(processed_eval_prompt, return_tensors="pt").to("cuda")
```
4-bit NF4 quantization config in `models/QLoraModel.py:17-22`:
```python
self.bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
```
GPT-2 helper model moved to CUDA in `apps/PromptLocate.py:254`:
```python
self.helper_model = AutoModelForCausalLM.from_pretrained(helper_model_name, output_attentions=True)
self.helper_model.to('cuda')
```
DeepSeek R1 explicit GPU memory limit in `models/DeepSeek.py:50-51`:
```python
device_map={"": 0},       # means: put all layers on cuda:0
max_memory={0: "20GiB"},  # or however much GPU 0 has
```
PPL defense model loaded to CUDA in `apps/utils.py:11`:
```python
self.model = model.cuda()
```
Vicuna model generation on CUDA in `models/Vicuna.py:49`:
```python
output_ids = self.model.generate(
    torch.as_tensor(input_ids).cuda(),
    ...
)
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `RuntimeError: CUDA out of memory` | Insufficient VRAM for 7B model even in 4-bit | Use a GPU with at least 16GB VRAM, or reduce `max_gpu_memory` in config |
| `ValueError: Bad ft path: ...` | LoRA adapter checkpoint not found at specified path | Download the fine-tuned checkpoint and set `ft_path` correctly in the model config JSON |
| `ImportError: bitsandbytes` | bitsandbytes not installed or CUDA version mismatch | Install with `pip install bitsandbytes==0.43.1`; ensure CUDA 12.1 libraries are available |
| `torch.cuda.is_available()` returns `False` | No NVIDIA GPU detected or drivers not installed | Install NVIDIA drivers and the CUDA toolkit; verify with `nvidia-smi` |
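The table's decision logic can be sketched as a small triage helper. This is purely illustrative: the substrings and advice mirror the table above, not any code in the repository.

```python
def triage_cuda_error(message: str) -> str:
    """Map a raised error message to the remediation from the table above."""
    rules = [
        ("CUDA out of memory",
         "Use a GPU with at least 16GB VRAM, or reduce max_gpu_memory in the config."),
        ("Bad ft path",
         "Download the fine-tuned LoRA checkpoint and set ft_path correctly in the model config JSON."),
        ("bitsandbytes",
         "pip install bitsandbytes==0.43.1 and ensure the CUDA 12.1 libraries are available."),
    ]
    for needle, advice in rules:
        if needle in message:
            return advice
    # Fallback: the most common root cause is a missing or misconfigured GPU.
    return "Check torch.cuda.is_available(); if False, install NVIDIA drivers and verify with nvidia-smi."

print(triage_cuda_error("RuntimeError: CUDA out of memory"))
```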
Compatibility Notes
- API-only models (GPT, PaLM2): Do not require CUDA. These use HTTP API calls and can run on CPU-only machines.
- PPL defense: Loads Vicuna-7B-v1.3 as a surrogate model via fastchat, requiring CUDA and 8-bit loading with ~9GiB VRAM allocation.
- DeepSeek R1 distill models: Explicitly pin all layers to `cuda:0` with a 20GiB memory limit. Requires a single GPU with at least 20GB VRAM.
- torch.compile(): Used by DeepSeek R1 distill wrappers for inference optimization; requires PyTorch 2.0+.
- bfloat16: QLoRA models use `bnb_4bit_compute_dtype=torch.bfloat16`, which requires an Ampere (e.g., A100) or newer GPU architecture for native support.
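The Ampere requirement corresponds to CUDA compute capability 8.0 (sm_80) or higher, where bfloat16 tensor-core support first landed. A minimal check, sketched as a pure function (at runtime you would pass `torch.cuda.get_device_capability()`):

```python
def has_native_bf16(compute_capability: tuple) -> bool:
    """Native bfloat16 tensor-core support landed with Ampere (sm_80)."""
    major, _minor = compute_capability
    return major >= 8

# (8, 0) = A100 (Ampere); (9, 0) = H100 (Hopper); (7, 5) = T4 (Turing)
print(has_native_bf16((8, 0)), has_native_bf16((7, 5)))
```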
Related Pages
- Implementation:Liu00222_Open_Prompt_Injection_QLoraModel_init
- Implementation:Liu00222_Open_Prompt_Injection_DataSentinelDetector_detect
- Implementation:Liu00222_Open_Prompt_Injection_PromptLocate_locate_and_recover
- Implementation:Liu00222_Open_Prompt_Injection_causal_influence
- Implementation:Liu00222_Open_Prompt_Injection_compute_conditional_probability