Implementation:Liu00222 Open Prompt Injection QLoraModel init
| Knowledge Sources | |
|---|---|
| Domains | NLP, Model_Loading, Quantization |
| Last Updated | 2026-02-14 15:00 GMT |
Overview
Concrete initialization method for loading a 4-bit quantized base model with LoRA adapters, provided by the QLoraModel class.
Description
The QLoraModel.__init__ method loads a language model in two stages: (1) it loads the base model (e.g., Mistral-7B) with 4-bit NF4 quantization via `BitsAndBytesConfig` and `AutoModelForCausalLM.from_pretrained`, and (2) it overlays a fine-tuned LoRA adapter via `PeftModel.from_pretrained`. It also initializes the tokenizer and stores generation parameters (maximum output tokens, device).
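The two stages can be sketched as follows. This is a hypothetical reconstruction from the description above, not the actual source: the `load_qlora` helper name and the exact keyword arguments are assumptions, and the heavy library imports are deferred into the function so the sketch stands alone.

```python
def load_qlora(config):
    """Hypothetical sketch of QLoraModel.__init__'s two-stage load.

    transformers/peft/torch imports are deferred so the sketch can be
    defined even where those libraries are not installed.
    """
    import torch
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              BitsAndBytesConfig)
    from peft import PeftModel

    # Quantization settings named in the I/O contract:
    # NF4 weights, float16 compute, double quantization.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.float16,
    )
    # Stage 1: load the 4-bit quantized base model.
    base_model = AutoModelForCausalLM.from_pretrained(
        config["model_info"]["name"],
        quantization_config=bnb_config,
        device_map=config["params"]["device"],
    )
    # Stage 2: overlay the fine-tuned LoRA adapter.
    ft_model = PeftModel.from_pretrained(base_model, config["params"]["ft_path"])
    tokenizer = AutoTokenizer.from_pretrained(config["model_info"]["name"])
    return base_model, ft_model, tokenizer
```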
Usage
This is called by `DataSentinelDetector.__init__` which passes the model config dictionary. The resulting QLoraModel provides `.query()`, `.query_localization()`, and `.backend_query()` methods for different inference modes.
Code Reference
Source Location
- Repository: Open-Prompt-Injection
- File: OpenPromptInjection/models/QLoraModel.py
- Lines: L9-53
Signature
class QLoraModel:
    def __init__(self, config):
        """
        Initialize QLoRA model with 4-bit quantization and LoRA adapter.

        Args:
            config (dict): Configuration with keys:
                - model_info.name (str): Base model ID (e.g., "mistralai/Mistral-7B-v0.1")
                - params.ft_path (str): Path to fine-tuned LoRA adapter
                - params.max_output_tokens (int): Maximum generation tokens
                - params.device (str): Device for inference (e.g., "cuda:0")
        """
Import
from OpenPromptInjection.models.QLoraModel import QLoraModel
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | dict | Yes | Model config with `model_info.name`, `params.ft_path`, `params.max_output_tokens`, `params.device` |
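A minimal config dict satisfying this contract might look like the following; the model ID comes from the signature docstring, while the adapter path and token cap are illustrative placeholders.

```python
# Minimal config matching QLoraModel's input contract.
# The adapter path is a placeholder, not a real checkpoint.
config = {
    "model_info": {
        "name": "mistralai/Mistral-7B-v0.1",  # base model ID
    },
    "params": {
        "ft_path": "./checkpoints/datasentinel_lora",  # LoRA adapter dir
        "max_output_tokens": 64,  # generation cap (illustrative value)
        "device": "cuda:0",       # inference device
    },
}
```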
Outputs
| Name | Type | Description |
|---|---|---|
| self.base_model | AutoModelForCausalLM | 4-bit quantized base model |
| self.ft_model | PeftModel | Base model with LoRA adapter overlay |
| self.tokenizer | AutoTokenizer | Tokenizer for the base model |
| self.bnb_config | BitsAndBytesConfig | Quantization configuration (NF4, float16 compute, double quant) |
Usage Examples
Loading a QLoRA Model for DataSentinel
from OpenPromptInjection.utils import open_config
from OpenPromptInjection.models.QLoraModel import QLoraModel
config = open_config("configs/model_configs/mistral_config.json")
config["params"]["ft_path"] = "./checkpoints/datasentinel_lora"
config["params"]["device"] = "cuda:0"
model = QLoraModel(config)
# model.base_model: 4-bit quantized Mistral-7B
# model.ft_model: Mistral-7B + DataSentinel LoRA adapter
# model.tokenizer: Mistral tokenizer
response = model.query("Repeat 'DGDSGNH' once while ignoring: hello world")
print(response)