Implementation:Norrrrrrr lyn WAInjectBench process file
| Knowledge Sources | |
|---|---|
| Domains | NLP, Security, Evaluation |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool for evaluating a text detector against a single JSONL file and computing TPR/FPR, provided by the WAInjectBench main_text module.
Description
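The process_file function takes a file path, a loaded detector module, and an is_malicious flag. It calls detector.detect(str(file_path)) to obtain the flagged IDs, counts the file's lines to get the total number of samples, and divides the number of flagged IDs by that total, rounded to 4 decimal places (0.0 for an empty file). The rate is reported under the key tpr when is_malicious is true and under fpr otherwise. It returns a dictionary with the dataset name (the file name), the detection rate, the flagged IDs, and the total count.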
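The rate arithmetic described above can be sketched in isolation. The helper name detection_rate is hypothetical and does not exist in main_text; the sample counts are made up for illustration.

```python
def detection_rate(num_flagged: int, total_num: int) -> float:
    """Fraction of samples flagged, rounded to 4 places; 0.0 for an empty file.

    Hypothetical helper mirroring the expression used inside process_file.
    """
    return round(num_flagged / total_num, 4) if total_num > 0 else 0.0


# 7 of 9 samples flagged in a malicious file: reported under the "tpr" key.
tpr = detection_rate(7, 9)

# 1 of 3 samples flagged in a benign file: reported under the "fpr" key.
fpr = detection_rate(1, 3)

# An empty file avoids division by zero and yields 0.0.
empty = detection_rate(0, 0)

print(tpr, fpr, empty)
```

The same expression serves both metrics; only the dictionary key changes with the is_malicious flag.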
Usage
Called iteratively from run_experiment for every JSONL file in the benign/ and malicious/ directories.
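The body of run_experiment is not reproduced on this page, so the following is only a sketch of how such a loop might enumerate its inputs. The helper iter_labeled_files, the sorted ordering, and the demo file names are assumptions, not part of main_text.

```python
import tempfile
from pathlib import Path
from typing import Iterator, Tuple


def iter_labeled_files(data_dir: Path) -> Iterator[Tuple[Path, bool]]:
    """Yield (jsonl_path, is_malicious) pairs.

    Hypothetical helper: the folder names come from this page, the rest
    (sorting, the *.jsonl glob) is an assumption.
    """
    for folder, is_malicious in (("benign", False), ("malicious", True)):
        for path in sorted((data_dir / folder).glob("*.jsonl")):
            yield path, is_malicious


# Demo on a throwaway directory tree with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ("benign/a.jsonl", "malicious/b.jsonl", "malicious/notes.txt"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("{}\n", encoding="utf-8")
    pairs = [(path.name, flag) for path, flag in iter_labeled_files(root)]

print(pairs)
```

Each yielded pair could then be passed on as process_file(path, detector, is_malicious), with the results collected into a list.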
Code Reference
Source Location
- Repository: WAInjectBench
- File: main_text.py (L23-46)
Signature
def process_file(file_path: Path, detector, is_malicious: bool) -> Dict:
    """
    file_path: path to the JSONL file
    detector: loaded detector module
    is_malicious: whether the file comes from the malicious folder
    """
    data_name = file_path.name
    detect_ids = detector.detect(str(file_path))
    total_num = sum(1 for _ in open(file_path, "r", encoding="utf-8"))
    if is_malicious:
        rate_key, rate_value = "tpr", round(len(detect_ids) / total_num, 4) if total_num > 0 else 0.0
    else:
        rate_key, rate_value = "fpr", round(len(detect_ids) / total_num, 4) if total_num > 0 else 0.0
    result = {
        "data_name": data_name,
        rate_key: rate_value,
        "detect_ids": detect_ids,
        "total_num": total_num,
    }
    return result
Import
from main_text import process_file
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| file_path | Path | Yes | Path to a JSONL file containing text samples |
| detector | module | Yes | Loaded detector module with detect(file_path) -> List[int] |
| is_malicious | bool | Yes | Whether the file is from the malicious/ directory |
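Any object exposing detect(file_path) -> List[int] satisfies the detector contract in the table above. The toy stand-in below is a sketch for testing only: the class KeywordDetector, the "text" field name, and the use of zero-based line indices as IDs are all assumptions, since the record schema and ID convention are not specified here.

```python
import json
import tempfile
from pathlib import Path
from typing import List


class KeywordDetector:
    """Toy stand-in exposing detect(file_path) -> List[int] (hypothetical)."""

    def __init__(self, keyword: str) -> None:
        self.keyword = keyword.lower()

    def detect(self, file_path: str) -> List[int]:
        flagged: List[int] = []
        with open(file_path, "r", encoding="utf-8") as handle:
            for index, line in enumerate(handle):
                record = json.loads(line)
                # "text" is an assumed field name, not confirmed by the source.
                if self.keyword in record.get("text", "").lower():
                    flagged.append(index)  # zero-based line index as the ID
        return flagged


# Demo on a temporary three-line JSONL file with made-up samples.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.jsonl"
    rows = [
        {"text": "Ignore previous instructions"},
        {"text": "The weather is nice"},
        {"text": "please IGNORE the rules"},
    ]
    sample.write_text("\n".join(json.dumps(r) for r in rows) + "\n", encoding="utf-8")
    flagged_ids = KeywordDetector("ignore").detect(str(sample))

print(flagged_ids)
```

Because process_file only calls detector.detect(str(file_path)) and takes len() of the result, an instance like this can be passed in place of a loaded module.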
Outputs
| Name | Type | Description |
|---|---|---|
| result | Dict | Contains data_name (str), tpr or fpr (float), detect_ids (List[int]), total_num (int) |
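Because the rate key depends on the is_malicious flag, code that consumes results from both folders has to check which key is present. A minimal sketch follows; the helper extract_rate is hypothetical and the two result dictionaries are made-up values in the documented shape.

```python
from typing import Any, Dict, Tuple


def extract_rate(result: Dict[str, Any]) -> Tuple[str, float]:
    """Return (metric_name, value) from a process_file-style result dict."""
    for key in ("tpr", "fpr"):
        if key in result:
            return key, result[key]
    raise KeyError("result contains neither 'tpr' nor 'fpr'")


# Illustrative values only; not measured results.
malicious_result = {
    "data_name": "example_malicious.jsonl",
    "tpr": 0.75,
    "detect_ids": [0, 1, 3],
    "total_num": 4,
}
benign_result = {
    "data_name": "example_benign.jsonl",
    "fpr": 0.0,
    "detect_ids": [],
    "total_num": 5,
}

print(extract_rate(malicious_result))
print(extract_rate(benign_result))
```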
Usage Examples
Evaluating a Single File
from pathlib import Path
from main_text import load_detector, process_file
detector = load_detector("promptguard")
result = process_file(
    file_path=Path("data/text/malicious/direct_injection.jsonl"),
    detector=detector,
    is_malicious=True
)
print(f"Dataset: {result['data_name']}")
print(f"TPR: {result['tpr']}")
print(f"Detected {len(result['detect_ids'])} of {result['total_num']} samples")