Heuristic:Norrrrrrr lyn WAInjectBench Union Ensemble Maximize Recall
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Security, Classification |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Combine the results of multiple detectors by taking the set union of their detected IDs. This maximizes recall (TPR) at the cost of a potentially higher false positive rate (FPR).
Description
Both the text and image ensemble detectors use an identical strategy: they read all individual detector JSONL result files from a results directory and combine the `detect_ids` lists using set union. This means if any detector flags a sample as malicious, the ensemble flags it too. The ensemble then recomputes TPR/FPR based on the merged set. This is a "pessimistic" or "security-first" approach where the cost of missing an attack is considered higher than the cost of a false alarm.
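The flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function name `merge_detector_results` is hypothetical, and the record layout (one JSON object per line with `data_name`, `detect_ids`, `total_num`, and `is_malicious` fields) is an assumption inferred from the excerpts under Code Evidence. It also assumes every detector reports the same `total_num` for a given dataset.

```python
import json
from collections import defaultdict
from pathlib import Path


def merge_detector_results(results_dir):
    """Union the detect_ids of every detector result file in results_dir.

    Assumed record layout: {"data_name", "detect_ids", "total_num", "is_malicious"}.
    """
    merged = defaultdict(
        lambda: {"detect_ids": set(), "total_num": 0, "is_malicious": None}
    )
    for path in sorted(Path(results_dir).glob("*.jsonl")):
        if path.name == "ensemble.jsonl":  # skip a previous ensemble output
            continue
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue  # tolerate blank lines
                entry = json.loads(line)
                slot = merged[entry["data_name"]]
                # set.update() with a list performs an in-place union
                slot["detect_ids"].update(entry.get("detect_ids", []))
                # Assumption: all detectors agree on total_num per dataset
                slot["total_num"] = entry.get("total_num", slot["total_num"])
                if slot["is_malicious"] is None:
                    slot["is_malicious"] = entry.get("is_malicious")
    return dict(merged)
```

Because the merged value is a set, a sample flagged by several detectors is counted once when the rates are recomputed.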
Usage
Use this heuristic when combining multiple detection methods in a setting where recall (catching as many attacks as possible) matters more than precision (avoiding false alarms). It suits security detection benchmarks, where a missed attack is treated as worse than a false positive.
The Insight (Rule of Thumb)
- Action: Aggregate `detect_ids` across all detectors using set union.
- Value: If detector A flags samples {1, 3, 5} and detector B flags {2, 3, 7}, the ensemble flags {1, 2, 3, 5, 7}.
- Trade-off: Maximizes TPR (every attack caught by at least one detector is caught by the ensemble) but can increase FPR (false positives from any detector propagate to the ensemble). The alternative, intersection, would favor precision but miss attacks caught by only some of the detectors.
- Self-exclusion: The ensemble explicitly skips its own previous output file (`ensemble.jsonl`) so that an earlier run's merged results are not counted as if they came from another detector.
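The rule can be sketched in a few lines of Python using the sample IDs from the example above. `union_ensemble` is a hypothetical helper name, not a function from the repository, and the third, empty detector is added here only to illustrate a detector that returned no results.

```python
def union_ensemble(*detections):
    """Return the set of IDs flagged by at least one detector."""
    return set().union(*detections)


detector_a = {1, 3, 5}
detector_b = {2, 3, 7}
detector_c = set()  # hypothetical detector that flagged nothing

flagged = union_ensemble(detector_a, detector_b, detector_c)
print(sorted(flagged))  # [1, 2, 3, 5, 7]

# Intersection, for contrast, keeps only IDs that every detector agrees on.
print(sorted(detector_a & detector_b))  # [3]
```

Adding the empty detector leaves the union unchanged, whereas including it in an intersection would empty the result.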
Reasoning
In a benchmark setting, the goal is to measure the upper bound on recall achievable by combining the available detectors. Union aggregation provides this ceiling, since no rule that combines the same per-detector flags can catch an attack that every detector missed. In production security systems, the same rationale applies: it is better to have a human review a false positive than to let an actual attack through undetected.
The choice of union over voting or intersection also simplifies the aggregation logic and makes the ensemble robust to individual detector failures (a detector returning empty results does not reduce the ensemble's recall).
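The trade-off can also be stated as a pair of bounds. As a sketch, assume $k$ detectors, let $D_i$ be the set of IDs that detector $i$ flags on a dataset of $N$ samples, and write its rate as $r_i = |D_i| / N$ (TPR on a malicious dataset, FPR on a benign one). These symbols are introduced here for illustration and do not appear in the repository.

$$
\max_{1 \le i \le k} r_i \;\le\; r_{\text{union}} = \frac{\left|\bigcup_{i=1}^{k} D_i\right|}{N} \;\le\; \min\!\Bigl(1,\ \sum_{i=1}^{k} r_i\Bigr)
$$

On a malicious dataset, the left inequality says the ensemble TPR is at least that of the best single detector. On a benign dataset, the right inequality says the ensemble FPR can grow up to the sum of the individual FPRs, which happens when the detectors' false positives do not overlap.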
Code Evidence
Union aggregation from `detector_text/ensemble.py:26` (identical in `detector_image/ensemble.py:26`):
data_map[data_name]["detect_ids"].update(entry.get("detect_ids", []))
The `data_map` uses a `defaultdict` with `set()` for `detect_ids`, and `.update()` performs set union:
data_map = defaultdict(lambda: {"detect_ids": set(), "total_num": 0, "is_malicious": None})
Self-exclusion of previous ensemble output from `detector_text/ensemble.py:18`:
if file.name == "ensemble.jsonl":  # skip old ensemble outputs
    continue
TPR/FPR recomputation on merged results from `detector_text/ensemble.py:44-48`:
if total_num > 0:
    if is_malicious:
        rate_key, rate_value = "tpr", round(len(detect_ids) / total_num, 4)
    else:
        rate_key, rate_value = "fpr", round(len(detect_ids) / total_num, 4)
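The same computation can be wrapped in a small standalone function for experimentation. This is a sketch, not the repository's code: the name `ensemble_rate` is hypothetical, and returning `None` for an empty dataset is an assumption, since the excerpt above does not show what happens when `total_num` is zero.

```python
def ensemble_rate(detect_ids, total_num, is_malicious):
    """Return ("tpr" or "fpr", rate) for one dataset, or None if it is empty."""
    if total_num <= 0:
        return None  # assumed behavior; not shown in the source excerpt
    rate_key = "tpr" if is_malicious else "fpr"
    return rate_key, round(len(detect_ids) / total_num, 4)


# Merged IDs from two detectors on a malicious dataset of 10 samples
print(ensemble_rate({1, 2, 3, 5, 7}, 10, True))  # ('tpr', 0.5)
```

The same fraction is labeled TPR or FPR depending only on whether the dataset is malicious, which is why each dataset contributes exactly one of the two rates.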