Principle:Norrrrrrr lyn WAInjectBench Ensemble Aggregation Text
| Knowledge Sources | |
|---|---|
| Domains | Ensemble_Learning, NLP, Security |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A union-based ensemble strategy that combines text detection results from multiple detectors by merging their flagged IDs to maximize recall.
Description
Ensemble Aggregation in the text detection pipeline takes the results from all individual text detectors and combines them using set union. If any detector flags a sample, the ensemble marks it as detected. This approach maximizes the True Positive Rate (TPR) at the potential cost of a higher False Positive Rate (FPR), a reasonable trade-off in security applications where missing a genuine attack is more costly than a false alarm.
The ensemble reads all per-detector JSONL result files from the result directory, groups results by dataset name, unions the detect_ids sets, and recomputes TPR/FPR for the combined detection.
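The steps in this paragraph can be sketched in Python as follows. This is a minimal illustration, not the project's actual implementation: the function name `aggregate_union` is hypothetical, and the sketch assumes that each JSONL line is an object carrying `data_name`, `detect_ids`, and `total_num` keys and that result files use a `.jsonl` extension. It also assumes `total_num` is the same for a given dataset across detectors.

```python
import json
from pathlib import Path


def aggregate_union(result_dir):
    """Union detect_ids across all per-detector JSONL files, grouped by dataset.

    Hypothetical helper; the key names below are assumptions about the file schema.
    """
    ensemble = {}
    for path in sorted(Path(result_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                entry = json.loads(line)
                # Create the per-dataset slot the first time a dataset is seen.
                slot = ensemble.setdefault(
                    entry["data_name"],
                    {"detect_ids": set(), "total_num": entry["total_num"]},
                )
                # A sample counts as detected if any detector flagged it.
                slot["detect_ids"] |= set(entry["detect_ids"])

    # Recompute the detection rate from the combined set of flagged IDs.
    for slot in ensemble.values():
        total = slot["total_num"]
        slot["rate"] = len(slot["detect_ids"]) / total if total else 0.0
    return ensemble
```

If each dataset is entirely malicious or entirely benign (an assumption, not something stated above), the resulting `rate` reads as the TPR for a malicious dataset and as the FPR for a benign one.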
Usage
Use this as the final aggregation step after all individual text detectors have been run. The ensemble requires that individual detector results already exist as JSONL files in the result directory.
Theoretical Basis
Union ensemble rule:
$$D_{\text{ensemble}} = \bigcup_{i=1}^{n} D_i$$
where $D_i$ is the set of IDs flagged by detector $i$ and $n$ is the number of detectors.
    # Union ensemble algorithm
    for each detector_result_file:
        for each entry:
            ensemble[data_name].detect_ids |= entry.detect_ids
    for each data_name in ensemble:
        rate = len(ensemble[data_name].detect_ids) / total_num
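This guarantees that $\mathrm{TPR}_{\text{ensemble}} \ge \max_i \mathrm{TPR}_i$: every sample flagged by any single detector is also in the union, so the ensemble recall is at least as good as that of the best individual detector. The same argument applies to false positives, so the ensemble FPR is likewise at least as high as the highest individual FPR.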
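The recall guarantee can be checked on a toy example. The detector names and ID sets below are made up for illustration; the helper `rate` is likewise hypothetical and simply measures what fraction of a population was flagged.

```python
def rate(flagged, population):
    """Fraction of `population` whose IDs appear in `flagged`."""
    return len(flagged & population) / len(population)


# Hypothetical ground truth: IDs 0-9 are attacks, IDs 10-19 are benign.
malicious = set(range(10))
benign = set(range(10, 20))

# Hypothetical per-detector flagged IDs.
detectors = {
    "detector_a": {0, 1, 2, 3, 10},
    "detector_b": {2, 3, 4, 5, 6, 11},
    "detector_c": {7, 10},
}

# Union ensemble: a sample is flagged if any detector flagged it.
ensemble = set().union(*detectors.values())

tpr = {name: rate(ids, malicious) for name, ids in detectors.items()}
fpr = {name: rate(ids, benign) for name, ids in detectors.items()}
ensemble_tpr = rate(ensemble, malicious)
ensemble_fpr = rate(ensemble, benign)

# The union can never flag fewer samples than any single detector.
assert ensemble_tpr >= max(tpr.values())
assert ensemble_fpr >= max(fpr.values())
```

With these made-up sets, the ensemble TPR is 0.8 against a best single-detector TPR of 0.5, while the ensemble FPR rises to 0.2 from a per-detector maximum of 0.1, which shows both sides of the trade-off.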