Principle:Norrrrrrr lyn WAInjectBench Ensemble Aggregation Text

Knowledge Sources	Ensemble Methods in ML
Domains	Ensemble_Learning, NLP, Security
Last Updated	2026-02-14 16:00 GMT

Overview

A union-based ensemble strategy that combines text detection results from multiple detectors by merging their flagged IDs to maximize recall.

Description

Ensemble aggregation in the text detection pipeline takes the results from all individual text detectors and combines them by set union: if any detector flags a sample, the ensemble marks it as detected. This favors the True Positive Rate (TPR) at the potential cost of a higher False Positive Rate (FPR), a reasonable trade-off in security applications where missing a genuine attack is more costly than a false alarm.
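As a small illustration of this trade-off, the sketch below unions the flagged IDs of three hypothetical detectors on a toy set of ten samples. The detector outputs, the ID ranges, and the helper name `rate` are invented for the example and are not taken from the benchmark.

```python
# Toy data (hypothetical): IDs 0-4 are attacks, IDs 5-9 are benign.
malicious_ids = {0, 1, 2, 3, 4}
benign_ids = {5, 6, 7, 8, 9}

# Hypothetical flagged-ID sets from three detectors.
detector_flags = [
    {0, 1, 5},
    {1, 2, 3},
    {3, 6},
]


def rate(flagged, population):
    """Fraction of `population` that appears in `flagged`."""
    return len(flagged & population) / len(population)


# Union rule: a sample counts as detected if any detector flagged it.
ensemble_flags = set().union(*detector_flags)

tpr_each = [rate(f, malicious_ids) for f in detector_flags]
fpr_each = [rate(f, benign_ids) for f in detector_flags]
tpr_ensemble = rate(ensemble_flags, malicious_ids)
fpr_ensemble = rate(ensemble_flags, benign_ids)

print("TPR per detector:", tpr_each, "ensemble:", tpr_ensemble)
print("FPR per detector:", fpr_each, "ensemble:", fpr_ensemble)
```

In this toy case the ensemble catches 4 of the 5 attacks, where the best single detector catches 3, while flagging 2 of the 5 benign samples, where each single detector flags at most 1.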

The ensemble reads all per-detector JSONL result files from the result directory, groups results by dataset name, unions the detect_ids sets, and recomputes TPR/FPR for the combined detection.
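The read, group, union, and recompute steps might be sketched as follows. This is a minimal illustration, not the benchmark's actual implementation: the function name `union_ensemble`, the `*.jsonl` glob pattern, and the record fields `data_name`, `detect_ids`, and `total_num` are assumptions modeled on the names used elsewhere on this page.

```python
import json
from pathlib import Path


def union_ensemble(result_dir):
    """Union detect_ids across all *.jsonl files, grouped by data_name.

    Assumed (hypothetical) record schema, one JSON object per line:
    {"data_name": str, "detect_ids": [ids], "total_num": int}
    """
    ensemble = {}
    for path in sorted(Path(result_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                entry = json.loads(line)
                # The first record seen for a dataset fixes its total_num.
                group = ensemble.setdefault(
                    entry["data_name"],
                    {"detect_ids": set(), "total_num": entry["total_num"]},
                )
                group["detect_ids"] |= set(entry["detect_ids"])

    # Recompute the detection rate from the merged ID set.
    results = {}
    for name, group in ensemble.items():
        total = group["total_num"]
        detected = len(group["detect_ids"])
        results[name] = {
            "detect_ids": group["detect_ids"],
            "rate": detected / total if total else 0.0,
        }
    return results
```

Whether a dataset's rate is read as a TPR or an FPR would depend on whether that dataset holds attack or benign samples; that mapping is left out of the sketch.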

Usage

Use this as the final aggregation step after all individual text detectors have been run. The ensemble requires that individual detector results already exist as JSONL files in the result directory.

Theoretical Basis

Union ensemble rule:

$D_{\text{ensemble}} = D_{1} \cup D_{2} \cup \dots \cup D_{n}$

where $D_{i}$ is the set of IDs flagged by detector $i$.

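# Union ensemble algorithm (pseudocode)
for each detector_result_file:
    for each entry:
        # merge this detector's flagged IDs into the dataset's ensemble set
        ensemble[entry.data_name].detect_ids |= entry.detect_ids
for each data_name in ensemble:
    # total_num is the number of samples in that dataset
    rate = len(ensemble[data_name].detect_ids) / total_num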

This guarantees that $TPR_{\text{ensemble}} \geq \max_{i} TPR_{i}$; that is, the ensemble recall is at least as good as that of the best individual detector.
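A short derivation of this bound, assuming all detectors are evaluated on the same samples. The symbols $M$ (the set of malicious sample IDs) and $B$ (the set of benign sample IDs) are notation introduced here for illustration. Since $D_{i} \subseteq D_{\text{ensemble}}$ for every $i$:

$TPR_{i} = \frac{|D_{i} \cap M|}{|M|} \leq \frac{|D_{\text{ensemble}} \cap M|}{|M|} = TPR_{\text{ensemble}}$

The same argument applied to $B$ gives the cost side of the trade-off:

$FPR_{i} = \frac{|D_{i} \cap B|}{|B|} \leq \frac{|D_{\text{ensemble}} \cap B|}{|B|} = FPR_{\text{ensemble}}$

So the union can never lower recall, but it also can never lower the false positive rate below that of the noisiest individual detector.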

Related Pages

Implemented By

Implementation:Norrrrrrr_lyn_WAInjectBench_text_ensemble_detect

Uses Heuristic

Heuristic:Norrrrrrr_lyn_WAInjectBench_Union_Ensemble_Maximize_Recall
