Principle: Marker Inc Korea AutoRAG Strategy Selection
| Knowledge Sources | |
|---|---|
| Domains | Evaluation Strategy, RAG Pipeline Optimization |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
Strategy selection chooses the best-performing module from multiple candidates by aggregating evaluation metrics using one of several ranking strategies.
Description
When multiple module candidates are evaluated at a given pipeline node, the system must decide which candidate performed best. This decision is non-trivial when multiple metrics are involved, because a module that excels on one metric may underperform on another. Strategy selection provides a principled framework for resolving this multi-objective comparison into a single winner.
AutoRAG supports three aggregation strategies. The mean strategy computes the average of all metric values across all QA samples for each module and selects the module with the highest overall average. The reciprocal rank (rank) strategy ranks modules independently on each metric, computes the reciprocal of each rank, sums these reciprocals across metrics, and selects the module with the highest total. The normalized mean (normalize_mean) strategy applies min-max normalization to each metric across all candidates before averaging, which prevents metrics with larger numerical ranges from dominating the selection.
An optional speed threshold mechanism can pre-filter candidates by execution time before the strategy is applied, ensuring that the selected module meets latency requirements in addition to quality requirements.
Usage
Strategy selection is used at every node during an optimization trial. The strategy is specified per-node in the YAML configuration, allowing different nodes to use different selection criteria. For instance, a retrieval node might use reciprocal rank to balance precision and recall, while a generation node might use simple mean on a single quality metric.
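A per-node configuration might look roughly like the sketch below. The exact key names (`strategy`, `metrics`, `speed_threshold`) and the metric names shown are illustrative and should be checked against the AutoRAG documentation for the version in use:

```yaml
nodes:
  - node_type: retrieval
    strategy:
      strategy: rank                # aggregate by reciprocal rank
      metrics: [retrieval_precision, retrieval_recall]
      speed_threshold: 5            # optional: drop candidates slower than 5 s
  - node_type: generator
    strategy:
      strategy: mean                # simple mean on a single quality metric
      metrics: [rouge]
```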
Theoretical Basis
Mean Strategy
For each candidate module i with result DataFrame R_i and metric columns C:
score_i = mean( mean(R_i[c]) for c in C )
best = argmax(score_i for all i)
This is equivalent to computing the grand mean of all metric values across all rows and all metric columns: every metric column spans the same QA samples, so the mean of per-column means equals the grand mean. It treats all metrics and all QA samples equally.
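The mean strategy can be sketched in a few lines, assuming each candidate's evaluation results arrive as a pandas DataFrame whose metric columns are known (the function and variable names here are illustrative, not AutoRAG's own API):

```python
import pandas as pd

def select_by_mean(results: list, metric_cols: list) -> int:
    """Return the index of the candidate with the highest grand mean
    over all metric columns and all QA-sample rows."""
    scores = [df[metric_cols].to_numpy().mean() for df in results]
    return max(range(len(scores)), key=scores.__getitem__)

# Two hypothetical candidates evaluated on two metrics over two QA samples.
a = pd.DataFrame({"precision": [0.9, 0.8], "recall": [0.5, 0.6]})  # grand mean 0.700
b = pd.DataFrame({"precision": [0.7, 0.7], "recall": [0.8, 0.9]})  # grand mean 0.775
print(select_by_mean([a, b], ["precision", "recall"]))  # → 1
```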
Reciprocal Rank Strategy
For each metric column c, rank all candidates by their average value on that metric. Then for each candidate i:
FOR EACH metric c:
    avg_i_c = mean(R_i[c])
    rank_i_c = rank of avg_i_c among all candidates (1 = best)
score_i = sum( 1 / rank_i_c for c in C )
best = argmax(score_i for all i)
This strategy is more robust to outlier metrics because it only considers relative ordering rather than absolute values. A module that ranks first on every metric receives a perfect score regardless of the magnitude of its advantage.
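A minimal pandas sketch of the reciprocal rank strategy, under the same illustrative assumptions as above (per-candidate result DataFrames with known metric columns; not AutoRAG's own code):

```python
import pandas as pd

def select_by_rank(results: list, metric_cols: list) -> int:
    """Sum reciprocal per-metric ranks (1 = best) and return the winner's index."""
    # One row per candidate, one column per metric, holding per-metric averages.
    avgs = pd.DataFrame([{c: df[c].mean() for c in metric_cols} for df in results])
    ranks = avgs.rank(ascending=False)        # rank candidates on each metric
    scores = (1.0 / ranks).sum(axis=1)        # sum of reciprocal ranks
    return int(scores.idxmax())

a = pd.DataFrame({"precision": [0.9], "recall": [0.5]})  # ranks 1, 3 → 1 + 1/3
b = pd.DataFrame({"precision": [0.8], "recall": [0.9]})  # ranks 2, 1 → 1/2 + 1
c = pd.DataFrame({"precision": [0.7], "recall": [0.8]})  # ranks 3, 2 → 1/3 + 1/2
print(select_by_rank([a, b, c], ["precision", "recall"]))  # → 1
```

Note that pandas assigns tied candidates the average of their ranks by default, which is one reasonable tie-breaking convention among several.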
Normalized Mean Strategy
Apply min-max normalization to each metric across all candidates, then average:
FOR EACH metric c:
    avg_i_c = mean(R_i[c])
    norm_i_c = (avg_i_c - min(avg_c)) / (max(avg_c) - min(avg_c))
score_i = sum( norm_i_c for c in C )
best = argmax(score_i for all i)
This approach rescales every metric to the [0, 1] range before summing, ensuring that no single metric dominates due to its numerical scale. It preserves the proportional differences between candidates within each metric, unlike the rank-based strategy.
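The normalized mean strategy follows the same shape, with one practical caveat the pseudocode glosses over: when every candidate has the same average on a metric, max − min is zero, so a sketch needs a guard against division by zero (the guard shown here is an assumption, not AutoRAG's documented behavior):

```python
import pandas as pd

def select_by_normalized_mean(results: list, metric_cols: list) -> int:
    """Min-max normalize per-metric averages across candidates, then sum."""
    avgs = pd.DataFrame([{c: df[c].mean() for c in metric_cols} for df in results])
    spread = (avgs.max() - avgs.min()).replace(0, 1)  # guard: constant metric → 0/1
    normed = (avgs - avgs.min()) / spread             # rescale each metric to [0, 1]
    return int(normed.sum(axis=1).idxmax())

a = pd.DataFrame({"precision": [0.9], "recall": [0.5]})  # normed 1.0 + 0.00 = 1.00
b = pd.DataFrame({"precision": [0.8], "recall": [0.9]})  # normed 0.5 + 1.00 = 1.50
c = pd.DataFrame({"precision": [0.7], "recall": [0.8]})  # normed 0.0 + 0.75 = 0.75
print(select_by_normalized_mean([a, b, c], ["precision", "recall"]))  # → 1
```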
Speed Threshold Filtering
Before applying any selection strategy, an optional speed threshold can filter out candidates whose execution time exceeds a specified limit:
FUNCTION filter_by_threshold(results, execution_times, threshold):
    filtered = [r for r, t in zip(results, execution_times) if t <= threshold]
    RETURN filtered IF filtered is non-empty ELSE results
If all candidates exceed the threshold, the filter is a no-op (all candidates are retained) to avoid empty selections.
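The filter translates almost directly into Python; the fallback-to-all-candidates behavior when everything is too slow is the part worth making explicit:

```python
def filter_by_threshold(results: list, execution_times: list, threshold: float) -> list:
    """Keep candidates whose execution time is within the threshold;
    if none qualify, keep everything so the selection is never empty."""
    kept = [r for r, t in zip(results, execution_times) if t <= threshold]
    return kept if kept else list(results)

# Hypothetical candidate names and timings (seconds).
print(filter_by_threshold(["bm25", "vectordb", "hybrid"], [0.4, 2.5, 0.9], 1.0))
# → ['bm25', 'hybrid']
```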