# Heuristic: Marker Inc Korea AutoRAG Module Selection Strategies
| Knowledge Sources | |
|---|---|
| Domains | Optimization, RAG |
| Last Updated | 2026-02-08 06:00 GMT |
## Overview
Decision framework for choosing among the `average`, `rr` (reciprocal rank), and `normalize_mean` strategies when selecting the best RAG module for each pipeline node.
## Description
AutoRAG evaluates multiple module implementations for each pipeline node and selects the best one using a configurable strategy. Three strategies are available: `average` (mean of metric columns), `rr` (reciprocal rank fusion across metrics), and `normalize_mean` (min-max normalized mean). Each strategy has different strengths, depending on whether your metrics share a scale, whether you want to avoid single-metric dominance, and how much simplicity matters.
## Usage
Apply this heuristic when authoring the `strategy` section of a YAML pipeline config. The strategy determines which module wins at each node; choose based on your metric characteristics and evaluation goals.
## The Insight (Rule of Thumb)
- `average` (default): Takes the mean of all specified metric columns per module, then selects the module with the highest overall mean.
  - Best for: Simple evaluation with metrics on similar scales (e.g., all between 0 and 1).
  - Trade-off: Can be dominated by a single high-scale metric if metrics are not normalized.
- `rr` (reciprocal rank): Ranks modules by each metric separately, converts ranks to reciprocals (1st = 1, 2nd = 0.5, 3rd ≈ 0.33, ...), then sums across metrics.
  - Best for: Robust selection that avoids single-metric dominance; good when you care about consistent performance across all metrics.
  - Trade-off: Ignores actual score magnitudes; a module that is barely better receives the same rank bonus as one that is far better.
- `normalize_mean`: Min-max normalizes each metric to the [0, 1] range, then takes the mean.
  - Best for: Metrics with different scales (e.g., precision in 0-1 vs. latency in seconds).
  - Trade-off: Sensitive to outlier modules that stretch the min-max range.
- `speed_threshold`: Optional filter that works with any strategy; excludes modules slower than the threshold (in seconds).
  - Value: Use stricter thresholds for filtering nodes (e.g., 5 s) than for expansion nodes (e.g., 10 s), since filters should be fast.
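The options above might be combined in a node's `strategy` block roughly as follows. This is a sketch only: the node line, module choices, and metric names are placeholders, and the exact key for naming the selection method should be verified against your AutoRAG version's documentation.

```yaml
node_lines:
  - node_line_name: retrieve_node_line   # placeholder name
    nodes:
      - node_type: retrieval
        strategy:
          metrics: [retrieval_f1, retrieval_recall, retrieval_precision]
          strategy: rr          # selection method: average (default), rr, or normalize_mean
          speed_threshold: 5    # drop modules slower than 5 s; stricter for filter-style nodes
        top_k: 10
        modules:
          - module_type: bm25
          - module_type: vectordb
```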
## Reasoning
The three strategies encode different assumptions about what "best" means:
Average is the simplest and most intuitive. It works well when all metrics share a scale, as is typical for retrieval metrics such as F1, recall, and precision, which all lie in [0, 1]. If you mix metrics with different scales, however, the higher-scale metric will dominate.
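A toy illustration of this scale dominance (module names, metric names, and scores are hypothetical; the 0-100 metric stands in for any unnormalized score):

```python
import pandas as pd

# Hypothetical per-module mean scores: two 0-1 metrics plus one 0-100 metric.
means = pd.DataFrame(
    {"f1": [0.2, 0.9], "recall": [0.3, 0.8], "rouge_0_100": [90.0, 80.0]},
    index=["module_a", "module_b"],
)

# Plain average: the 0-100 metric swamps the 0-1 metrics.
overall = means.mean(axis=1)  # module_a ~30.2, module_b ~27.2
print(overall.idxmax())  # -> module_a, despite losing both 0-1 metrics
```

Here `module_b` wins two of the three metrics, yet the plain mean picks `module_a` purely because of the larger metric's scale.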
Reciprocal rank is borrowed from information retrieval (reciprocal rank fusion, RRF). It cares only about relative ordering, not absolute scores, which makes it robust to metric scale differences and prevents any single metric from dominating. It is the recommended choice when combining heterogeneous metrics.
Normalize Mean bridges the gap: it respects absolute score differences (unlike RR) while handling scale differences (unlike average). However, it is sensitive to outlier modules that stretch the normalization range.
## Code Evidence
Average strategy in `autorag/strategy.py:114-135`:
```python
# Imports shown for context; they live at the top of autorag/strategy.py.
from typing import Any, Iterable, List, Optional, Tuple

import pandas as pd

def select_best_average(
    results: List[pd.DataFrame],
    columns: Iterable[str],
    metadatas: Optional[List[Any]] = None,
) -> Tuple[pd.DataFrame, Any]:
    # Mean over metric columns per row, then over rows: one score per module.
    each_average = [df[columns].mean(axis=1).mean() for df in results]
    best_index = each_average.index(max(each_average))
    return results[best_index], metadatas[best_index]
```
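As a sanity check, `select_best_average` can be exercised on toy per-query results. The module names and scores below are made up; the function body is copied from the snippet above so the example is self-contained:

```python
from typing import Any, Iterable, List, Optional, Tuple

import pandas as pd

def select_best_average(
    results: List[pd.DataFrame],
    columns: Iterable[str],
    metadatas: Optional[List[Any]] = None,
) -> Tuple[pd.DataFrame, Any]:
    each_average = [df[columns].mean(axis=1).mean() for df in results]
    best_index = each_average.index(max(each_average))
    return results[best_index], metadatas[best_index]

# Hypothetical per-query scores for two candidate retrieval modules.
bm25 = pd.DataFrame({"f1": [0.4, 0.6], "recall": [0.5, 0.7]})      # overall mean 0.55
vectordb = pd.DataFrame({"f1": [0.6, 0.8], "recall": [0.7, 0.9]})  # overall mean 0.75

best_df, best_name = select_best_average(
    [bm25, vectordb], ["f1", "recall"], metadatas=["bm25", "vectordb"]
)
print(best_name)  # -> vectordb
```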
Reciprocal rank strategy in `autorag/strategy.py:138-150`:
```python
def select_best_rr(
    results: List[pd.DataFrame],
    columns: Iterable[str],
    metadatas: Optional[List[Any]] = None,
) -> Tuple[pd.DataFrame, Any]:
    # Average each metric per module, rank modules per metric (1 = best),
    # then sum reciprocal ranks (1, 0.5, 1/3, ...) across metrics.
    each_average_df = pd.DataFrame(
        [df[columns].mean(axis=0).to_dict() for df in results]
    )
    rank_df = each_average_df.rank(ascending=False)
    rr_df = rank_df.map(lambda x: 1 / x)
    best_index = np.array(rr_df.sum(axis=1)).argmax()
    return results[best_index], metadatas[best_index]
```
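To see the rank-based behavior, the same style of scale-skewed data discussed under Reasoning can be run through a copy of `select_best_rr`. Module names and the 0-100 metric are hypothetical, and `1 / rank_df` replaces `rank_df.map(lambda x: 1 / x)` only for compatibility with pandas versions before 2.1, where `DataFrame.map` did not exist; the result is identical:

```python
from typing import Any, Iterable, List, Optional, Tuple

import numpy as np
import pandas as pd

def select_best_rr(
    results: List[pd.DataFrame],
    columns: Iterable[str],
    metadatas: Optional[List[Any]] = None,
) -> Tuple[pd.DataFrame, Any]:
    each_average_df = pd.DataFrame(
        [df[columns].mean(axis=0).to_dict() for df in results]
    )
    rank_df = each_average_df.rank(ascending=False)
    rr_df = 1 / rank_df  # reciprocal ranks: 1st -> 1.0, 2nd -> 0.5, ...
    best_index = np.array(rr_df.sum(axis=1)).argmax()
    return results[best_index], metadatas[best_index]

# module_a tops only the high-scale metric; module_b wins the two 0-1 metrics.
module_a = pd.DataFrame({"f1": [0.2], "recall": [0.3], "rouge_0_100": [90.0]})
module_b = pd.DataFrame({"f1": [0.9], "recall": [0.8], "rouge_0_100": [80.0]})

_, best_name = select_best_rr(
    [module_a, module_b],
    ["f1", "recall", "rouge_0_100"],
    metadatas=["module_a", "module_b"],
)
print(best_name)  # -> module_b (rr sums: a = 0.5 + 0.5 + 1 = 2.0, b = 1 + 1 + 0.5 = 2.5)
```

A plain average on the same data would pick `module_a` (its mean is inflated by the 0-100 metric), which is exactly the single-metric dominance that `rr` is designed to avoid.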
Normalize mean strategy in `autorag/strategy.py:153-165`:
```python
def select_normalize_mean(
    results: List[pd.DataFrame],
    columns: Iterable[str],
    metadatas: Optional[List[Any]] = None,
) -> Tuple[pd.DataFrame, Any]:
    # Average each metric per module, min-max scale each metric across
    # modules, then sum the scaled values per module.
    each_mean_df = pd.DataFrame(
        [df[columns].mean(axis=0).to_dict() for df in results]
    )
    normalized_means = (each_mean_df - each_mean_df.min()) / (
        each_mean_df.max() - each_mean_df.min()
    )
    normalized_mean_sums = normalized_means.sum(axis=1)
    best_index = normalized_mean_sums.argmax()
    return results[best_index], metadatas[best_index]
```
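On the same scale-skewed toy data, a copy of `select_normalize_mean` also resists the 0-100 metric, since min-max scaling puts every metric on [0, 1] before summing (names and scores are hypothetical):

```python
from typing import Any, Iterable, List, Optional, Tuple

import pandas as pd

def select_normalize_mean(
    results: List[pd.DataFrame],
    columns: Iterable[str],
    metadatas: Optional[List[Any]] = None,
) -> Tuple[pd.DataFrame, Any]:
    each_mean_df = pd.DataFrame(
        [df[columns].mean(axis=0).to_dict() for df in results]
    )
    normalized_means = (each_mean_df - each_mean_df.min()) / (
        each_mean_df.max() - each_mean_df.min()
    )
    best_index = normalized_means.sum(axis=1).argmax()
    return results[best_index], metadatas[best_index]

module_a = pd.DataFrame({"f1": [0.2], "recall": [0.3], "rouge_0_100": [90.0]})
module_b = pd.DataFrame({"f1": [0.9], "recall": [0.8], "rouge_0_100": [80.0]})

_, best_name = select_normalize_mean(
    [module_a, module_b],
    ["f1", "recall", "rouge_0_100"],
    metadatas=["module_a", "module_b"],
)
print(best_name)  # -> module_b (normalized sums: a = 1.0, b = 2.0)
```

Note that with only two modules, min-max scaling collapses each metric to 0 and 1, so the selection matches reciprocal rank; the strategies diverge once three or more modules are compared and absolute score gaps start to matter.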