Heuristic:Marker Inc Korea AutoRAG Empty Result Fallback

Knowledge Sources	AutoRAG
Domains	Robustness, RAG, Pipeline_Design
Last Updated	2026-02-12 00:00 GMT

Overview

A pipeline robustness pattern ensuring that filtering and selection operations never return empty results: when every candidate is filtered out, the step falls back to its original input.

Description

AutoRAG applies a defensive programming pattern across its filtering and selection modules: if a filtering operation removes all candidates (e.g., all passages fall below a similarity threshold), the system reverts to the pre-filter results rather than propagating an empty set downstream. The general form is the `avoid_empty_result` decorator in `autorag/strategy.py`. Passage filter modules such as `SimilarityThresholdCutoff` implement a variant explicitly, keeping only the single highest-scoring passage instead of the full pre-filter input. The underlying principle is that, in a RAG pipeline, a suboptimal result is preferable to no result, because downstream generation modules need at least some context to produce a response.

Usage

This heuristic is automatically applied by AutoRAG's infrastructure. No manual configuration is needed. Be aware of this behavior when debugging: if your filters appear to have no effect, they may be triggering the fallback because they are too aggressive.
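One way to check whether the fallback is masking an over-aggressive filter is to call the undecorated function and inspect what it actually produced. The sketch below relies on the fact that `functools.wraps` (used by the decorator shown under Code Evidence) exposes the original function as `__wrapped__`. The decorator here is a simplified local stand-in, and `strict_filter`, `ids`, and `scores` are hypothetical names for illustration, not AutoRAG code. If a function carries several decorators, `__wrapped__` may only peel off one layer.

```python
import functools
from typing import Callable, List


def avoid_empty_result(return_index: List[int]):
    # Simplified stand-in for the decorator in autorag/strategy.py (assumption:
    # same fallback rule, condensed for this example).
    def decorator(func: Callable):
        @functools.wraps(func)  # stores the original function as wrapper.__wrapped__
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            parts = result if isinstance(result, tuple) else (result,)
            if all(not part for part in parts):
                return [args[i] for i in return_index]
            return result
        return wrapper
    return decorator


# Hypothetical filter: keeps only the ids whose score meets the threshold.
@avoid_empty_result(return_index=[0, 1])
def strict_filter(ids: List[str], scores: List[float], threshold: float):
    kept = [(i, s) for i, s in zip(ids, scores) if s >= threshold]
    return [i for i, _ in kept], [s for _, s in kept]


ids = ["p1", "p2", "p3"]
scores = [0.2, 0.4, 0.6]

through_pipeline = strict_filter(ids, scores, 0.9)   # fallback hides the empty result
raw = strict_filter.__wrapped__(ids, scores, 0.9)    # what the filter itself returned
fallback_fired = all(not part for part in raw)
```

If `fallback_fired` is true for most queries, the threshold is likely too strict for the data, and the filter is effectively a no-op.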

The Insight (Rule of Thumb)

Action: Never return empty results from any filtering or selection step in a RAG pipeline.
Value: When all candidates are filtered out, return the original (pre-filter) input.
Variant: When using threshold-based filters (SimilarityThresholdCutoff), keep at least the single highest-scoring result even if it falls below the threshold.
Trade-off: May include low-quality passages in the final context, but prevents pipeline crashes and ensures the LLM always receives some context to generate from.

Reasoning

RAG pipelines are multi-stage: retrieval produces candidates, filters narrow them, and generation creates responses. If any intermediate stage produces empty results, the entire pipeline fails. The cost of including a few low-quality passages (slightly worse generation) is much lower than the cost of a pipeline failure (no response at all). This pattern is especially important during automated optimization trials where many parameter combinations are tested; aggressive filter settings would cause trials to crash rather than produce measurable (if suboptimal) results.

Code Evidence

The `avoid_empty_result` decorator from `autorag/strategy.py:19-47`:

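# Imports needed by this excerpt (not part of the quoted line range).
import functools
from typing import Callable, List

def avoid_empty_result(return_index: List[int]):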
    """
    Decorator for avoiding empty results from the function.
    When the func returns an empty result, it will return the origin results.
    When the func returns a None, it will return the origin results.
    When the return value is a tuple, it will check all the value or list is empty.
    If so, it will return the origin results.
    It keeps parameters at return_index of the function as the origin results.
    """
    def decorator_avoid_empty_result(func: Callable):
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> List:
            func_result = func(*args, **kwargs)
            if isinstance(func_result, tuple):
                if all([not bool(result) for result in func_result]):
                    return [args[index] for index in return_index]
            if not bool(func_result):
                return [args[index] for index in return_index]
            else:
                return func_result
        return wrapper
    return decorator_avoid_empty_result
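To illustrate how `return_index` selects the inputs that are handed back, here is a minimal usage sketch. It carries a local copy of the decorator logic so it runs standalone; `keep_above`, `passages`, and `scores` are hypothetical names, not part of AutoRAG. Two details follow directly from the quoted code: the fallback returns a list of the original arguments (not the tuple the function would normally return), and it indexes `args`, so the origin inputs have to be passed positionally.

```python
import functools
from typing import Callable, List, Tuple


def avoid_empty_result(return_index: List[int]):
    # Local copy of the decorator logic quoted above, so the example is self-contained.
    def decorator_avoid_empty_result(func: Callable):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            func_result = func(*args, **kwargs)
            if isinstance(func_result, tuple):
                if all(not bool(result) for result in func_result):
                    return [args[index] for index in return_index]
            if not bool(func_result):
                return [args[index] for index in return_index]
            return func_result
        return wrapper
    return decorator_avoid_empty_result


# Hypothetical filter: positions 0 and 1 (passages, scores) are the "origin" inputs.
@avoid_empty_result(return_index=[0, 1])
def keep_above(
    passages: List[str], scores: List[float], threshold: float
) -> Tuple[List[str], List[float]]:
    kept = [(p, s) for p, s in zip(passages, scores) if s >= threshold]
    return [p for p, _ in kept], [s for _, s in kept]


passages = ["a", "b", "c"]
scores = [0.2, 0.5, 0.7]

normal = keep_above(passages, scores, 0.5)    # some passages survive: the filter's tuple
fallback = keep_above(passages, scores, 0.9)  # nothing survives: list of original inputs

# Pitfall: an origin input passed by keyword is absent from *args,
# so the fallback cannot index it and raises IndexError.
try:
    keep_above(passages, scores=scores, threshold=0.9)
    keyword_call_failed = False
except IndexError:
    keyword_call_failed = True
```

Callers that unpack the result (`kept_passages, kept_scores = keep_above(...)`) work with both the tuple and the list form, which is presumably why the type difference is harmless in practice.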

Keep-best-one fallback in SimilarityThresholdCutoff from `autorag/nodes/passagefilter/similarity_threshold_cutoff.py`:

# If all contents are filtered, keep the only one highest similarity content.
if len(result) > 0:
    return result
return [np.argmax(similarities)]  # Keep best if all filtered
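For comparison, the keep-best-one rule can be written as a small standalone function. This is a sketch in plain Python, not the AutoRAG implementation: the function name `threshold_cutoff_indices`, the `>=` comparison, and the error on empty input are assumptions made for illustration.

```python
from typing import List, Sequence


def threshold_cutoff_indices(similarities: Sequence[float], threshold: float) -> List[int]:
    """Return the indices to keep; never return an empty list for non-empty input."""
    if not similarities:
        raise ValueError("similarities must not be empty")
    kept = [i for i, score in enumerate(similarities) if score >= threshold]
    if kept:
        return kept
    # Everything fell below the threshold: keep the single best-scoring index.
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    return [best]


some_pass = threshold_cutoff_indices([0.1, 0.8, 0.5], threshold=0.4)
none_pass = threshold_cutoff_indices([0.1, 0.3, 0.2], threshold=0.9)
```

Unlike the decorator, this variant returns one passage rather than the full pre-filter set, so the downstream context stays small even when the threshold rejects everything.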
