Heuristic:Marker Inc Korea AutoRAG Empty Result Fallback
| Knowledge Sources | |
|---|---|
| Domains | Robustness, RAG, Pipeline_Design |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
Pipeline robustness pattern ensuring that filtering and selection operations never return empty results, falling back to the original input when all candidates are filtered out.
Description
AutoRAG implements a defensive programming pattern across its filtering and selection modules: if a filtering operation removes all candidates (e.g., all passages fall below a similarity threshold), the system reverts to the pre-filter results rather than propagating an empty set downstream. This is implemented via the `avoid_empty_result` decorator in `autorag/strategy.py` and explicitly in passage filter modules like `SimilarityThresholdCutoff`. The principle is that a suboptimal result is always better than no result in a RAG pipeline, because downstream generation modules require at least some context to produce a response.
Usage
This heuristic is automatically applied by AutoRAG's infrastructure. No manual configuration is needed. Be aware of this behavior when debugging: if your filters appear to have no effect, they may be triggering the fallback because they are too aggressive.
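One way to check whether a threshold is too aggressive is to count how many candidates would survive it before any fallback applies. The sketch below is illustrative only: `count_survivors` is a hypothetical helper, not part of AutoRAG, and it assumes that higher scores are better.

```python
from typing import List


def count_survivors(scores: List[float], threshold: float) -> int:
    """Count candidates whose score meets the threshold (hypothetical helper)."""
    return sum(1 for score in scores if score >= threshold)


scores = [0.41, 0.38, 0.22]
for threshold in (0.3, 0.5):
    survivors = count_survivors(scores, threshold)
    # Zero survivors means the filter output would be empty and the fallback would apply.
    status = "fallback would fire" if survivors == 0 else "filter is effective"
    print(f"threshold={threshold}: {survivors} survivors ({status})")
```

If the count is zero for most queries, the filter is probably being bypassed by the fallback rather than narrowing the candidates.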
The Insight (Rule of Thumb)
- Action: Never return empty results from any filtering or selection step in a RAG pipeline.
- Value: When all candidates are filtered out, return the original (pre-filter) input.
- Variant: When using threshold-based filters (`SimilarityThresholdCutoff`), keep at least the single highest-scoring result even if it falls below the threshold.
- Trade-off: May include low-quality passages in the final context, but prevents pipeline crashes and ensures the LLM always receives some context to generate from.
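The Action and Value bullets above can be sketched as a small helper. This is a minimal illustration of the rule, not AutoRAG code; `filter_or_fallback` and the example predicates are hypothetical names chosen for this sketch.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def filter_or_fallback(items: List[T], keep: Callable[[T], bool]) -> List[T]:
    """Apply a filter, but return the original items if nothing survives."""
    kept = [item for item in items if keep(item)]
    # Empty result: fall back to the pre-filter input instead of propagating [].
    return kept if kept else list(items)


passages = ["short", "a much longer passage", "tiny"]
print(filter_or_fallback(passages, lambda p: len(p) > 10))   # one passage survives
print(filter_or_fallback(passages, lambda p: len(p) > 100))  # nothing survives, all three come back
```

The helper can still return an empty list when the input itself is empty; the fallback only guarantees that a filter never makes things worse than its input.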
Reasoning
RAG pipelines are multi-stage: retrieval produces candidates, filters narrow them, and generation creates responses. If any intermediate stage produces empty results, the entire pipeline fails. The cost of including a few low-quality passages (slightly worse generation) is much lower than the cost of a pipeline failure (no response at all). This pattern is especially important during automated optimization trials where many parameter combinations are tested; aggressive filter settings would cause trials to crash rather than produce measurable (if suboptimal) results.
Code Evidence
The `avoid_empty_result` decorator from `autorag/strategy.py:19-47`:

    def avoid_empty_result(return_index: List[int]):
        """
        Decorator for avoiding empty results from the function.
        When the func returns an empty result, it will return the origin results.
        When the func returns a None, it will return the origin results.
        When the return value is a tuple, it will check all the value or list is empty.
        If so, it will return the origin results.
        It keeps parameters at return_index of the function as the origin results.
        """
        def decorator_avoid_empty_result(func: Callable):
            @functools.wraps(func)
            def wrapper(*args, **kwargs) -> List:
                func_result = func(*args, **kwargs)
                if isinstance(func_result, tuple):
                    if all([not bool(result) for result in func_result]):
                        return [args[index] for index in return_index]
                if not bool(func_result):
                    return [args[index] for index in return_index]
                else:
                    return func_result
            return wrapper
        return decorator_avoid_empty_result
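To see the decorator in action, the sketch below repeats the logic of the excerpt above so that it runs standalone, then applies it to a toy filter. The `cutoff` function and its data are hypothetical and exist only for illustration; they are not AutoRAG modules.

```python
import functools
from typing import Callable, List


def avoid_empty_result(return_index: List[int]):
    # Same logic as the excerpt above, repeated so this example runs standalone.
    def decorator_avoid_empty_result(func: Callable):
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> List:
            func_result = func(*args, **kwargs)
            if isinstance(func_result, tuple):
                if all(not bool(result) for result in func_result):
                    return [args[index] for index in return_index]
            if not bool(func_result):
                return [args[index] for index in return_index]
            return func_result

        return wrapper

    return decorator_avoid_empty_result


# Hypothetical filter: keeps (content, score) pairs at or above a cutoff.
@avoid_empty_result(return_index=[0, 1])
def cutoff(contents: List[str], scores: List[float], threshold: float):
    kept = [(c, s) for c, s in zip(contents, scores) if s >= threshold]
    return [c for c, _ in kept], [s for _, s in kept]


contents, scores = ["a", "b", "c"], [0.9, 0.4, 0.1]
print(cutoff(contents, scores, 0.5))   # filter applies: (['a'], [0.9])
print(cutoff(contents, scores, 0.95))  # all removed: [['a', 'b', 'c'], [0.9, 0.4, 0.1]]
```

Two details follow directly from the wrapper code. The fallback value is a list built from the positional arguments, even when the wrapped function normally returns a tuple. And because the wrapper reads `args[index]`, the inputs named in `return_index` have to be passed positionally; passing them as keyword arguments would leave `args` too short.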
Keep-best-one fallback in `SimilarityThresholdCutoff` from `autorag/nodes/passagefilter/similarity_threshold_cutoff.py`:

    # If all contents are filtered, keep the only one highest similarity content.
    if len(result) > 0:
        return result
    return [np.argmax(similarities)]  # Keep best if all filtered
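The keep-best-one variant can be written as a complete function. The version below is a pure-Python sketch rather than the AutoRAG implementation: `threshold_cutoff_indices` is a hypothetical name, the `>=` comparison is an assumption, and the empty-input branch is an addition that the excerpt above does not show.

```python
from typing import List


def threshold_cutoff_indices(similarities: List[float], threshold: float) -> List[int]:
    """Return indices at or above the threshold, or the single best index if none qualify."""
    if not similarities:
        return []  # nothing to keep (assumed behavior, not taken from the excerpt)
    kept = [i for i, sim in enumerate(similarities) if sim >= threshold]
    if kept:
        return kept
    # Everything fell below the threshold: keep only the highest-scoring candidate.
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    return [best]


print(threshold_cutoff_indices([0.2, 0.7, 0.5], 0.6))  # [1]
print(threshold_cutoff_indices([0.2, 0.3, 0.4], 0.6))  # [2]
```

Unlike the decorator, which restores the entire pre-filter input, this variant returns exactly one candidate, so the downstream context stays small even when the threshold rejects everything.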