Heuristic:Infiniflow Ragflow Hybrid Search Fallback Strategy
| Knowledge Sources | |
|---|---|
| Domains | Retrieval, Optimization |
| Last Updated | 2026-02-12 06:00 GMT |
Overview
Two-tier search fallback: when the initial hybrid search returns zero results, RAGFlow retries with `min_match` relaxed from 0.3 to 0.1 and the vector similarity threshold set to 0.17.
Description
RAGFlow's hybrid search combines BM25 keyword matching with vector similarity in a weighted fusion (5% keyword, 95% vector). When the initial search with strict parameters returns no results, the system automatically retries once with relaxed constraints: the keyword `min_match` drops from 0.3 (30% of terms must match) to 0.1 (10%), and the vector similarity threshold is set to 0.17. If the query is filtered by `doc_id`, the retry instead drops the match expressions entirely and searches by the filters alone. This two-tier approach makes empty result sets less likely for difficult queries while keeping the stricter parameters for queries that already match well.
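To make the `min_match` percentages concrete, the sketch below computes how many query terms would have to match at each tier. It is only an illustration: the helper name `required_terms` is hypothetical, and the rounding rule (round down, but never below one term) is an assumption rather than something taken from RAGFlow or its storage backends.

```python
def required_terms(num_terms: int, min_match_percent: int) -> int:
    """Minimum number of query terms that must match.

    Assumption: the percentage is rounded down and at least one term
    is always required. Real search backends may round differently.
    """
    return max(1, num_terms * min_match_percent // 100)


# Compare the strict tier (30%) with the relaxed tier (10%).
for n in (5, 10, 20):
    print(n, required_terms(n, 30), required_terms(n, 10))
```

Under these assumptions, a 10-term query needs 3 matching terms in the first tier but only 1 in the fallback tier, which is why the relaxation mostly matters for longer queries.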
Usage
This heuristic is applied automatically in the `Dealer.search()` method whenever a hybrid search returns zero results. Understanding this pattern helps when debugging unexpectedly low-quality results: the fallback tier may have activated.
The Insight (Rule of Thumb)
- Action: When the initial hybrid search with `min_match=0.3` returns `total == 0`, retry with `min_match=0.1` and `similarity=0.17`. If a `doc_id` filter is set, retry with the filters alone and no match expressions.
- Value: First tier: min_match=0.3, default similarity. Second tier: min_match=0.1, similarity=0.17.
- Trade-off: The fallback returns lower-quality matches rather than empty results. Users may see less relevant chunks when the fallback activates.
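The rule above can be sketched as a small wrapper around a search callable. This is a minimal illustration, not RAGFlow's implementation: `search_with_fallback`, `SearchFn`, and `fake_search` are hypothetical names, the `doc_id` branch is omitted, and returning the tier label is an addition meant to make fallback activations easy to spot when debugging.

```python
from typing import Callable, Optional

# Hypothetical signature: (min_match, similarity) -> list of hits.
SearchFn = Callable[[float, Optional[float]], list]


def search_with_fallback(search: SearchFn,
                         default_similarity: Optional[float] = None):
    """Run the strict tier first; retry once with relaxed values on zero hits."""
    hits = search(0.3, default_similarity)
    if hits:
        return hits, "strict"
    # Second tier: relaxed keyword matching and a low similarity floor.
    hits = search(0.1, 0.17)
    return hits, "relaxed"


def fake_search(min_match, similarity):
    # Toy backend: only returns a hit under relaxed keyword matching.
    return ["chunk-7"] if min_match <= 0.1 else []


hits, tier = search_with_fallback(fake_search)
print(tier, hits)  # relaxed ['chunk-7']
```

Note that the relaxed tier can still come back empty, so callers should not assume the fallback always produces results.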
Reasoning
Empty search results are a worse user experience than slightly imprecise results. The 0.3→0.1 min_match relaxation allows partial keyword overlap, while the 0.17 similarity threshold is empirically chosen as the minimum useful similarity score. The weighted_sum fusion (0.05 keyword, 0.95 vector) already heavily favors semantic matching, so the keyword relaxation primarily affects edge cases where keyword filters are overly restrictive.
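A short worked example shows how strongly the 0.05/0.95 weighting favors the vector score. The function below is an illustrative sketch: it assumes both scores are already normalized to the range 0 to 1, which may not match how the underlying storage engine scales them, and the name `weighted_sum` here is a plain Python helper, not the engine's fusion operator.

```python
def weighted_sum(keyword_score: float, vector_score: float,
                 weights=(0.05, 0.95)) -> float:
    """Fuse two scores with fixed weights (assumes scores in [0, 1])."""
    keyword_weight, vector_weight = weights
    return keyword_weight * keyword_score + vector_weight * vector_score


# Chunk A: perfect keyword match, weak semantic match.
score_a = weighted_sum(keyword_score=1.0, vector_score=0.2)
# Chunk B: no keyword match, moderate semantic match.
score_b = weighted_sum(keyword_score=0.0, vector_score=0.6)

print(round(score_a, 2), round(score_b, 2))  # 0.24 0.57
```

Even with a perfect keyword score, chunk A ranks below chunk B, which is consistent with the point that relaxing `min_match` mainly changes which chunks pass the keyword filter rather than how they are ranked.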
Code Evidence from `rag/nlp/search.py:114-147`:
matchText, keywords = self.qryr.question(qst, min_match=0.3)
# ... initial search ...
if total == 0:
    if filters.get("doc_id"):
        res = await thread_pool_exec(self.dataStore.search, src, [], filters, [],
                                     orderBy, offset, limit, idx_names, kb_ids)
    else:
        matchText, _ = self.qryr.question(qst, min_match=0.1)
        matchDense.extra_options["similarity"] = 0.17
        res = await thread_pool_exec(self.dataStore.search, src, highlightFields,
                                     filters, [matchText, matchDense, fusionExpr],
                                     orderBy, offset, limit, idx_names, kb_ids,
                                     rank_feature=rank_feature)
Fusion weights from `rag/nlp/search.py:127`:
fusionExpr = FusionExpr("weighted_sum", topk, {"weights": "0.05,0.95"})