Principle:Langgenius Dify Retrieval Configuration
| Knowledge Sources | |
|---|---|
| Domains | RAG, Information Retrieval, Search Configuration |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Retrieval configuration defines the strategy for finding and ranking relevant document chunks at query time in a RAG system, encompassing search method selection, reranking, top-k limits, score thresholds, and weighted scoring.
Description
A knowledge base is only as useful as its ability to surface the right content when queried. Retrieval configuration is the set of parameters that controls how the system searches the index and which results are returned to the language model's context window.
The configuration surface includes:
- Search method -- semantic, full-text, or hybrid search, determining the underlying retrieval algorithm.
- Reranking -- an optional second-pass model that re-scores initial retrieval results for higher precision.
- Top-k -- the maximum number of segments returned to the caller.
- Score threshold -- a minimum relevance score below which results are discarded.
- Weighted scoring -- when using hybrid search, the relative weights assigned to semantic and keyword scores.
These parameters interact with each other. For example, enabling reranking replaces the first-stage scores with reranker scores, so both the ranking order and the set of results that clear a given score threshold can change. Operators must understand these interactions to tune retrieval effectively.
Usage
Configure retrieval settings when:
- Setting up a new knowledge base -- default retrieval configuration is applied during creation but can be customized.
- Tuning retrieval quality -- after observing poor results in hit testing, an operator adjusts top-k, thresholds, or reranking settings.
- Switching search methods -- moving from semantic to hybrid search (or vice versa) to better match the query patterns of the application.
- Integrating a reranking model -- enabling a cross-encoder reranker to improve precision at the cost of additional latency.
Theoretical Basis
Search Methods in Depth
Semantic Search
Semantic search relies entirely on vector similarity:
query_vector = embed(query_text)
results = vector_db.search(query_vector, top_k=k)
Strengths: captures meaning, handles paraphrases. Weaknesses: can miss exact keyword matches; sensitive to embedding model quality.
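As a minimal sketch of the two lines above, the following Python uses a toy in-memory index and cosine similarity in place of a real vector database. The `semantic_search` helper, the `toy_index` contents, and the hand-written two-dimensional vectors are all illustrative assumptions; a real deployment would obtain vectors from an embedding model and query a vector store.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_search(query_vector, index, top_k):
    """Rank every chunk in `index` (chunk_id -> vector) by similarity to the query."""
    scored = [(chunk_id, cosine(query_vector, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical pre-computed embeddings; stand-ins for embed(chunk_text).
toy_index = {
    "chunk-a": [1.0, 0.0],
    "chunk-b": [0.6, 0.8],
    "chunk-c": [0.0, 1.0],
}

hits = semantic_search([1.0, 0.0], toy_index, top_k=2)
```

The brute-force scan is only suitable for illustration; vector databases typically use approximate nearest-neighbor indexes to avoid scoring every chunk.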
Full-Text Search
Full-text search uses lexical matching with BM25 or similar algorithms:
results = inverted_index.search(tokenize(query_text), top_k=k)
Strengths: fast, deterministic, excellent for exact terms. Weaknesses: no semantic understanding; vocabulary mismatch causes recall loss.
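To make the lexical behavior concrete, here is a small self-contained BM25 scorer. It is a sketch under stated assumptions: whitespace tokenization, the commonly used defaults `k1=1.5` and `b=0.75`, and a hypothetical three-chunk corpus. It does not describe how any particular search engine or Dify itself implements full-text search.

```python
import math
from collections import Counter

def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()

def bm25_search(query_text, docs, top_k, k1=1.5, b=0.75):
    """Score each document in `docs` (doc_id -> text) against the query with BM25."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query_text):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        if score > 0:
            scores[doc_id] = score
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

docs = {
    "chunk-a": "reset your password from the account settings page",
    "chunk-b": "billing invoices are emailed monthly",
    "chunk-c": "password rules require twelve characters",
}

exact_hits = bm25_search("password reset", docs, top_k=2)
# No shared vocabulary with any chunk, so nothing is returned.
mismatch_hits = bm25_search("forgot credentials", docs, top_k=2)
```

The second query illustrates the vocabulary-mismatch weakness: a paraphrase with no overlapping terms retrieves nothing, even though "chunk-a" is clearly relevant.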
Hybrid Search
Hybrid search combines both signals. The combination can use several fusion strategies:
semantic_results = vector_db.search(query_vector, top_k=k)
keyword_results = inverted_index.search(tokens, top_k=k)
# Weighted score fusion
for each result in union(semantic_results, keyword_results):
    score = w_semantic * semantic_score + w_keyword * keyword_score
# Or reciprocal rank fusion (RRF)
for each result:
    rrf_score = sum(1 / (rank_constant + rank_in_list) for each list containing result)
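The two fusion strategies in the pseudocode can be sketched in runnable Python as follows. The function names, the example scores, and the `rank_constant=60` default are illustrative assumptions. The weighted variant also assumes both score sets are already normalized to a comparable range (for example 0 to 1), which raw BM25 scores are not; a result missing from one list contributes 0 for that signal.

```python
def weighted_fusion(semantic_scores, keyword_scores, w_semantic, w_keyword):
    """Blend two score dicts (doc_id -> normalized score) into one ranking."""
    doc_ids = set(semantic_scores) | set(keyword_scores)
    fused = {
        doc_id: w_semantic * semantic_scores.get(doc_id, 0.0)
        + w_keyword * keyword_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Sort by descending score, breaking ties by id for determinism.
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))

def reciprocal_rank_fusion(ranked_lists, rank_constant=60):
    """Combine ranked lists of doc ids using only rank positions (1-based)."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))

# Hypothetical first-stage outputs.
semantic = {"a": 0.9, "b": 0.8}
keyword = {"b": 1.0, "c": 0.6}

weighted = weighted_fusion(semantic, keyword, w_semantic=0.7, w_keyword=0.3)
rrf = reciprocal_rank_fusion([["a", "b"], ["b", "c"]])
```

In both cases "b" ranks first because it appears in both lists. RRF ignores the raw score magnitudes, which makes it less sensitive to score-scale differences between the two retrievers.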
Reranking
Reranking is a two-stage retrieval pattern:
Stage 1: Retrieve top-N candidates (N >> k) using fast first-stage retrieval
Stage 2: Score each candidate using a cross-encoder reranking model, then sort by reranker score and return top-k
Cross-encoder rerankers are more accurate than bi-encoder similarity because they attend to the query and document jointly, but they are also much slower: they need one model forward pass per candidate (O(N) model calls), whereas bi-encoder retrieval needs a single query embedding call plus an index lookup. This is why reranking is applied to a small candidate set rather than the entire corpus.
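The control flow of the reranking stage can be sketched as below. The `rerank` helper accepts any scoring function; `overlap_score` is a deliberately trivial stand-in (term overlap), not a cross-encoder, and is used only so the example runs without a model. The candidate texts are hypothetical.

```python
def rerank(query, candidates, score_fn, top_k):
    """Re-score (doc_id, text) candidates with score_fn and keep the best top_k.

    score_fn is called once per candidate, which is why the cost grows
    linearly with the number of first-stage candidates.
    """
    scored = [(doc_id, score_fn(query, text)) for doc_id, text in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def overlap_score(query, text):
    """Placeholder scorer: fraction of query terms found in the text."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms) if query_terms else 0.0

# Hypothetical first-stage output, in first-stage rank order.
first_stage = [
    ("chunk-a", "how to rotate api keys"),
    ("chunk-c", "password rules"),
    ("chunk-b", "reset a forgotten password"),
]

final = rerank("password reset", first_stage, overlap_score, top_k=2)
```

Here the chunk ranked last by the first stage moves to the top after re-scoring, which is the effect a real reranking model is meant to achieve with a far stronger relevance signal.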
Dify supports two reranking modes:
| Mode | Mechanism | When to Use |
|---|---|---|
| Reranking Model | A dedicated cross-encoder model (e.g., Cohere Rerank, BGE Reranker) re-scores candidates | When maximum precision is needed and latency budget allows |
| Weighted Score | Adjusts the blend of semantic and keyword scores without an external model | When a reranking model is unavailable or latency is critical |
Weighted Score Presets
For hybrid search with weighted scoring, Dify provides preset weight configurations:
| Preset | Semantic Weight | Keyword Weight | Use Case |
|---|---|---|---|
| SemanticFirst | 0.7 | 0.3 | Natural-language queries predominate |
| KeywordFirst | 0.3 | 0.7 | Technical or ID-based queries predominate |
| Customized | user-defined | user-defined | Domain-specific tuning |
Top-k and Score Threshold
- Top-k controls the maximum number of segments returned. Lower values reduce noise but risk missing relevant content. Typical range: 1--10.
- Score threshold acts as a quality gate. Results below the threshold are discarded even if fewer than k results remain. This prevents low-quality segments from polluting the context window.
candidates = retrieve(query, top_k=k)
if score_threshold_enabled:
    candidates = [c for c in candidates if c.score >= score_threshold]
return candidates
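A runnable version of this gate, using plain `(id, score)` tuples instead of result objects, might look like the sketch below. The `apply_limits` name and the sample scores are assumptions for illustration; passing `score_threshold=None` models the threshold being disabled.

```python
def apply_limits(scored, top_k, score_threshold=None):
    """Keep the top_k highest-scoring results, then drop any below the threshold."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
    if score_threshold is not None:
        ranked = [pair for pair in ranked if pair[1] >= score_threshold]
    return ranked

# Hypothetical scored results from any search method.
scored = [("a", 0.82), ("b", 0.41), ("c", 0.67), ("d", 0.30)]

without_threshold = apply_limits(scored, top_k=3)
with_threshold = apply_limits(scored, top_k=3, score_threshold=0.5)
```

With the threshold at 0.5, only two of the three top-k slots are filled: "b" is discarded rather than padded into the context window.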
Configuration Interaction Matrix
| Setting | Affects |
|---|---|
| search_method | Which index types are queried |
| reranking_enable | Whether a second pass re-scores results |
| reranking_model | Which model performs reranking (if enabled) |
| top_k | Upper bound on result count |
| score_threshold | Lower bound on result quality |
| weighted_score | Balance between semantic and keyword signals (hybrid only) |
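One way to reason about these interactions is to model the settings as a single object and check for combinations that have no effect. The sketch below is hypothetical: the field names follow the table above, but the `RetrievalConfig` class, the `check` helper, the default values, and the `"semantic"` / `"full_text"` / `"hybrid"` labels are illustrative and are not taken from Dify's API schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class RetrievalConfig:
    """Illustrative container for the settings in the table (not Dify's schema)."""
    search_method: str = "semantic"            # "semantic", "full_text", or "hybrid"
    reranking_enable: bool = False
    reranking_model: Optional[str] = None      # placeholder model identifier
    top_k: int = 3
    score_threshold: Optional[float] = None    # None means the threshold is disabled
    weighted_score: Optional[Tuple[float, float]] = None  # (semantic, keyword)

def check(config: RetrievalConfig) -> List[str]:
    """Return human-readable problems for inconsistent setting combinations."""
    problems = []
    if config.search_method not in {"semantic", "full_text", "hybrid"}:
        problems.append("unknown search_method")
    if config.reranking_enable and not config.reranking_model:
        problems.append("reranking_enable is set but reranking_model is missing")
    if config.weighted_score is not None and config.search_method != "hybrid":
        problems.append("weighted_score only applies to hybrid search")
    if config.top_k < 1:
        problems.append("top_k must be at least 1")
    return problems

ok = check(RetrievalConfig(search_method="hybrid", weighted_score=(0.7, 0.3)))
bad = check(RetrievalConfig(search_method="semantic", weighted_score=(0.7, 0.3),
                            reranking_enable=True))
```

A check like this catches settings that would be silently ignored, such as weights supplied for a non-hybrid search method.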