Implementation:Run llama Llama index QueryFusionRetriever
| Knowledge Sources | |
|---|---|
| Domains | Retrieval, Fusion, RAG |
| Last Updated | 2026-02-11 19:00 GMT |
Overview
Implements a multi-query fusion retriever that generates multiple query variations using an LLM, retrieves results from multiple retrievers, and fuses the results using configurable ranking strategies such as reciprocal rank fusion, relative score fusion, distance-based score fusion, or simple max-score fusion.
Description
The fusion_retriever.py module provides the QueryFusionRetriever class, which extends BaseRetriever to implement a multi-query retrieval strategy with result fusion.
The FUSION_MODES enum defines four fusion strategies:
- RECIPROCAL_RANK ("reciprocal_rerank") applies reciprocal rank fusion (RRF) as described in Cormack et al., 2009. It uses a parameter k=60 to dampen the impact of outlier rankings. Each node accumulates a score of 1.0 / (rank + k) across all result sets, and nodes are sorted by their fused score.
- RELATIVE_SCORE ("relative_score") applies MinMax normalization to each result set's scores (scaling to 0-1 range), then weights by retriever weight and divides by the number of queries. Duplicate nodes have their scores summed.
- DIST_BASED_SCORE ("dist_based_score") is a variant of relative score fusion that uses mean and standard deviation (3 sigma range) instead of true min/max for normalization, making it more robust to outliers.
- SIMPLE ("simple") de-duplicates nodes across all result sets and keeps the maximum score for each unique node.
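The rank-based and max-score strategies above can be illustrated with a minimal, library-free sketch. It is not the module's code: the (node_id, score) tuples are a hypothetical stand-in for NodeWithScore, the function names are invented for illustration, and ranks are assumed to start at 0.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (node_id, score): hypothetical stand-in for NodeWithScore
Ranked = List[Tuple[str, float]]


def reciprocal_rank_fusion(result_sets: List[Ranked], k: float = 60.0) -> Ranked:
    """Sum 1 / (rank + k) per node across result sets (ranks assumed 0-based)."""
    fused: Dict[str, float] = defaultdict(float)
    for results in result_sets:
        # Rank each result set by its own scores, best first.
        ordered = sorted(results, key=lambda item: item[1], reverse=True)
        for rank, (node_id, _score) in enumerate(ordered):
            fused[node_id] += 1.0 / (rank + k)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


def simple_fusion(result_sets: List[Ranked]) -> Ranked:
    """De-duplicate nodes and keep the maximum score seen for each one."""
    best: Dict[str, float] = {}
    for results in result_sets:
        for node_id, score in results:
            best[node_id] = max(score, best.get(node_id, float("-inf")))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Two result sets whose raw scores live on very different scales.
vector_hits = [("a", 0.9), ("b", 0.8), ("c", 0.1)]
keyword_hits = [("b", 12.0), ("d", 7.5)]

rrf = reciprocal_rank_fusion([vector_hits, keyword_hits])
simple = simple_fusion([vector_hits, keyword_hits])
print([node_id for node_id, _ in rrf])
print([node_id for node_id, _ in simple])
```

Because the rank-based variant ignores raw score magnitudes, node "b" wins by appearing near the top of both lists, whereas the max-score variant is dominated by whichever retriever emits the largest numbers.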
The QueryFusionRetriever constructor accepts:
- A list of retrievers; every query (the original and each generated variation) is sent to each of them.
- An optional llm for generating query variations (defaults to Settings.llm).
- A query_gen_prompt template for generating alternative queries (defaults to QUERY_GEN_PROMPT which asks the LLM to generate num_queries related search queries).
- num_queries controlling how many total queries to use (including the original; default: 4).
- retriever_weights for assigning relative importance to each retriever (normalized to sum to 1).
- use_async flag (default: True) to run retrieval tasks concurrently.
- similarity_top_k to limit the final fused result count.
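How retriever_weights interact with score normalization in the relative score and distance-based modes can be sketched as follows. This is a simplified illustration under stated assumptions, not the module's implementation: it uses one result set per retriever, plain (node_id, score) tuples in place of NodeWithScore, a population standard deviation, and a degenerate score range mapped to 1.0. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (node_id, score): hypothetical stand-in for NodeWithScore
Ranked = List[Tuple[str, float]]


def normalize_weights(weights: Optional[List[float]], n_retrievers: int) -> List[float]:
    """Scale weights so they sum to 1; assume equal weights when none are given."""
    if weights is None:
        return [1.0 / n_retrievers] * n_retrievers
    total = sum(weights)
    return [w / total for w in weights]


def score_bounds(scores: List[float], dist_based: bool) -> Tuple[float, float]:
    """Return (low, high): true min/max, or mean -/+ 3 standard deviations."""
    if dist_based:
        mu = sum(scores) / len(scores)
        # Assumption: population standard deviation.
        sigma = (sum((s - mu) ** 2 for s in scores) / len(scores)) ** 0.5
        return mu - 3 * sigma, mu + 3 * sigma
    return min(scores), max(scores)


def relative_score_fusion(
    result_sets: List[Ranked],
    weights: List[float],
    num_queries: int,
    dist_based: bool = False,
) -> Ranked:
    """Normalize each result set, apply its weight, and sum duplicate nodes."""
    fused: Dict[str, float] = {}
    for results, weight in zip(result_sets, weights):
        if not results:
            continue
        low, high = score_bounds([score for _, score in results], dist_based)
        for node_id, score in results:
            # Assumption: a degenerate range maps every score to 1.0.
            scaled = 1.0 if high == low else (score - low) / (high - low)
            fused[node_id] = fused.get(node_id, 0.0) + scaled * weight / num_queries
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


weights = normalize_weights([7, 3], n_retrievers=2)
vector_hits = [("a", 0.9), ("b", 0.5), ("c", 0.1)]
keyword_hits = [("b", 10.0), ("d", 2.0)]

minmax = relative_score_fusion([vector_hits, keyword_hits], weights, num_queries=1)
dist = relative_score_fusion(
    [vector_hits, keyword_hits], weights, num_queries=1, dist_based=True
)
print(weights, minmax, dist)
```

With min/max bounds the lowest hit of each retriever always scales to 0, while the 3-sigma bounds keep every hit strictly inside the range, so the two modes can order the same inputs differently.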
The _get_queries method uses the LLM to generate num_queries - 1 additional query variations from the original query. The _retrieve method orchestrates the full pipeline: generate queries, run all retrievers (sync or async), apply the selected fusion mode, and return the top-k results. The _aretrieve method provides a fully async implementation.
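The query-generation step can be sketched without an LLM dependency. The sketch below assumes the model returns one query per line and that surplus lines are trimmed. The prompt text, the generate_queries helper, and the fake_llm callable are hypothetical stand-ins, not the library's QUERY_GEN_PROMPT or LLM interface.

```python
from typing import Callable, List

# Hypothetical template; it only mirrors the {num_queries} and {query} variables.
PROMPT = (
    "Generate {num_queries} search queries, one per line, "
    "related to the following input query:\n"
    "Query: {query}\n"
    "Queries:\n"
)


def generate_queries(
    original_query: str,
    num_queries: int,
    complete: Callable[[str], str],
) -> List[str]:
    """Return the original query plus up to num_queries - 1 generated variations."""
    queries = [original_query]
    if num_queries > 1:
        prompt = PROMPT.format(num_queries=num_queries - 1, query=original_query)
        # Assumption: the model puts each query on its own line.
        lines = [line.strip() for line in complete(prompt).split("\n")]
        variations = [line for line in lines if line]
        # Models may return more lines than requested, so trim the list.
        queries.extend(variations[: num_queries - 1])
    return queries


def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM completion call; returns one surplus query."""
    return "what is ML\n\nmachine learning definition\nML basics\nextra query\n"


queries = generate_queries("What is machine learning?", 4, fake_llm)
only_original = generate_queries("What is machine learning?", 1, fake_llm)
print(queries)
```

Setting num_queries to 1 bypasses the completion call entirely, which matches the documented way to disable query generation.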
Query execution supports three modes: _run_nested_async_queries (uses run_async_tasks for nested event loop compatibility), _run_async_queries (pure asyncio.gather), and _run_sync_queries (sequential execution).
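The difference between sequential execution and asyncio.gather-based execution can be sketched as below. FakeRetriever and both helper functions are hypothetical; the sketch assumes results are keyed by (query, retriever index) and omits the nested-event-loop variant.

```python
import asyncio
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (query string, retriever index)


class FakeRetriever:
    """Hypothetical retriever exposing sync and async retrieval methods."""

    def __init__(self, name: str) -> None:
        self.name = name

    def retrieve(self, query: str) -> List[str]:
        return [f"{self.name}:{query}"]

    async def aretrieve(self, query: str) -> List[str]:
        await asyncio.sleep(0)  # yield control, as real I/O would
        return self.retrieve(query)


def run_sync_queries(
    queries: List[str], retrievers: List[FakeRetriever]
) -> Dict[Key, List[str]]:
    """Run every (query, retriever) pair one after another."""
    results: Dict[Key, List[str]] = {}
    for query in queries:
        for index, retriever in enumerate(retrievers):
            results[(query, index)] = retriever.retrieve(query)
    return results


async def run_async_queries(
    queries: List[str], retrievers: List[FakeRetriever]
) -> Dict[Key, List[str]]:
    """Schedule every (query, retriever) pair at once and await them together."""
    keys = [(query, index) for query in queries for index in range(len(retrievers))]
    tasks = [retrievers[index].aretrieve(query) for query, index in keys]
    # gather preserves input order, so results line up with keys.
    task_results = await asyncio.gather(*tasks)
    return dict(zip(keys, task_results))


retrievers = [FakeRetriever("vector"), FakeRetriever("keyword")]
queries = ["q1", "q2"]

sync_results = run_sync_queries(queries, retrievers)
async_results = asyncio.run(run_async_queries(queries, retrievers))
print(sync_results == async_results)
```

Both paths produce the same mapping; the async path only changes how long the caller waits when each retrieval involves network or disk latency.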
Usage
Use QueryFusionRetriever when you want to improve retrieval recall by querying from multiple angles and combining results. It is particularly effective when combining different retriever types (e.g., vector search and keyword search) or when a single query may not capture all relevant aspects of the user's information need. Reciprocal rank fusion is recommended as a general-purpose choice; because the constructor's mode parameter defaults to FUSION_MODES.SIMPLE, it must be selected explicitly with mode=FUSION_MODES.RECIPROCAL_RANK.
Code Reference
Source Location
- Repository: Run_llama_Llama_index
- File: llama-index-core/llama_index/core/retrievers/fusion_retriever.py
- Lines: 1-304
Signature
class FUSION_MODES(str, Enum):
    RECIPROCAL_RANK = "reciprocal_rerank"
    RELATIVE_SCORE = "relative_score"
    DIST_BASED_SCORE = "dist_based_score"
    SIMPLE = "simple"

class QueryFusionRetriever(BaseRetriever):
    def __init__(
        self,
        retrievers: List[BaseRetriever],
        llm: Optional[LLMType] = None,
        query_gen_prompt: Optional[str] = None,
        mode: FUSION_MODES = FUSION_MODES.SIMPLE,
        similarity_top_k: int = DEFAULT_SIMILARITY_TOP_K,
        num_queries: int = 4,
        use_async: bool = True,
        verbose: bool = False,
        callback_manager: Optional[CallbackManager] = None,
        objects: Optional[List[IndexNode]] = None,
        object_map: Optional[dict] = None,
        retriever_weights: Optional[List[float]] = None,
    ) -> None: ...

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]: ...

    async def _aretrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]: ...
Import
from llama_index.core.retrievers.fusion_retriever import QueryFusionRetriever
from llama_index.core.retrievers.fusion_retriever import FUSION_MODES
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| retrievers | List[BaseRetriever] | Yes | List of retriever instances to query and fuse results from |
| llm | Optional[LLMType] | No | LLM used to generate query variations; defaults to Settings.llm |
| query_gen_prompt | Optional[str] | No | Custom prompt template for generating query variations; must include {num_queries} and {query} variables |
| mode | FUSION_MODES | No | Fusion strategy to apply (default: SIMPLE) |
| similarity_top_k | int | No | Maximum number of results to return after fusion (default: DEFAULT_SIMILARITY_TOP_K) |
| num_queries | int | No | Total number of queries including the original (default: 4) |
| use_async | bool | No | Whether to run retrieval tasks concurrently (default: True) |
| verbose | bool | No | Whether to print generated queries (default: False) |
| callback_manager | Optional[CallbackManager] | No | Callback manager for event hooks |
| retriever_weights | Optional[List[float]] | No | Relative weights for each retriever; normalized to sum to 1.0 |
Outputs
| Name | Type | Description |
|---|---|---|
| _retrieve() | List[NodeWithScore] | Fused and ranked list of retrieved nodes, truncated to similarity_top_k |
| _aretrieve() | List[NodeWithScore] | Async version returning the same fused and ranked node list |
Usage Examples
Basic Usage
from llama_index.core.retrievers.fusion_retriever import (
    QueryFusionRetriever,
    FUSION_MODES,
)
from llama_index.core import VectorStoreIndex

# `documents` is assumed to be a list of Document objects loaded beforehand.
# Build two indexes; in practice they would be configured differently
# (they are identical here only to keep the example short).
index1 = VectorStoreIndex.from_documents(documents)
index2 = VectorStoreIndex.from_documents(documents)
retriever1 = index1.as_retriever(similarity_top_k=5)
retriever2 = index2.as_retriever(similarity_top_k=5)

# Create fusion retriever with reciprocal rank fusion
fusion_retriever = QueryFusionRetriever(
    retrievers=[retriever1, retriever2],
    mode=FUSION_MODES.RECIPROCAL_RANK,
    similarity_top_k=10,
    num_queries=4,
    use_async=True,
)

# Retrieve fused results
nodes = fusion_retriever.retrieve("What is machine learning?")
With Custom Retriever Weights
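# `vector_retriever` and `keyword_retriever` are assumed to be existing
# BaseRetriever instances; weights are given in the same order as retrievers.
fusion_retriever = QueryFusionRetriever(
    retrievers=[vector_retriever, keyword_retriever],
    mode=FUSION_MODES.RELATIVE_SCORE,
    retriever_weights=[0.7, 0.3],
    num_queries=3,
    similarity_top_k=5,
)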
Disabling Query Generation
# Use num_queries=1 to skip LLM-based query generation
fusion_retriever = QueryFusionRetriever(
    retrievers=[retriever1, retriever2],
    mode=FUSION_MODES.SIMPLE,
    num_queries=1,  # Only use the original query
)