Implementation:Run llama Llama index MultiModalRetriever
| Knowledge Sources | |
|---|---|
| Domains | LLM Framework, Retrieval, Multimodal |
| Last Updated | 2026-02-11 19:00 GMT |
Overview
MultiModalRetriever is an abstract base class that combines BaseRetriever and BaseImageRetriever to define the interface for retrievers that operate across text and image modalities.
Description
The MultiModalRetriever class inherits from both BaseRetriever and BaseImageRetriever using multiple inheritance. It defines six abstract methods that subclasses must implement: three synchronous methods, one for each supported pairing of query and result modality, plus an async variant of each. The three synchronous methods are:
- text_retrieve -- Given a text query, retrieve text nodes.
- text_to_image_retrieve -- Given a text query, retrieve image nodes (cross-modal).
- image_to_image_retrieve -- Given an image query, retrieve image nodes.
Each of these has a corresponding async variant:
- atext_retrieve
- atext_to_image_retrieve
- aimage_to_image_retrieve
All methods accept a QueryType parameter (which can be a string or QueryBundle) and return List[NodeWithScore].
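Because every method takes either a plain string or a bundle, an implementation usually normalizes the argument first. The following is a minimal, standard-library-only sketch of that normalization. `QueryBundleStub`, `QueryTypeStub`, `as_text_bundle`, and `as_image_bundle` are hypothetical names introduced for illustration and are not the library's classes. Treating a bare string as an image path in the image case is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class QueryBundleStub:
    """Hypothetical stand-in for QueryBundle (not the real llama_index class)."""

    query_str: str
    image_path: Optional[str] = None


# Mirrors the idea behind QueryType: a plain string or a bundle object.
QueryTypeStub = Union[str, QueryBundleStub]


def as_text_bundle(query: QueryTypeStub) -> QueryBundleStub:
    """Wrap a plain string as a text query; pass bundles through unchanged."""
    if isinstance(query, str):
        return QueryBundleStub(query_str=query)
    return query


def as_image_bundle(query: QueryTypeStub) -> QueryBundleStub:
    """Treat a plain string as an image path (an assumption of this sketch)."""
    if isinstance(query, str):
        return QueryBundleStub(query_str="", image_path=query)
    return query
```

Normalizing once at the top of each method lets the rest of the retrieval logic work with a single type.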
This base class does not provide any concrete implementations; all six methods are purely abstract and must be implemented by subclasses that connect to multimodal vector stores or embedding models.
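The practical effect of a purely abstract base can be illustrated with Python's `abc` module alone. The sketch below uses hypothetical stand-in classes (`TextSide`, `ImageSide`, `MultiModalSketch`, and two subclasses) rather than the real llama_index classes, and only two abstract methods instead of six, to show that a subclass cannot be instantiated until every abstract method inherited from both parents is implemented.

```python
from abc import ABC, abstractmethod
from typing import List


class TextSide(ABC):
    """Hypothetical stand-in for a text retriever base class."""

    @abstractmethod
    def text_retrieve(self, query: str) -> List[str]: ...


class ImageSide(ABC):
    """Hypothetical stand-in for an image retriever base class."""

    @abstractmethod
    def text_to_image_retrieve(self, query: str) -> List[str]: ...


class MultiModalSketch(TextSide, ImageSide):
    """Combines both bases and adds no implementations of its own."""


class Partial(MultiModalSketch):
    # Implements only one of the two abstract methods.
    def text_retrieve(self, query: str) -> List[str]:
        return [f"text:{query}"]


class Complete(Partial):
    # Supplies the remaining abstract method, so the class becomes concrete.
    def text_to_image_retrieve(self, query: str) -> List[str]:
        return [f"image:{query}"]


def can_instantiate(cls: type) -> bool:
    """Return True if cls() succeeds, False if abc rejects it with TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

In this sketch `can_instantiate(Partial)` is false and `can_instantiate(Complete)` is true; the same rule applies to any subclass that leaves an abstract method unimplemented.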
Usage
Subclass MultiModalRetriever when building retrieval systems that need to handle both text and image content. This is the foundation for multimodal RAG pipelines where queries can be text-based and results can span text documents and images.
Code Reference
Source Location
- Repository: Run_llama_Llama_index
- File: llama-index-core/llama_index/core/base/base_multi_modal_retriever.py
- Lines: 1-77
Signature
class MultiModalRetriever(BaseRetriever, BaseImageRetriever):
    """Multi Modal base retriever."""

    @abstractmethod
    def text_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...

    @abstractmethod
    def text_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...

    @abstractmethod
    def image_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...

    @abstractmethod
    async def atext_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...

    @abstractmethod
    async def atext_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...

    @abstractmethod
    async def aimage_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...
Import
from llama_index.core.base.base_multi_modal_retriever import MultiModalRetriever
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| str_or_query_bundle | QueryType (str or QueryBundle) | Yes | The text or image query to retrieve against. Accepts either a plain string or a QueryBundle object. |
Outputs
| Name | Type | Description |
|---|---|---|
| return | List[NodeWithScore] | A list of retrieved nodes (text or image) with associated relevance scores. |
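Callers typically rank or truncate the returned list by score. The sketch below shows one way to do that with a hypothetical `ScoredNodeStub` dataclass standing in for NodeWithScore. Treating a missing score as 0.0 is an assumption of this sketch, not documented behavior of the library, and `top_k` is an illustrative helper name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredNodeStub:
    """Hypothetical stand-in for NodeWithScore (not the real class)."""

    node_id: str
    score: Optional[float] = None


def top_k(results: List[ScoredNodeStub], k: int) -> List[ScoredNodeStub]:
    """Return the k highest-scoring results, highest first.

    A missing score is treated as 0.0 (an assumption of this sketch).
    """
    ranked = sorted(
        results,
        key=lambda r: r.score if r.score is not None else 0.0,
        reverse=True,
    )
    return ranked[:k]
```

The same helper can be applied separately to text and image results, since both come back in the same list type.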
Usage Examples
Basic Usage
from typing import List

from llama_index.core.base.base_multi_modal_retriever import MultiModalRetriever
from llama_index.core.indices.query.schema import QueryType
from llama_index.core.schema import NodeWithScore, QueryBundle


# NOTE: _search_text_index, _search_image_index and _search_image_by_image are
# placeholders for your own index lookups; the base class does not provide them.
# The parent classes may also declare further abstract hooks that have to be
# implemented before the subclass can be instantiated.
class MyMultiModalRetriever(MultiModalRetriever):
    def text_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        # Retrieve text nodes from a text query
        return self._search_text_index(str_or_query_bundle)

    def text_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        # Retrieve image nodes from a text query
        return self._search_image_index(str_or_query_bundle)

    def image_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        # Retrieve similar images from an image query
        return self._search_image_by_image(str_or_query_bundle)

    # The async variants below simply delegate to the sync methods.
    async def atext_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        return self.text_retrieve(str_or_query_bundle)

    async def atext_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        return self.text_to_image_retrieve(str_or_query_bundle)

    async def aimage_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        return self.image_to_image_retrieve(str_or_query_bundle)

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Default retrieval path inherited from BaseRetriever: text only
        return self.text_retrieve(query_bundle)


# Usage
retriever = MyMultiModalRetriever()
text_results = retriever.text_retrieve("sunset over mountains")
image_results = retriever.text_to_image_retrieve("sunset over mountains")
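Async variants that call the sync methods directly, as in the example above, block the event loop while the lookup runs. One possible alternative, sketched here with the standard library only (Python 3.9+ for `asyncio.to_thread`), is to run the blocking call in a worker thread and fetch text and image results concurrently. `SketchRetriever` and `retrieve_both` are hypothetical names and the returned strings are placeholder data; this is not how any particular llama_index retriever is implemented.

```python
import asyncio
from typing import List, Tuple


class SketchRetriever:
    """Hypothetical retriever with blocking sync methods (not a llama_index class)."""

    def text_retrieve(self, query: str) -> List[str]:
        # Placeholder for a blocking text index lookup.
        return [f"text hit for {query!r}"]

    def text_to_image_retrieve(self, query: str) -> List[str]:
        # Placeholder for a blocking image index lookup.
        return [f"image hit for {query!r}"]

    async def atext_retrieve(self, query: str) -> List[str]:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.text_retrieve, query)

    async def atext_to_image_retrieve(self, query: str) -> List[str]:
        return await asyncio.to_thread(self.text_to_image_retrieve, query)


async def retrieve_both(
    retriever: SketchRetriever, query: str
) -> Tuple[List[str], List[str]]:
    """Fetch text and image results for the same query concurrently."""
    text_hits, image_hits = await asyncio.gather(
        retriever.atext_retrieve(query),
        retriever.atext_to_image_retrieve(query),
    )
    return text_hits, image_hits
```

A caller would run it with `asyncio.run(retrieve_both(SketchRetriever(), "sunset over mountains"))`. If the underlying store offers a native async client, calling that directly is usually preferable to a thread.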