Implementation:Run llama Llama index MultiModalRetriever

Knowledge Sources	Run_llama_Llama_index
Domains	LLM Framework, Retrieval, Multimodal
Last Updated	2026-02-11 19:00 GMT

Overview

MultiModalRetriever is an abstract base class that combines BaseRetriever and BaseImageRetriever to define the interface for retrievers that operate across text and image modalities.

Description

The MultiModalRetriever class inherits from both BaseRetriever and BaseImageRetriever through multiple inheritance. It defines six abstract methods that subclasses must implement. Three synchronous methods cover the supported query and result pairings (image-to-text retrieval is not among them):

text_retrieve -- Given a text query, retrieve text nodes.
text_to_image_retrieve -- Given a text query, retrieve image nodes (cross-modal).
image_to_image_retrieve -- Given an image query, retrieve image nodes.

Each of these has a corresponding async variant:

atext_retrieve
atext_to_image_retrieve
aimage_to_image_retrieve

All methods accept a QueryType parameter (which can be a string or QueryBundle) and return List[NodeWithScore].
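Because the parameter may arrive as either a plain string or a bundle, implementations typically normalize it first. The following is a minimal sketch of that step, not the library's own code: QueryBundleStandIn, QueryTypeStandIn, and to_bundle are hypothetical names used so the snippet runs without llama_index installed, and only the query_str field is modeled.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class QueryBundleStandIn:
    """Hypothetical stand-in for QueryBundle; only query_str is modeled."""

    query_str: str


# Mirrors the "string or QueryBundle" shape of QueryType described above.
QueryTypeStandIn = Union[str, QueryBundleStandIn]


def to_bundle(str_or_query_bundle: QueryTypeStandIn) -> QueryBundleStandIn:
    """Wrap a plain string in a bundle; pass an existing bundle through."""
    if isinstance(str_or_query_bundle, str):
        return QueryBundleStandIn(query_str=str_or_query_bundle)
    return str_or_query_bundle
```

Normalizing once at the top of each method lets the rest of the retrieval logic assume a single input type.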

This base class adds no concrete implementations of its own; all six methods are purely abstract and must be implemented by subclasses that connect to multimodal vector stores or embedding models.
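The practical effect of abstract methods is that a subclass cannot be instantiated until every one of them is overridden. The sketch below illustrates this with Python's standard abc module, assuming the real base classes rely on the same machinery (as the @abstractmethod decorators in the signature suggest). RetrieverSketch, Incomplete, Complete, and can_instantiate are hypothetical names, and only two of the six methods are modeled for brevity.

```python
from abc import ABC, abstractmethod
from typing import List


class RetrieverSketch(ABC):
    """Hypothetical stand-in with two of the six abstract methods."""

    @abstractmethod
    def text_retrieve(self, query: str) -> List[str]: ...

    @abstractmethod
    def text_to_image_retrieve(self, query: str) -> List[str]: ...


class Incomplete(RetrieverSketch):
    # Only one abstract method is overridden, so the class stays abstract.
    def text_retrieve(self, query: str) -> List[str]:
        return []


class Complete(Incomplete):
    # Overriding the remaining method makes the class concrete.
    def text_to_image_retrieve(self, query: str) -> List[str]:
        return []


def can_instantiate(cls: type) -> bool:
    """Return True if cls() succeeds, False if abc rejects it with TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

Here can_instantiate(Incomplete) is False and can_instantiate(Complete) is True, which is why a partial subclass fails at construction time rather than at the first call.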

Usage

Subclass MultiModalRetriever when building retrieval systems that need to handle both text and image content. This is the foundation for multimodal RAG pipelines where queries can be text or images and results can span text nodes and image nodes.

Code Reference

Source Location

Repository: Run_llama_Llama_index
File: llama-index-core/llama_index/core/base/base_multi_modal_retriever.py
Lines: 1-77

Signature

class MultiModalRetriever(BaseRetriever, BaseImageRetriever):
    """Multi Modal base retriever."""

    @abstractmethod
    def text_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...

    @abstractmethod
    def text_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...

    @abstractmethod
    def image_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...

    @abstractmethod
    async def atext_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...

    @abstractmethod
    async def atext_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...

    @abstractmethod
    async def aimage_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...

Import

from llama_index.core.base.base_multi_modal_retriever import MultiModalRetriever

I/O Contract

Inputs

Name	Type	Required	Description
str_or_query_bundle	QueryType (str or QueryBundle)	Yes	The text or image query to retrieve against. Accepts either a plain string or a QueryBundle object.

Outputs

Name	Type	Description
return	List[NodeWithScore]	A list of retrieved nodes (text or image) with associated relevance scores.
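Callers often rank the returned list by score before using it. The sketch below shows one way to do that, assuming a score may be absent on some nodes. ScoredNodeStandIn and top_k are hypothetical names standing in for NodeWithScore and a caller-side helper; they are not part of the library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredNodeStandIn:
    """Hypothetical stand-in for NodeWithScore: an id plus an optional score."""

    node_id: str
    score: Optional[float] = None


def top_k(results: List[ScoredNodeStandIn], k: int) -> List[ScoredNodeStandIn]:
    """Return the k highest-scoring results; unscored nodes sort last."""
    ranked = sorted(
        results,
        key=lambda r: r.score if r.score is not None else float("-inf"),
        reverse=True,
    )
    return ranked[:k]
```

The same helper works for text and image results alike, since both are returned in the same list type.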

Usage Examples

Basic Usage

from llama_index.core.base.base_multi_modal_retriever import MultiModalRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle
from llama_index.core.indices.query.schema import QueryType
from typing import List

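class MyMultiModalRetriever(MultiModalRetriever):
    # The _search_* helpers below are hypothetical placeholders; a real
    # subclass must define them against its own text and image indexes.
    def text_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        # Retrieve text nodes from text query
        return self._search_text_index(str_or_query_bundle)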

    def text_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        # Retrieve image nodes from text query
        return self._search_image_index(str_or_query_bundle)

    def image_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        # Retrieve similar images from image query
        return self._search_image_by_image(str_or_query_bundle)

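    # The async variants delegate to the sync methods for brevity; this
    # blocks the event loop while the sync call runs.
    async def atext_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        return self.text_retrieve(str_or_query_bundle)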

    async def atext_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        return self.text_to_image_retrieve(str_or_query_bundle)

    async def aimage_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        return self.image_to_image_retrieve(str_or_query_bundle)

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Abstract hook inherited from BaseRetriever; default to text retrieval.
        return self.text_retrieve(query_bundle)

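# Usage (instantiation succeeds only once every abstract method inherited
# from both base classes is implemented)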
retriever = MyMultiModalRetriever()
text_results = retriever.text_retrieve("sunset over mountains")
image_results = retriever.text_to_image_retrieve("sunset over mountains")
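The async variants in the example above call the sync methods directly, which keeps the event loop busy for the duration of the search. One possible alternative, sketched here under the assumption that the sync method does blocking work, is to offload it with asyncio.to_thread (Python 3.9+). SyncSearchStandIn is a hypothetical stand-in so the snippet runs without llama_index, and its return values are illustrative strings rather than NodeWithScore objects.

```python
import asyncio
from typing import List


class SyncSearchStandIn:
    """Hypothetical stand-in for a retriever whose sync method blocks."""

    def text_retrieve(self, query: str) -> List[str]:
        # A real implementation would query an index here.
        return [f"text hit for {query!r}"]

    async def atext_retrieve(self, query: str) -> List[str]:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.text_retrieve, query)


async def main() -> List[str]:
    return await SyncSearchStandIn().atext_retrieve("sunset over mountains")


results = asyncio.run(main())
```

If the underlying store offers a native async client, calling it directly is usually preferable to a thread hop.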

Related Pages

Environment:Run_llama_Llama_index_Python_LlamaIndex_Core
