Implementation:Run llama Llama index MultiModalRetriever

From Leeroopedia
Knowledge Sources
Domains LLM Framework, Retrieval, Multimodal
Last Updated 2026-02-11 19:00 GMT

Overview

MultiModalRetriever is an abstract base class that combines BaseRetriever and BaseImageRetriever to define the interface for retrievers that operate across text and image modalities.

Description

The MultiModalRetriever class inherits from both BaseRetriever and BaseImageRetriever using multiple inheritance. It declares six abstract methods that subclasses must implement. Three are synchronous, one for each supported pairing of query and result modality (image-to-text retrieval is not part of the interface):

  • text_retrieve -- Given a text query, retrieve text nodes.
  • text_to_image_retrieve -- Given a text query, retrieve image nodes (cross-modal).
  • image_to_image_retrieve -- Given an image query, retrieve image nodes.

Each of these has a corresponding async variant:

  • atext_retrieve
  • atext_to_image_retrieve
  • aimage_to_image_retrieve

All methods accept a QueryType parameter (which can be a string or QueryBundle) and return List[NodeWithScore].
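Because every method accepts either a plain string or a QueryBundle, an implementation usually normalizes its input first. The sketch below illustrates that step with stand-in types so that it runs without llama_index installed. The QueryBundle dataclass here models only two fields of the real class, and to_query_bundle is a hypothetical helper name, not part of the library.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class QueryBundle:
    """Stand-in for llama_index.core.schema.QueryBundle (two fields modeled)."""

    query_str: str
    image_path: Optional[str] = None


# Mirrors the idea of QueryType: a plain string or a bundle.
QueryType = Union[str, QueryBundle]


def to_query_bundle(str_or_query_bundle: QueryType) -> QueryBundle:
    """Hypothetical helper: wrap a plain string, pass a bundle through unchanged."""
    if isinstance(str_or_query_bundle, str):
        return QueryBundle(query_str=str_or_query_bundle)
    return str_or_query_bundle
```

Normalizing once at the top of each method lets the rest of the implementation assume a single input type.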

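This base class provides no concrete implementations of its own: all six methods are abstract and must be implemented by subclasses that connect to multimodal vector stores or embedding models. A concrete subclass must also implement the abstract methods inherited from its parent classes, such as _retrieve from BaseRetriever.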
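The practical effect of abstract methods is that Python refuses to instantiate a subclass until every one of them is overridden. The following is a minimal sketch of that behavior using only the standard library. SketchRetriever, Incomplete, Complete, and can_instantiate are illustrative names and are not llama_index classes.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchRetriever(ABC):
    """Stand-in for an abstract retriever base; not a llama_index class."""

    @abstractmethod
    def text_retrieve(self, query: str) -> List[str]: ...

    @abstractmethod
    def text_to_image_retrieve(self, query: str) -> List[str]: ...


class Incomplete(SketchRetriever):
    # Overrides only one of the two abstract methods.
    def text_retrieve(self, query: str) -> List[str]:
        return []


class Complete(Incomplete):
    # Overrides the remaining abstract method, so instantiation succeeds.
    def text_to_image_retrieve(self, query: str) -> List[str]:
        return []


def can_instantiate(cls: type) -> bool:
    """Return True if cls() succeeds, False if abstract methods block it."""
    try:
        cls()
        return True
    except TypeError:
        return False
```

The same rule applies across a multiple-inheritance hierarchy: abstract methods declared by any parent class count toward the set that must be overridden.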

Usage

Subclass MultiModalRetriever when building retrieval systems that need to handle both text and image content. It is the foundation for multimodal RAG pipelines in which queries can be text or images and results can span text documents and images.
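When a pipeline gathers both text and image results for one query, it often needs to combine them into a single ranked list. The sketch below shows one simple way to do that, assuming the scores are comparable across modalities, which is not guaranteed when text and images are embedded by different models. ScoredNode is a stand-in for NodeWithScore and merge_results is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredNode:
    """Stand-in for NodeWithScore: an identifier, a modality tag, and a score."""

    node_id: str
    modality: str
    score: Optional[float] = None


def merge_results(
    text_nodes: List[ScoredNode], image_nodes: List[ScoredNode], top_k: int
) -> List[ScoredNode]:
    """Combine both result lists and keep the top_k highest-scoring nodes."""
    combined = text_nodes + image_nodes  # new list; inputs are not mutated
    # Nodes without a score sort last.
    combined.sort(
        key=lambda n: n.score if n.score is not None else float("-inf"),
        reverse=True,
    )
    return combined[:top_k]
```

If the two score scales differ, keeping separate top-k lists per modality is a safer default than a joint ranking.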

Code Reference

Source Location

  • Repository: Run_llama_Llama_index
  • File: llama-index-core/llama_index/core/base/base_multi_modal_retriever.py
  • Lines: 1-77

Signature

class MultiModalRetriever(BaseRetriever, BaseImageRetriever):
    """Multi Modal base retriever."""

    @abstractmethod
    def text_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...

    @abstractmethod
    def text_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...

    @abstractmethod
    def image_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...

    @abstractmethod
    async def atext_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...

    @abstractmethod
    async def atext_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...

    @abstractmethod
    async def aimage_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]: ...

Import

from llama_index.core.base.base_multi_modal_retriever import MultiModalRetriever

I/O Contract

Inputs

Name | Type | Required | Description
str_or_query_bundle | QueryType (str or QueryBundle) | Yes | The text or image query to retrieve against. Accepts either a plain string or a QueryBundle object.

Outputs

Name | Type | Description
return | List[NodeWithScore] | A list of retrieved nodes (text or image) with associated relevance scores.

Usage Examples

Basic Usage

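from typing import List

from llama_index.core.base.base_multi_modal_retriever import MultiModalRetriever
from llama_index.core.indices.query.schema import QueryType
from llama_index.core.schema import NodeWithScore, QueryBundle


class MyMultiModalRetriever(MultiModalRetriever):
    # Placeholder search helpers (hypothetical); replace with real index lookups.
    def _search_text_index(self, query: QueryType) -> List[NodeWithScore]:
        return []

    def _search_image_index(self, query: QueryType) -> List[NodeWithScore]:
        return []

    def _search_image_by_image(self, query: QueryType) -> List[NodeWithScore]:
        return []

    def text_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        # Retrieve text nodes from a text query
        return self._search_text_index(str_or_query_bundle)

    def text_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        # Retrieve image nodes from a text query
        return self._search_image_index(str_or_query_bundle)

    def image_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        # Retrieve similar images from an image query
        return self._search_image_by_image(str_or_query_bundle)

    # Async variants delegate to the sync methods for brevity (this blocks the event loop).
    async def atext_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        return self.text_retrieve(str_or_query_bundle)

    async def atext_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        return self.text_to_image_retrieve(str_or_query_bundle)

    async def aimage_to_image_retrieve(self, str_or_query_bundle: QueryType) -> List[NodeWithScore]:
        return self.image_to_image_retrieve(str_or_query_bundle)

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Required by BaseRetriever; the generic retrieve() path falls back to text
        return self.text_retrieve(query_bundle)

    # Private hooks that BaseImageRetriever is assumed to declare abstract;
    # confirm the names against your installed llama-index-core version.
    def _text_to_image_retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        return self._search_image_index(query_bundle)

    def _image_to_image_retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        return self._search_image_by_image(query_bundle)

    async def _atext_to_image_retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        return self._text_to_image_retrieve(query_bundle)

    async def _aimage_to_image_retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        return self._image_to_image_retrieve(query_bundle)

# Usage
retriever = MyMultiModalRetriever()
text_results = retriever.text_retrieve("sunset over mountains")
image_results = retriever.text_to_image_retrieve("sunset over mountains")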
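
An async method that calls its synchronous counterpart directly will block the event loop while retrieval runs. One common alternative, sketched here under the assumption of Python 3.9 or later, is to offload the blocking call with asyncio.to_thread. SketchRetriever and retrieve_many are illustrative stand-ins, not llama_index classes or functions, and the sleep merely simulates a slow vector store call.

```python
import asyncio
import time
from typing import List


class SketchRetriever:
    """Stand-in with a blocking sync method; not a llama_index class."""

    def text_retrieve(self, query: str) -> List[str]:
        time.sleep(0.05)  # simulate blocking I/O such as a vector store call
        return [f"text hit for {query!r}"]

    async def atext_retrieve(self, query: str) -> List[str]:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.text_retrieve, query)


async def retrieve_many(
    retriever: SketchRetriever, queries: List[str]
) -> List[List[str]]:
    """Issue all queries concurrently; results come back in input order."""
    return await asyncio.gather(*(retriever.atext_retrieve(q) for q in queries))
```

A backend with a native async client would not need the thread hop; in that case the async method can await the client directly.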
