Implementation:Vibrantlabsai Ragas MultiModalRelevance

From Leeroopedia
Knowledge Sources
Domains: Evaluation, Metrics
Last Updated: 2026-02-12 00:00 GMT

Overview

MultiModalRelevance is a metric that evaluates whether an AI response to a user query is relevant and consistent with both the visual (image) and textual context information provided.

Description

This metric uses an LLM with multi-modal capabilities to assess whether the response to a query aligns with the retrieved contexts, which may contain both images and text. Unlike MultiModalFaithfulness, this metric also considers the user_input (the original question) when making its relevance determination, checking that the answer is actually responsive to what was asked.

The algorithm works as follows:

  1. The user input, response text, and retrieved contexts are packaged into a RelevanceInput Pydantic model.
  2. The input is sent to the configured LLM via a MultiModalRelevancePrompt that includes few-shot examples: one demonstrating a relevant answer (about Margherita pizza) and one showing an irrelevant answer (incorrect Oscar winner).
  3. The LLM evaluates whether the response is "in line with the images and textual context information" and returns a RelevanceOutput with a boolean relevance field.
  4. The boolean is cast to a float: 1.0 for relevant, 0.0 for irrelevant. If the LLM returns no response, the score is NaN.

The prompt instructs the model: "Your task is to evaluate if the response for the query is in line with the images and textual context information provided. You have two options to answer. Either True / False."
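
The flow described above can be sketched without any LLM dependency. This is a minimal illustration, not the library's code: plain dataclasses stand in for the Pydantic models, the field names are assumed to mirror the metric's required columns, and `judge` is a hypothetical callable playing the role of the multi-modal LLM call.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RelevanceInput:
    # Stand-in for the Pydantic input model; field names are assumed.
    user_input: str
    response: str
    retrieved_contexts: List[str]


@dataclass
class RelevanceOutput:
    # Stand-in for the Pydantic output model with its boolean verdict.
    relevance: bool


def score_relevance(
    prompt_input: RelevanceInput,
    judge: Callable[[RelevanceInput], Optional[RelevanceOutput]],
) -> float:
    """Cast the judge's boolean verdict to a float; NaN when no verdict is returned."""
    output = judge(prompt_input)
    if output is None:
        return math.nan
    return float(output.relevance)
```

Passing a stub such as `lambda _: RelevanceOutput(relevance=True)` as `judge` makes the casting step easy to check in isolation.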

Usage

Use this metric when evaluating multi-modal RAG systems where both the relevance to the user question and the alignment with multi-modal context matter. It is suitable for visual question answering, document understanding, and any application that retrieves both images and text to generate answers. A pre-instantiated convenience instance is available as multimodal_relevance.

Code Reference

Source Location

Signature

@dataclass
class MultiModalRelevance(MetricWithLLM, SingleTurnMetric):
    name: str = "relevance_rate"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {
                "user_input",
                "response",
                "retrieved_contexts",
            }
        }
    )
    output_type: t.Optional[MetricOutputType] = MetricOutputType.CONTINUOUS
    relevance_prompt: ImageTextPrompt = MultiModalRelevancePrompt()

Import

from ragas.metrics import MultiModalRelevance

I/O Contract

Inputs

Name | Type | Required | Description
user_input | str | Yes | The original user query or question
response | str | Yes | The AI-generated response to evaluate for relevance
retrieved_contexts | list[str] | Yes | The list of textual contexts retrieved from the knowledge base (images are handled by the ImageTextPrompt)
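
Because all three inputs are required, it can help to check a row before building a sample. The helper below is a hypothetical pre-check written for illustration, not part of Ragas; it assumes rows are plain mappings keyed by the column names listed above.

```python
from typing import Any, Mapping, Set

# Column names taken from the Inputs table above.
REQUIRED_COLUMNS: Set[str] = {"user_input", "response", "retrieved_contexts"}


def missing_columns(row: Mapping[str, Any]) -> Set[str]:
    """Return the required columns that are absent or None in the row."""
    return {name for name in REQUIRED_COLUMNS if row.get(name) is None}
```

An empty result means the row carries every column the metric needs for single-turn scoring.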

Outputs

Name | Type | Description
score | float | 1.0 if the response is relevant to the query and contexts, 0.0 if not, or NaN if the LLM fails to respond
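
Since each per-sample score is 1.0, 0.0, or NaN, a dataset-level figure has to decide what to do with the NaN entries. One option, sketched here as an assumption rather than as Ragas behaviour, is to average only the valid scores; `relevance_rate` is a hypothetical helper name.

```python
import math
from typing import Iterable


def relevance_rate(scores: Iterable[float]) -> float:
    """Mean of the non-NaN scores; NaN if no valid score exists."""
    valid = [s for s in scores if not math.isnan(s)]
    if not valid:
        return math.nan
    return sum(valid) / len(valid)
```

Skipping NaN keeps failed LLM calls from being counted as irrelevant answers, at the cost of hiding how many samples went unscored, so reporting the NaN count alongside the rate is a sensible complement.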

Usage Examples

Basic Usage

from ragas.metrics import MultiModalRelevance
from ragas.dataset_schema import SingleTurnSample

metric = MultiModalRelevance()
# Set up the LLM (must support multi-modal inputs)
# metric.llm = your_multimodal_llm

sample = SingleTurnSample(
    user_input="What is the primary ingredient in a traditional Margherita pizza?",
    response="The primary ingredients in a Margherita pizza are tomatoes, mozzarella cheese, and fresh basil.",
    retrieved_contexts=[
        "A traditional Margherita pizza consists of a thin crust.",
        "The main toppings include tomatoes, mozzarella cheese, fresh basil, salt, and olive oil.",
    ],
)

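# single_turn_ascore is a coroutine: await it inside an async function,
# or drive it with asyncio.run(...) from synchronous code.
# score = await metric.single_turn_ascore(sample)
# score will be 1.0 (relevant), 0.0 (irrelevant), or NaN if the LLM returns no response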

Using the Pre-instantiated Instance

from ragas.metrics._multi_modal_relevance import multimodal_relevance

# multimodal_relevance is a ready-to-use MultiModalRelevance instance
# multimodal_relevance.llm = your_multimodal_llm

Related Pages
