Implementation:Vibrantlabsai Ragas MultiModalRelevance

Knowledge Sources	Vibrantlabsai_Ragas
Domains	Evaluation, Metrics
Last Updated	2026-02-12 00:00 GMT

Overview

MultiModalRelevance is a metric that evaluates whether an AI response to a user query is relevant to, and consistent with, the visual (image) and textual context provided.

Description

This metric uses an LLM with multi-modal capabilities to assess whether the response to a query aligns with the retrieved contexts, which may contain both images and text. Unlike MultiModalFaithfulness, this metric also considers the user_input (the original question) when making its relevance determination, checking that the answer is actually responsive to what was asked.

The algorithm works as follows:

1. The user input, response text, and retrieved contexts are packaged into a RelevanceInput Pydantic model.
2. The input is sent to the configured LLM via a MultiModalRelevancePrompt that includes few-shot examples: one demonstrating a relevant answer (about Margherita pizza) and one showing an irrelevant answer (incorrect Oscar winner).
3. The LLM evaluates whether the response is "in line with the images and textual context information" and returns a RelevanceOutput with a boolean relevance field.
4. The boolean is cast to a float: 1.0 for relevant, 0.0 for irrelevant. If the LLM returns no response, the score is NaN.
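The flow above can be sketched in a few lines of plain Python. This is a minimal illustration, not the Ragas source: the two dataclasses are stand-ins for the Pydantic models named above (their field names are assumed from the required columns), and `score_relevance` and the `judge` callable are hypothetical names that replace the real prompt and LLM call.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RelevanceInput:
    # Stand-in for the Pydantic input model; field names are assumed.
    user_input: str
    response: str
    retrieved_contexts: List[str]


@dataclass
class RelevanceOutput:
    # Stand-in for the output model; `relevance` is the boolean verdict.
    relevance: bool


# Hypothetical judge: receives the packaged input and returns a verdict,
# or None when no response came back from the model.
Judge = Callable[[RelevanceInput], Optional[RelevanceOutput]]


def score_relevance(
    user_input: str,
    response: str,
    retrieved_contexts: List[str],
    judge: Judge,
) -> float:
    # Step 1: package the three inputs into one object.
    prompt_input = RelevanceInput(user_input, response, list(retrieved_contexts))
    # Steps 2-3: the judge plays the role of the prompt plus the LLM.
    output = judge(prompt_input)
    # Step 4: no response maps to NaN, otherwise cast the boolean to a float.
    if output is None:
        return math.nan
    return float(output.relevance)
```

Passing a judge that always returns `RelevanceOutput(True)` yields 1.0, one that returns `RelevanceOutput(False)` yields 0.0, and one that returns `None` yields NaN.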

The prompt instructs the model: "Your task is to evaluate if the response for the query is in line with the images and textual context information provided. You have two options to answer. Either True / False."

Usage

Use this metric when evaluating multi-modal RAG systems where both the relevance to the user question and the alignment with multi-modal context matter. It is suitable for visual question answering, document understanding, and any application that retrieves both images and text to generate answers. A pre-instantiated convenience instance is available as multimodal_relevance.

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/metrics/_multi_modal_relevance.py

Signature

@dataclass
class MultiModalRelevance(MetricWithLLM, SingleTurnMetric):
    name: str = "relevance_rate"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {
                "user_input",
                "response",
                "retrieved_contexts",
            }
        }
    )
    output_type: t.Optional[MetricOutputType] = MetricOutputType.CONTINUOUS
    relevance_prompt: ImageTextPrompt = MultiModalRelevancePrompt()

Import

from ragas.metrics import MultiModalRelevance

I/O Contract

Inputs

Name	Type	Required	Description
user_input	str	Yes	The original user query or question
response	str	Yes	The AI-generated response to evaluate for relevance
retrieved_contexts	list[str]	Yes	The list of contexts retrieved from the knowledge base, passed as strings (any image content among them is handled by the ImageTextPrompt)

Outputs

Name	Type	Description
score	float	1.0 if the response is relevant to the query and contexts, 0.0 if not, or NaN if the LLM fails to respond
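Because a single score can be NaN, averaging a batch of scores naively would turn the whole result into NaN. The helper below is one possible way to summarize per-sample scores while skipping failed calls. It is a sketch under that assumption, not a description of how Ragas aggregates results, and `relevance_rate` is a hypothetical function name.

```python
import math
from typing import Iterable


def relevance_rate(scores: Iterable[float]) -> float:
    """Mean of the 0.0/1.0 scores, ignoring NaN entries."""
    valid = [s for s in scores if not math.isnan(s)]
    if not valid:
        # Every call failed (or the batch was empty): nothing to average.
        return math.nan
    return sum(valid) / len(valid)
```

Reporting the number of skipped NaN entries alongside the rate keeps a high failure count from going unnoticed.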

Usage Examples

Basic Usage

from ragas.metrics import MultiModalRelevance
from ragas.dataset_schema import SingleTurnSample

metric = MultiModalRelevance()
# Set up the LLM (must support multi-modal inputs)
# metric.llm = your_multimodal_llm

sample = SingleTurnSample(
    user_input="What is the primary ingredient in a traditional Margherita pizza?",
    response="The primary ingredients in a Margherita pizza are tomatoes, mozzarella cheese, and fresh basil.",
    retrieved_contexts=[
        "A traditional Margherita pizza consists of a thin crust.",
        "The main toppings include tomatoes, mozzarella cheese, fresh basil, salt, and olive oil.",
    ],
)

# score = await metric.single_turn_ascore(sample)  # call from inside an async function
# score is 1.0 (relevant), 0.0 (irrelevant), or NaN if the LLM returns no response
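The commented-out call above is awaited, so it has to run inside an event loop. The sketch below shows one way to drive such a coroutine over several samples with asyncio. `StubMetric` is a hypothetical stand-in so the example runs without an LLM, and its scoring rule is invented for illustration. With a configured MultiModalRelevance instance, the same `score_all` helper (also a hypothetical name) would be passed the real metric and SingleTurnSample objects instead.

```python
import asyncio
from typing import Any, List, Sequence


class StubMetric:
    """Hypothetical stand-in exposing an awaitable single_turn_ascore."""

    async def single_turn_ascore(self, sample: dict) -> float:
        await asyncio.sleep(0)  # yield control, as a real LLM call would
        # Toy rule for illustration only: a non-empty response counts as relevant.
        return 1.0 if sample["response"] else 0.0


async def score_all(metric: Any, samples: Sequence[Any]) -> List[float]:
    # Schedule one scoring coroutine per sample and wait for all of them;
    # gather returns the results in the same order as the inputs.
    results = await asyncio.gather(
        *(metric.single_turn_ascore(s) for s in samples)
    )
    return list(results)
```

From synchronous code, the batch can be run with `asyncio.run(score_all(metric, samples))`.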

Using the Pre-instantiated Instance

from ragas.metrics._multi_modal_relevance import multimodal_relevance

# multimodal_relevance is a ready-to-use MultiModalRelevance instance
# multimodal_relevance.llm = your_multimodal_llm

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment
