Implementation:Vibrantlabsai Ragas MultiModalFaithfulness

Knowledge Sources	Vibrantlabsai_Ragas
Domains	Evaluation, Metrics
Last Updated	2026-02-12 00:00 GMT

Overview

MultiModalFaithfulness is a metric that evaluates whether a response is faithfully supported by both the visual (image) and textual context information provided.

Description

This metric uses a multi-modal LLM to determine whether a given response is supported by the retrieved contexts, which may include both images and text. It uses an ImageTextPrompt to send the response and the retrieved contexts to a vision-language model, which returns a boolean verdict indicating whether the response is faithful to those contexts.

The algorithm works as follows:

1. The response text and retrieved contexts are packaged into a FaithfulnessInput Pydantic model.
2. The input is sent to the configured LLM via a MultiModalFaithfulnessPrompt, which includes few-shot examples demonstrating faithful and unfaithful responses about apple pie.
3. The LLM returns a FaithfulnessOutput containing a boolean faithful field.
4. The boolean is cast to a float: 1.0 for faithful, 0.0 for unfaithful. If the LLM returns no response, the score is NaN.
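
The last two steps can be illustrated without the library. The following is a minimal sketch, assuming a stand-in FaithfulnessOutput dataclass (the real one is a Pydantic model) and a hypothetical helper named verdict_to_score; neither is the library's actual code.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaithfulnessOutput:
    """Stand-in for the Pydantic output model; only the `faithful` field is modeled."""

    faithful: bool


def verdict_to_score(output: Optional[FaithfulnessOutput]) -> float:
    """Hypothetical helper: map the LLM verdict to the metric score."""
    if output is None:
        # No response from the LLM: the score is undefined.
        return math.nan
    # True becomes 1.0, False becomes 0.0.
    return float(output.faithful)
```

Returning NaN rather than 0.0 keeps a failed LLM call distinguishable from a genuine "unfaithful" verdict.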

The prompt instructs the model: "Please tell if a given piece of information is supported by the visual as well as textual context information. You need to answer with either True or False. Answer True if any of the image(s) and textual context supports the information."

Usage

Use this metric when evaluating RAG systems that retrieve multi-modal content (images and text). It is particularly useful for applications that combine visual and textual evidence, such as document understanding, visual question answering, or multi-modal knowledge bases. A pre-instantiated convenience instance is available as multimodal_faithness (note the spelling: "faithness", not "faithfulness").

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/metrics/_multi_modal_faithfulness.py

Signature

@dataclass
class MultiModalFaithfulness(MetricWithLLM, SingleTurnMetric):
    name: str = "faithful_rate"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {
                "response",
                "retrieved_contexts",
            }
        }
    )
    output_type: t.Optional[MetricOutputType] = MetricOutputType.CONTINUOUS
    faithfulness_prompt: ImageTextPrompt = MultiModalFaithfulnessPrompt()

Import

from ragas.metrics import MultiModalFaithfulness

I/O Contract

Inputs

Name	Type	Required	Description
response	str	Yes	The AI-generated response to evaluate for faithfulness
retrieved_contexts	list[str]	Yes	The list of contexts retrieved from the knowledge base, passed as strings. Per the Description, these may include images as well as text; image handling is delegated to the ImageTextPrompt

Outputs

Name	Type	Description
score	float	1.0 if the response is faithful to the contexts, 0.0 if not, or NaN if the LLM fails to respond
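
Because a per-sample score may be NaN, any aggregation over a dataset has to decide how to treat missing verdicts. The sketch below shows one option: skip NaN entries before averaging. It assumes the scores have already been collected into a list, and mean_ignoring_nan is a hypothetical helper, not part of Ragas.

```python
import math
from typing import Iterable


def mean_ignoring_nan(scores: Iterable[float]) -> float:
    """Hypothetical helper: average the valid scores, skipping NaN entries."""
    valid = [s for s in scores if not math.isnan(s)]
    if not valid:
        # Every sample failed, so there is nothing to average.
        return math.nan
    return sum(valid) / len(valid)


# Three valid verdicts (two faithful, one unfaithful) and one failed call.
scores = [1.0, 0.0, math.nan, 1.0]
faithful_fraction = mean_ignoring_nan(scores)
```

Counting NaN as 0.0 instead would penalize LLM failures as if they were unfaithful responses; which choice is appropriate depends on the evaluation.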

Usage Examples

Basic Usage

from ragas.metrics import MultiModalFaithfulness
from ragas.dataset_schema import SingleTurnSample

metric = MultiModalFaithfulness()
# Set up the LLM (must support multi-modal inputs)
# metric.llm = your_multimodal_llm

sample = SingleTurnSample(
    response="Apple pie is generally double-crusted.",
    retrieved_contexts=[
        "An apple pie is a fruit pie in which the principal filling ingredient is apples.",
        "It is generally double-crusted, with pastry both above and below the filling.",
    ],
)

# Call from inside an async function once metric.llm is set:
# score = await metric.single_turn_ascore(sample)
# score is 1.0 (faithful), 0.0 (unfaithful), or NaN if the LLM returns no response
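
single_turn_ascore is a coroutine, so it must be awaited inside an event loop. The sketch below shows one way to drive it from synchronous code with asyncio.run. To stay runnable without an LLM, it uses a hypothetical StubMetric whose single_turn_ascore always returns 1.0, and score_sample is likewise an illustrative helper. In practice you would pass the configured MultiModalFaithfulness instance and a SingleTurnSample instead.

```python
import asyncio


class StubMetric:
    """Hypothetical stand-in for a configured metric; it never calls an LLM."""

    async def single_turn_ascore(self, sample) -> float:
        # A real metric would send the sample to a multi-modal LLM here.
        return 1.0


async def score_sample(metric, sample) -> float:
    """Await the metric's coroutine and return the score."""
    return await metric.single_turn_ascore(sample)


# The dict is a placeholder; the stub ignores the sample's contents.
score = asyncio.run(score_sample(StubMetric(), sample={"response": "placeholder"}))
```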

Using the Pre-instantiated Instance

from ragas.metrics._multi_modal_faithfulness import multimodal_faithness

# multimodal_faithness is a ready-to-use MultiModalFaithfulness instance
# multimodal_faithness.llm = your_multimodal_llm

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment
