Implementation:Vibrantlabsai Ragas MultiModalFaithfulness
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Metrics |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
MultiModalFaithfulness is a metric that evaluates whether a response is supported by the visual (image) and textual context provided.
Description
This metric leverages an LLM with multi-modal capabilities to determine whether a given response is supported by the retrieved contexts, which may include both images and text. The metric uses an ImageTextPrompt to send the response along with the retrieved contexts to a vision-language model, which returns a boolean verdict indicating whether the information is faithful.
The algorithm works as follows:
- The response text and retrieved contexts are packaged into a FaithfulnessInput Pydantic model.
- The input is sent to the configured LLM via a MultiModalFaithfulnessPrompt, which includes few-shot examples demonstrating faithful and unfaithful responses about apple pie.
- The LLM returns a FaithfulnessOutput containing a boolean faithful field.
- The boolean is cast to a float: 1.0 for faithful, 0.0 for unfaithful. If the LLM returns no response, the score is NaN.
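The steps above can be sketched in plain Python. This is a minimal illustration, not the library's code: the dataclasses are stand-ins for the Pydantic models named above, and verdict_to_score is a hypothetical helper that shows only the final cast from verdict to score.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FaithfulnessInput:
    """Stand-in for the Pydantic input model (assumed fields: the two required columns)."""
    response: str
    retrieved_contexts: List[str]


@dataclass
class FaithfulnessOutput:
    """Stand-in for the Pydantic output model carrying the boolean verdict."""
    faithful: bool


def verdict_to_score(output: Optional[FaithfulnessOutput]) -> float:
    """Hypothetical helper: map the LLM verdict to the metric score."""
    if output is None:
        # The LLM returned no response, so the score is NaN.
        return math.nan
    # True becomes 1.0, False becomes 0.0.
    return float(output.faithful)
```

Because a failed call yields NaN rather than 0.0, a caller can tell "unfaithful" apart from "no verdict".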
The prompt instructs the model: "Please tell if a given piece of information is supported by the visual as well as textual context information. You need to answer with either True or False. Answer True if any of the image(s) and textual context supports the information."
Usage
Use this metric when evaluating RAG systems that retrieve multi-modal content (images and text). It is particularly useful for applications that combine visual and textual evidence, such as document understanding, visual question answering, or multi-modal knowledge bases. A pre-instantiated convenience instance is available as multimodal_faithness.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: src/ragas/metrics/_multi_modal_faithfulness.py
Signature
@dataclass
class MultiModalFaithfulness(MetricWithLLM, SingleTurnMetric):
    name: str = "faithful_rate"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {
                "response",
                "retrieved_contexts",
            }
        }
    )
    output_type: t.Optional[MetricOutputType] = MetricOutputType.CONTINUOUS
    faithfulness_prompt: ImageTextPrompt = MultiModalFaithfulnessPrompt()
Import
from ragas.metrics import MultiModalFaithfulness
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| response | str | Yes | The AI-generated response to evaluate for faithfulness |
| retrieved_contexts | list[str] | Yes | The contexts retrieved from the knowledge base, passed as strings; entries may be text or references to images, which the ImageTextPrompt handles |
Outputs
| Name | Type | Description |
|---|---|---|
| score | float | 1.0 if the response is faithful to the contexts, 0.0 if not, or NaN if the LLM fails to respond |
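Since a single score can be NaN, averaging per-sample scores needs some care. The sketch below shows one way to do it, assuming failed calls should be skipped rather than counted as unfaithful; mean_faithfulness is a hypothetical helper and not part of the library.

```python
import math
from typing import Iterable


def mean_faithfulness(scores: Iterable[float]) -> float:
    """Hypothetical helper: average the valid scores, skipping NaN entries."""
    valid = [s for s in scores if not math.isnan(s)]
    if not valid:
        # No usable verdicts (all calls failed, or no samples): nothing to average.
        return math.nan
    return sum(valid) / len(valid)
```

For example, mean_faithfulness([1.0, 0.0, math.nan, 1.0]) averages only the three valid scores. Treating NaN as 0.0 instead would understate faithfulness whenever the LLM fails to answer.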
Usage Examples
Basic Usage
from ragas.metrics import MultiModalFaithfulness
from ragas.dataset_schema import SingleTurnSample
metric = MultiModalFaithfulness()
# Set up the LLM (must support multi-modal inputs)
# metric.llm = your_multimodal_llm
sample = SingleTurnSample(
    response="Apple pie is generally double-crusted.",
    retrieved_contexts=[
        "An apple pie is a fruit pie in which the principal filling ingredient is apples.",
        "It is generally double-crusted, with pastry both above and below the filling.",
    ],
)
# score = await metric.single_turn_ascore(sample)
# score will be 1.0 (faithful), 0.0 (unfaithful), or NaN if the LLM returns no response
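Because single_turn_ascore is a coroutine, it has to be awaited from an event loop. The sketch below shows one way to score several samples concurrently. StubMetric is a hypothetical stand-in that never calls an LLM, so the snippet runs without credentials, and score_all is an illustrative helper rather than a library function.

```python
import asyncio
from typing import Any, List, Sequence


async def score_all(metric: Any, samples: Sequence[Any]) -> List[float]:
    """Await single_turn_ascore for every sample concurrently, preserving input order."""
    results = await asyncio.gather(*(metric.single_turn_ascore(s) for s in samples))
    return list(results)


class StubMetric:
    """Hypothetical stand-in for a configured metric; always reports a faithful verdict."""

    async def single_turn_ascore(self, sample: Any) -> float:
        await asyncio.sleep(0)  # yield control once, as a real network call would
        return 1.0


# With a real setup, pass the configured metric and SingleTurnSample objects instead.
scores = asyncio.run(score_all(StubMetric(), ["sample-a", "sample-b"]))
```

asyncio.gather returns results in the order the coroutines were passed, so scores lines up with the input samples.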
Using the Pre-instantiated Instance
from ragas.metrics._multi_modal_faithfulness import multimodal_faithness
# multimodal_faithness is a ready-to-use MultiModalFaithfulness instance
# multimodal_faithness.llm = your_multimodal_llm