Implementation:Vibrantlabsai Ragas MultiModalPrompt

Knowledge Sources	Vibrantlabsai_Ragas
Domains	Prompts, Multi-Modal, Image Processing, Security
Last Updated	2026-02-12 00:00 GMT

Overview

Provides multi-modal prompt handling for LLM evaluation, supporting text and image inputs with secure URL fetching, base64 data URI parsing, SSRF protection, and optional local file access.

Description

The multi_modal_prompt.py module extends the Ragas prompting system to support mixed text-and-image inputs. It is built around two primary classes:

ImageTextPrompt is a generic class that extends PydanticPrompt with InputModel and OutputModel type parameters. It overrides _generate_examples() to format examples with text-only context while noting that images may be available. The to_prompt_value() method converts input data into an ImageTextPromptValue by combining the instruction, output signature, examples, and the input data's to_string_list() output. The generate_multiple() method handles both LangChain LLMs (via agenerate_prompt) and Ragas LLMs (via generate), parsing outputs with RagasOutputParser and supporting retry logic.

ImageTextPromptValue is a Pydantic model extending LangChain's PromptValue. It holds a list of string items and converts them to HumanMessage objects with mixed content types. The key method _securely_process_item() determines whether each item is:

A base64 data URI -- Validated against a regex pattern (DATA_URI_REGEX) that only accepts image/png, image/jpeg, image/gif, and image/webp MIME types. The base64 data is decoded to verify format integrity.

An allowed URL (HTTP/HTTPS) -- First subjected to SSRF protection via _is_safe_url_target(), which resolves the hostname to IP addresses and checks each against disallowed categories (loopback, private, link-local, reserved). If safe, the image is downloaded with streaming, size-limited to MAX_DOWNLOAD_SIZE_BYTES (10MB), validated using Pillow's verify() method, and base64-encoded.

An allowed local file (optional, disabled by default) -- Controlled by ALLOW_LOCAL_FILE_ACCESS (default False). When enabled, a file must reside within ALLOWED_IMAGE_BASE_DIR, pass path traversal checks, stay within the MAX_LOCAL_FILE_SIZE_BYTES limit, and pass Pillow image validation.

Plain text -- The default fallback if none of the above conditions match.
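
The SSRF check in the URL branch above might look roughly like the following sketch. It is not the module's _is_safe_url_target() implementation: the function name is_safe_host is hypothetical, and only the DISALLOWED_IP_CHECKS names are taken from the module's documented constants. The idea is to resolve the host and reject it if any resolved address falls into a disallowed category.

```python
import ipaddress
import socket

# Attribute names of ipaddress objects; mirrors the documented constant.
DISALLOWED_IP_CHECKS = {"is_loopback", "is_private", "is_link_local", "is_reserved"}


def is_safe_host(hostname: str) -> bool:
    """Hypothetical helper: True only if every resolved address is allowed."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except (socket.gaierror, UnicodeError):
        return False  # assumption: unresolvable hosts are rejected
    if not infos:
        return False
    for info in infos:
        try:
            # info[4] is the sockaddr tuple; its first element is the address.
            ip = ipaddress.ip_address(info[4][0])
        except ValueError:
            return False
        # Reject as soon as any disallowed category matches.
        if any(getattr(ip, check) for check in DISALLOWED_IP_CHECKS):
            return False
    return True
```

Because a check like this resolves the hostname separately from the later HTTP request, it does not by itself defend against DNS rebinding between the two steps.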

The module defines typed dictionaries TextContent and ImageUrlContent for type-safe message content construction, along with configurable security constants for URL schemes, download limits, timeouts, and IP address checks.
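
As an illustration of how the typed dictionaries can carry mixed content, the sketch below turns a string item into either a text part or an image part. It covers only the data URI branch. The regex DATA_URI_PATTERN and the helper to_content_part are hypothetical stand-ins (the module's actual DATA_URI_REGEX may differ), and the {"url": ...} shape of image_url is an assumption based on the common OpenAI-style message format.

```python
import base64
import binascii
import re
import typing as t
from typing import TypedDict


class TextContent(TypedDict):
    type: t.Literal["text"]
    text: str


class ImageUrlContent(TypedDict):
    type: t.Literal["image_url"]
    image_url: t.Dict[str, str]


MessageContent = t.Union[TextContent, ImageUrlContent]

# Hypothetical pattern: only the four documented image MIME types are accepted.
DATA_URI_PATTERN = re.compile(
    r"data:image/(?:png|jpeg|gif|webp);base64,([A-Za-z0-9+/]+={0,2})"
)


def to_content_part(item: str) -> MessageContent:
    """Hypothetical helper: map one string item to a typed content part."""
    match = DATA_URI_PATTERN.fullmatch(item)
    if match:
        try:
            # validate=True raises on non-alphabet characters or bad padding.
            base64.b64decode(match.group(1), validate=True)
        except (binascii.Error, ValueError):
            pass  # assumption: undecodable payloads are treated as plain text
        else:
            return {"type": "image_url", "image_url": {"url": item}}
    return {"type": "text", "text": item}
```

This sketch checks only that the payload decodes as base64; it does not inspect the image bytes.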

Usage

Import this module when building evaluation metrics or prompts that need to handle both text and image inputs, such as evaluating multi-modal LLM applications. The ImageTextPrompt class is the main entry point for creating prompts that can process images alongside text.

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/prompt/multi_modal_prompt.py

Signature

class TextContent(TypedDict):
    type: t.Literal["text"]
    text: str

class ImageUrlContent(TypedDict):
    type: t.Literal["image_url"]
    image_url: dict[str, str]

MessageContent = t.Union[TextContent, ImageUrlContent]

class ImageTextPrompt(PydanticPrompt, t.Generic[InputModel, OutputModel]):
    def _generate_examples(self) -> str: ...
    def to_prompt_value(self, data: t.Optional[InputModel] = None) -> ImageTextPromptValue: ...
    async def generate_multiple(
        self,
        llm: t.Union[BaseRagasLLM, BaseLanguageModel],
        data: InputModel,
        n: int = 1,
        temperature: t.Optional[float] = None,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.Optional[Callbacks] = None,
        retries_left: int = 3,
    ) -> t.List[OutputModel]: ...

class ImageTextPromptValue(PromptValue):
    items: t.List[str]
    def to_messages(self) -> t.List[BaseMessage]: ...
    def to_string(self) -> str: ...

Import

from ragas.prompt.multi_modal_prompt import ImageTextPrompt, ImageTextPromptValue
from ragas.prompt.multi_modal_prompt import TextContent, ImageUrlContent, MessageContent

I/O Contract

ImageTextPrompt.generate_multiple Inputs

Name	Type	Required	Description
llm	BaseRagasLLM or BaseLanguageModel	Yes	The language model to use for generation (supports both Ragas and LangChain LLMs)
data	InputModel	Yes	The input data for generation; must implement to_string_list() returning a list of text/image items
n	int	No (default 1)	The number of outputs to generate
temperature	float or None	No	Temperature parameter for controlling randomness
stop	List[str] or None	No	Stop sequences to end generation
callbacks	Callbacks or None	No	Callback functions called during generation
retries_left	int	No (default 3)	Number of retries for output parsing failures

ImageTextPrompt.generate_multiple Outputs

Name	Type	Description
return	List[OutputModel]	A list of generated and parsed output model instances

ImageTextPromptValue.to_messages Outputs

Name	Type	Description
return	List[BaseMessage]	A list containing a single HumanMessage with mixed text and image content parts

Security Configuration

The module defines several security constants that control image processing behavior:

Constant	Default	Description
ALLOWED_URL_SCHEMES	{"http", "https"}	URL schemes allowed for image fetching
MAX_DOWNLOAD_SIZE_BYTES	10 * 1024 * 1024 (10MB)	Maximum size for downloaded images
REQUESTS_TIMEOUT_SECONDS	10	HTTP request timeout in seconds
ALLOW_LOCAL_FILE_ACCESS	False	Whether local file paths are processed as images
ALLOWED_IMAGE_BASE_DIR	None	Absolute path to the only allowed directory for local image loading
MAX_LOCAL_FILE_SIZE_BYTES	10 * 1024 * 1024 (10MB)	Maximum size for local image files
ALLOW_INTERNAL_TARGETS	False	Whether to bypass SSRF IP address checks (dangerous)
DISALLOWED_IP_CHECKS	{"is_loopback", "is_private", "is_link_local", "is_reserved"}	IP address categories blocked by SSRF protection

Usage Examples

Subclassing ImageTextPrompt

from pydantic import BaseModel
from ragas.prompt.multi_modal_prompt import ImageTextPrompt

class MyInput(BaseModel):
    text: str
    image_url: str

    def to_string_list(self):
        return [self.text, self.image_url]

class MyOutput(BaseModel):
    description: str

class MyMultiModalPrompt(ImageTextPrompt[MyInput, MyOutput]):
    name = "my_multi_modal_prompt"
    instruction = "Describe the image in context of the text."
    input_model = MyInput
    output_model = MyOutput

Generating with an LLM

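import asyncio

async def main() -> None:
    prompt = MyMultiModalPrompt()
    data = MyInput(text="What is shown?", image_url="https://example.com/photo.jpg")

    # my_ragas_llm is a placeholder for an already-configured BaseRagasLLM instance.
    results = await prompt.generate_multiple(
        llm=my_ragas_llm,
        data=data,
        n=1,
        temperature=0.0,
    )
    print(results[0].description)

# generate_multiple is a coroutine, so it must run inside an event loop.
asyncio.run(main())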

Using ImageTextPromptValue Directly

from ragas.prompt.multi_modal_prompt import ImageTextPromptValue

# Items can be plain text, base64 data URIs, or HTTP/HTTPS image URLs
prompt_value = ImageTextPromptValue(items=[
    "Describe this image:",
    "data:image/png;base64,iVBORw0KGgoAAAANSUhEU...",
    "https://example.com/another_image.jpg",
])

messages = prompt_value.to_messages()
# Expected shape when both images pass validation:
# [HumanMessage(content=[TextContent, ImageUrlContent, ImageUrlContent])]

Related Pages

ragas.prompt.pydantic_prompt.PydanticPrompt -- Base prompt class that ImageTextPrompt extends
ragas.llms.base.BaseRagasLLM -- Ragas LLM interface used for async generation
langchain_core.messages.HumanMessage -- LangChain message type used for multi-modal content
Pillow -- Used for image content validation via Image.verify()
SSRF_protection -- IP address resolution and validation to prevent server-side request forgery
