Implementation:Vibrantlabsai Ragas MultiModalPrompt
| Knowledge Sources | |
|---|---|
| Domains | Prompts, Multi-Modal, Image Processing, Security |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
Provides multi-modal prompt handling for LLM evaluation, supporting text and image inputs with secure URL fetching, base64 data URI parsing, SSRF protection, and optional local file access.
Description
The multi_modal_prompt.py module extends the Ragas prompting system to support mixed text-and-image inputs. It is built around two primary classes:
ImageTextPrompt is a generic class that extends PydanticPrompt and is parameterized by InputModel and OutputModel. It overrides _generate_examples() to render examples as text only, with a note that images may accompany the actual input. The to_prompt_value() method builds an ImageTextPromptValue from the instruction, the output signature, the examples, and the items returned by the input data's to_string_list(). The generate_multiple() method accepts both LangChain LLMs (called via agenerate_prompt) and Ragas LLMs (called via generate), parses each output with RagasOutputParser, and retries when parsing fails.
ImageTextPromptValue is a Pydantic model extending LangChain's PromptValue. It holds a list of string items and converts them to HumanMessage objects with mixed content types. The key method _securely_process_item() determines whether each item is:
- A base64 data URI -- Validated against a regex pattern (DATA_URI_REGEX) that only accepts image/png, image/jpeg, image/gif, and image/webp MIME types. The base64 data is decoded to verify format integrity.
- An allowed URL (HTTP/HTTPS) -- First subjected to SSRF protection via _is_safe_url_target(), which resolves the hostname to IP addresses and checks each against disallowed categories (loopback, private, link-local, reserved). If safe, the image is downloaded with streaming, size-limited to MAX_DOWNLOAD_SIZE_BYTES (10MB), validated using Pillow's verify() method, and base64-encoded.
- An allowed local file (optional, disabled by default) -- Controlled by ALLOW_LOCAL_FILE_ACCESS (default False). When enabled, a file must be located within ALLOWED_IMAGE_BASE_DIR, pass path traversal checks, stay under MAX_LOCAL_FILE_SIZE_BYTES, and pass Pillow image validation.
- Plain text -- The default fallback if none of the above conditions match.
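The classification order above can be sketched in a few lines. This is a minimal illustration, not the module's code: the pattern `DATA_URI_PATTERN` and the function `classify_item` are hypothetical stand-ins (the actual DATA_URI_REGEX is not reproduced on this page), the local-file branch is omitted because it is disabled by default, and falling back to plain text on a malformed payload is an assumption.

```python
import base64
import binascii
import re
from urllib.parse import urlparse

# Hypothetical pattern; the module's real DATA_URI_REGEX may differ.
DATA_URI_PATTERN = re.compile(
    r"^data:(image/(?:png|jpeg|gif|webp));base64,([A-Za-z0-9+/=]+)$"
)
ALLOWED_URL_SCHEMES = {"http", "https"}


def classify_item(item: str) -> str:
    """Return 'data_uri', 'url', or 'text' for a single prompt item."""
    match = DATA_URI_PATTERN.match(item)
    if match:
        try:
            # validate=True rejects characters outside the base64 alphabet
            # and raises on incorrect padding.
            base64.b64decode(match.group(2), validate=True)
            return "data_uri"
        except (binascii.Error, ValueError):
            # Assumption: a malformed payload is treated as plain text.
            return "text"
    try:
        parsed = urlparse(item)
    except ValueError:
        return "text"
    if parsed.scheme in ALLOWED_URL_SCHEMES and parsed.netloc:
        return "url"
    # Default fallback: anything else is plain text.
    return "text"


png_like = "data:image/png;base64," + base64.b64encode(b"hello").decode()
print(classify_item(png_like))
print(classify_item("https://example.com/photo.jpg"))
print(classify_item("Describe this image:"))
```

In this sketch an item classified as `url` would still have to pass the SSRF, size, and Pillow checks described above before being embedded as an image.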
The module defines typed dictionaries TextContent and ImageUrlContent for type-safe message content construction, along with configurable security constants for URL schemes, download limits, timeouts, and IP address checks.
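To illustrate how these typed dictionaries can describe mixed content, the sketch below redefines them locally (mirroring the signatures listed under Code Reference) so it runs without Ragas installed. The helpers `text_part` and `image_part` are hypothetical, and the `"url"` key inside `image_url` is an assumption based on the common chat-message convention rather than something this page confirms.

```python
import typing as t
from typing import TypedDict


class TextContent(TypedDict):
    type: t.Literal["text"]
    text: str


class ImageUrlContent(TypedDict):
    type: t.Literal["image_url"]
    image_url: t.Dict[str, str]


MessageContent = t.Union[TextContent, ImageUrlContent]


def text_part(text: str) -> TextContent:
    """Hypothetical helper: wrap plain text as a content part."""
    return {"type": "text", "text": text}


def image_part(mime_type: str, b64_data: str) -> ImageUrlContent:
    """Hypothetical helper: embed base64 image data as a data URI.

    The "url" key is an assumption, not confirmed by this page.
    """
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{b64_data}"},
    }


parts: t.List[MessageContent] = [
    text_part("Describe this image:"),
    image_part("image/png", "AAAA"),
]
print(parts)
```

A static type checker can then flag a content part whose `type` literal does not match its other keys.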
Usage
Import this module when building evaluation metrics or prompts that need to handle both text and image inputs, such as evaluating multi-modal LLM applications. The ImageTextPrompt class is the main entry point for creating prompts that can process images alongside text.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: src/ragas/prompt/multi_modal_prompt.py
Signature
class TextContent(TypedDict):
    type: t.Literal["text"]
    text: str

class ImageUrlContent(TypedDict):
    type: t.Literal["image_url"]
    image_url: dict[str, str]

MessageContent = t.Union[TextContent, ImageUrlContent]

class ImageTextPrompt(PydanticPrompt, t.Generic[InputModel, OutputModel]):
    def _generate_examples(self) -> str: ...
    def to_prompt_value(self, data: t.Optional[InputModel] = None) -> ImageTextPromptValue: ...
    async def generate_multiple(
        self,
        llm: t.Union[BaseRagasLLM, BaseLanguageModel],
        data: InputModel,
        n: int = 1,
        temperature: t.Optional[float] = None,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.Optional[Callbacks] = None,
        retries_left: int = 3,
    ) -> t.List[OutputModel]: ...

class ImageTextPromptValue(PromptValue):
    items: t.List[str]
    def to_messages(self) -> t.List[BaseMessage]: ...
    def to_string(self) -> str: ...
Import
from ragas.prompt.multi_modal_prompt import ImageTextPrompt, ImageTextPromptValue
from ragas.prompt.multi_modal_prompt import TextContent, ImageUrlContent, MessageContent
I/O Contract
ImageTextPrompt.generate_multiple Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| llm | BaseRagasLLM or BaseLanguageModel | Yes | The language model to use for generation (supports both Ragas and LangChain LLMs) |
| data | InputModel | Yes | The input data for generation; must implement to_string_list() returning a list of text/image items |
| n | int | No (default 1) | The number of outputs to generate |
| temperature | float or None | No | Temperature parameter for controlling randomness |
| stop | List[str] or None | No | Stop sequences to end generation |
| callbacks | Callbacks or None | No | Callback functions called during generation |
| retries_left | int | No (default 3) | Number of retries for output parsing failures |
ImageTextPrompt.generate_multiple Outputs
| Name | Type | Description |
|---|---|---|
| return | List[OutputModel] | A list of generated and parsed output model instances |
ImageTextPromptValue.to_messages Outputs
| Name | Type | Description |
|---|---|---|
| return | List[BaseMessage] | A list containing a single HumanMessage with mixed text and image content parts |
Security Configuration
The module defines several security constants that control image processing behavior:
| Constant | Default | Description |
|---|---|---|
| ALLOWED_URL_SCHEMES | {"http", "https"} | URL schemes allowed for image fetching |
| MAX_DOWNLOAD_SIZE_BYTES | 10 * 1024 * 1024 (10MB) | Maximum size for downloaded images |
| REQUESTS_TIMEOUT_SECONDS | 10 | HTTP request timeout in seconds |
| ALLOW_LOCAL_FILE_ACCESS | False | Whether local file paths are processed as images |
| ALLOWED_IMAGE_BASE_DIR | None | Absolute path to the only allowed directory for local image loading |
| MAX_LOCAL_FILE_SIZE_BYTES | 10 * 1024 * 1024 (10MB) | Maximum size for local image files |
| ALLOW_INTERNAL_TARGETS | False | Whether to bypass SSRF IP address checks (dangerous) |
| DISALLOWED_IP_CHECKS | {"is_loopback", "is_private", "is_link_local", "is_reserved"} | IP address categories blocked by SSRF protection |
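Two of the checks configured by this table can be sketched with the standard library alone. This is an illustration under stated assumptions, not the module's implementation: `is_disallowed_ip` and `is_within_base_dir` are hypothetical names, the IP check operates on an already-resolved address (the hostname resolution performed by _is_safe_url_target() is skipped so the example stays offline), and rejecting unparseable addresses is an assumption.

```python
import ipaddress
from pathlib import Path

# Same category names as the DISALLOWED_IP_CHECKS default in the table above.
DISALLOWED_IP_CHECKS = {"is_loopback", "is_private", "is_link_local", "is_reserved"}


def is_disallowed_ip(ip_text: str) -> bool:
    """Return True if the address falls into any blocked category."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        # Assumption: anything that is not a valid IP address is rejected.
        return True
    # Each check name is a boolean property on ipaddress address objects.
    return any(getattr(ip, check, False) for check in DISALLOWED_IP_CHECKS)


def is_within_base_dir(base_dir: str, candidate: str) -> bool:
    """Return True if candidate resolves to a path inside base_dir."""
    base = Path(base_dir).resolve()
    target = Path(candidate).resolve()  # collapses ".." and follows symlinks
    try:
        target.relative_to(base)
        return True
    except ValueError:
        return False


print(is_disallowed_ip("127.0.0.1"))
print(is_disallowed_ip("8.8.8.8"))
print(is_within_base_dir("/srv/images", "/srv/images/../secret.png"))
```

Resolving the path before comparing it is what defeats `..` traversal here; comparing raw strings with a prefix test would not.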
Usage Examples
Subclassing ImageTextPrompt
from pydantic import BaseModel
from ragas.prompt.multi_modal_prompt import ImageTextPrompt

class MyInput(BaseModel):
    text: str
    image_url: str

    def to_string_list(self):
        # Each returned item is treated as text or as an image reference
        return [self.text, self.image_url]

class MyOutput(BaseModel):
    description: str

class MyMultiModalPrompt(ImageTextPrompt[MyInput, MyOutput]):
    name = "my_multi_modal_prompt"
    instruction = "Describe the image in context of the text."
    input_model = MyInput
    output_model = MyOutput
Generating with an LLM
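# Run inside an async function (or an environment that supports top-level await).
# my_ragas_llm is assumed to be an already-configured Ragas LLM instance.
prompt = MyMultiModalPrompt()
data = MyInput(text="What is shown?", image_url="https://example.com/photo.jpg")
results = await prompt.generate_multiple(
    llm=my_ragas_llm,
    data=data,
    n=1,
    temperature=0.0,
)
print(results[0].description)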
Using ImageTextPromptValue Directly
from ragas.prompt.multi_modal_prompt import ImageTextPromptValue

# Items can be plain text, base64 data URIs, or HTTP/HTTPS image URLs
prompt_value = ImageTextPromptValue(items=[
    "Describe this image:",
    "data:image/png;base64,iVBORw0KGgoAAAANSUhEU...",
    "https://example.com/another_image.jpg",
])
messages = prompt_value.to_messages()
# Returns [HumanMessage(content=[TextContent, ImageUrlContent, ImageUrlContent])]
Related Pages
- ragas.prompt.pydantic_prompt.PydanticPrompt -- Base prompt class that ImageTextPrompt extends
- ragas.llms.base.BaseRagasLLM -- Ragas LLM interface used for async generation
- langchain_core.messages.HumanMessage -- LangChain message type used for multi-modal content
- Pillow -- Used for image content validation via Image.verify()
- SSRF_protection -- IP address resolution and validation to prevent server-side request forgery