Implementation:Lm sys FastChat Vision Image
| Knowledge Sources | |
|---|---|
| Domains | Vision, Image Processing, Multimodal |
| Last Updated | 2026-02-07 06:00 GMT |
Overview
Image format handling and encoding utilities for vision-enabled language models, supporting conversion between URLs, local files, PIL images, and base64-encoded byte representations.
Description
The vision/image module provides a unified Image class built on Pydantic's BaseModel for managing image data across the various formats required by multimodal language model serving. The module defines an ImageFormat enum (IntEnum) with five members: URL, LOCAL_FILEPATH, PIL_IMAGE, BYTES, and DEFAULT, which track the current representation state of an image as it moves through the processing pipeline.
The Image class holds four fields: url (the image source path or URL), filetype (the format string such as "png"), image_format (the current ImageFormat state, defaulting to BYTES), and base64_str (the base64-encoded image data). It provides several methods for format conversion. The convert_image_to_base64() method handles URL, local filepath, and byte inputs to produce base64 strings. The to_openai_image_format() method produces data URIs in the data:image/{type};base64,{data} format expected by the OpenAI API. The resize_image_and_return_image_in_bytes() method constrains images to a maximum of 1024 pixels on the longest edge while preserving aspect ratio, and further downsizes if the result exceeds a configurable maximum file size in megabytes.
The convert_url_to_image_bytes() method supports SVG files by converting them to PNG via cairosvg.svg2png() before processing, while standard raster formats are loaded directly with PIL. The to_conversation_format() method orchestrates the full pipeline: converting a local file path to resized, base64-encoded bytes and updating the Image instance's internal state. This is the primary entry point for preparing images for multimodal conversation contexts.
Usage
Use this module when integrating vision capabilities into FastChat's serving infrastructure. The Image class handles the full lifecycle of image data from initial file path or URL input through to the base64 data URI format required by model APIs. Use to_conversation_format() to prepare images for inclusion in multimodal conversation prompts, and to_openai_image_format() when formatting images specifically for OpenAI-compatible vision API calls.
Code Reference
Source Location
- Repository: Lm_sys_FastChat
- File: fastchat/serve/vision/image.py
- Lines: 1-135
Signature
class ImageFormat(IntEnum):
URL = auto()
LOCAL_FILEPATH = auto()
PIL_IMAGE = auto()
BYTES = auto()
DEFAULT = auto()
class Image(BaseModel):
url: str = ""
filetype: str = ""
image_format: ImageFormat = ImageFormat.BYTES
base64_str: str = ""
def convert_image_to_base64(self) -> str: ...
def to_openai_image_format(self) -> str: ...
def resize_image_and_return_image_in_bytes(self, image, max_image_size_mb: float) -> tuple[str, BytesIO]: ...
def convert_url_to_image_bytes(self, max_image_size_mb: float) -> tuple[str, str]: ...
def to_conversation_format(self, max_image_size_mb: float) -> "Image": ...
Import
from fastchat.serve.vision.image import Image, ImageFormat
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| url | str | Yes | Image source: file path, URL, or empty string. Used by convert_url_to_image_bytes() and to_conversation_format() |
| filetype | str | No | Image format string (e.g., "png", "jpeg"). Auto-detected during conversion |
| image_format | ImageFormat | No | Current format state, defaults to ImageFormat.BYTES |
| base64_str | str | No | Pre-encoded base64 image data, populated during conversion |
| max_image_size_mb | float | Yes (for resize/convert methods) | Maximum allowed image size in megabytes after resizing (e.g., 3.33 for 5/1.5) |
Outputs
| Name | Type | Description |
|---|---|---|
| base64_str | str | Base64-encoded image string suitable for embedding in API payloads |
| data URI | str | Complete data URI string in data:image/{type};base64,{data} format (from to_openai_image_format()) |
| Image (self) | Image | Updated Image instance with populated filetype, image_format, and base64_str (from to_conversation_format()) |
| (image_format, image_bytes) | tuple[str, BytesIO] | Format string and raw bytes after resizing (from resize_image_and_return_image_in_bytes()) |
Usage Examples
from fastchat.serve.vision.image import Image, ImageFormat
# Load a local image and convert for conversation use
img = Image(url="fastchat/serve/example_images/fridge.jpg")
img = img.to_conversation_format(max_image_size_mb=5 / 1.5)
# Access the base64-encoded result
print(img.base64_str[:50]) # First 50 chars of base64
print(img.filetype) # "png"
print(img.image_format) # ImageFormat.BYTES
# Convert to OpenAI-compatible format
openai_url = img.to_openai_image_format()
# Returns: "data:image/png;base64,iVBORw0KGgo..."
# Serialize to JSON for API transmission
json_str = img.model_dump_json()
# Handle an SVG file (requires cairosvg)
svg_img = Image(url="diagram.svg")
fmt, b64 = svg_img.convert_url_to_image_bytes(max_image_size_mb=3.0)
# SVG is converted to PNG via cairosvg before encoding
# Use with a URL-based image
url_img = Image(
url="https://example.com/photo.jpg",
image_format=ImageFormat.URL
)
openai_format = url_img.to_openai_image_format()
# Returns the URL directly for URL-type images
Related Pages
- Principle:Lm_sys_FastChat_Vision_Image_Processing
- Implements: Principle:Lm_sys_FastChat_Vision_Image_Processing
- Lm_sys_FastChat_Huggingface_API_Inference - Text inference pipeline in the same serve module
- Lm_sys_FastChat_Remote_Logger - Logging infrastructure used alongside vision serving