Implementation:Lm sys FastChat Vision Image

Knowledge Sources	Lm_sys_FastChat
Domains	Vision, Image Processing, Multimodal
Last Updated	2026-02-07 06:00 GMT

Overview

Image format handling and encoding utilities for vision-enabled language models, supporting conversion between URLs, local files, PIL images, and base64-encoded byte representations.

Description

The vision/image module provides a unified Image class built on Pydantic's BaseModel for managing image data across the various formats required by multimodal language model serving. The module defines an ImageFormat enum (IntEnum) with five members: URL, LOCAL_FILEPATH, PIL_IMAGE, BYTES, and DEFAULT, which track the current representation state of an image as it moves through the processing pipeline.
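To make the member numbering concrete, here is a minimal stand-in that mirrors the enum signature documented on this page. It is defined locally rather than imported from FastChat, so treat it as an illustration only. With IntEnum, auto() assigns consecutive integers starting at 1 in definition order, and members compare equal to plain ints.

```python
from enum import IntEnum, auto


class ImageFormat(IntEnum):
    """Stand-in mirroring the documented enum; not imported from FastChat."""

    URL = auto()
    LOCAL_FILEPATH = auto()
    PIL_IMAGE = auto()
    BYTES = auto()
    DEFAULT = auto()


# auto() numbers IntEnum members 1, 2, 3, ... in definition order.
format_values = {member.name: member.value for member in ImageFormat}
print(format_values)
```

Because the members are integers, their numeric values depend on definition order; comparing against the named members rather than literal numbers avoids relying on that order.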

The Image class holds four fields: url (the image source path or URL), filetype (the format string such as "png"), image_format (the current ImageFormat state, defaulting to BYTES), and base64_str (the base64-encoded image data). It provides several methods for format conversion. The convert_image_to_base64() method handles URL, local filepath, and byte inputs to produce base64 strings. The to_openai_image_format() method produces data URIs in the data:image/{type};base64,{data} format expected by the OpenAI API. The resize_image_and_return_image_in_bytes() method constrains images to a maximum of 1024 pixels on the longest edge while preserving aspect ratio, and further downsizes if the result exceeds a configurable maximum file size in megabytes.
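The two-stage resize described above can be sketched as pure arithmetic on image dimensions. This is a hedged illustration, not the FastChat implementation: the helper names fit_longest_edge and shrink_for_byte_budget are hypothetical, and the square-root scaling in the second stage is an assumption (it relies on encoded size growing roughly with pixel count).

```python
import math


def fit_longest_edge(width: int, height: int, max_edge: int = 1024) -> tuple[int, int]:
    """Scale (width, height) down so the longest edge is at most max_edge."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height  # already small enough; never upscale
    scale = max_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def shrink_for_byte_budget(
    width: int, height: int, current_bytes: int, max_image_size_mb: float
) -> tuple[int, int]:
    """Assumed heuristic: shrink both edges by sqrt(target / current)."""
    target_bytes = max_image_size_mb * 1024 * 1024
    if current_bytes <= target_bytes:
        return width, height  # already within the size budget
    factor = math.sqrt(target_bytes / current_bytes)
    return max(1, math.floor(width * factor)), max(1, math.floor(height * factor))


# Stage 1: a 4096x2048 image is capped at 1024 on its longest edge.
capped = fit_longest_edge(4096, 2048)
# Stage 2: if the encoded result were 4 MiB against a 1 MiB budget, halve each edge.
final = shrink_for_byte_budget(*capped, current_bytes=4 * 1024 * 1024, max_image_size_mb=1.0)
print(capped, final)
```

Both helpers scale the two edges by the same factor, which is what preserves the aspect ratio.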

The convert_url_to_image_bytes() method supports SVG files by converting them to PNG via cairosvg.svg2png() before processing; standard raster formats are loaded directly with PIL. Despite its name, it returns a format string and a base64-encoded string (tuple[str, str]), not raw bytes. The to_conversation_format() method orchestrates the full pipeline: it converts the image at the local file path in url to a resized, base64-encoded string, updates the instance's filetype, image_format, and base64_str, and returns the instance. This is the primary entry point for preparing images for multimodal conversation contexts.
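The SVG branch can be sketched as follows. This is an assumption-laden illustration rather than the module's code: is_svg and load_rgba_image are hypothetical helper names, the case-insensitive extension check and the RGBA conversion are choices made for this sketch, and Pillow and cairosvg are imported lazily so the SVG dependency is only needed when an SVG file is actually loaded.

```python
from io import BytesIO


def is_svg(path: str) -> bool:
    """Hypothetical helper: decide by file extension whether to rasterize first."""
    return path.lower().endswith(".svg")


def load_rgba_image(path: str):
    """Open path as an RGBA PIL image, rasterizing SVG input to PNG first."""
    from PIL import Image as PILImage  # Pillow, imported lazily

    if is_svg(path):
        import cairosvg  # optional dependency, only needed for SVG input

        with open(path, "rb") as svg_file:
            # svg2png returns the PNG data as bytes when no output target is given.
            png_data = cairosvg.svg2png(bytestring=svg_file.read())
        return PILImage.open(BytesIO(png_data)).convert("RGBA")
    return PILImage.open(path).convert("RGBA")
```

Rasterizing first means the rest of the pipeline only ever sees a PIL image, so resizing and encoding need no SVG-specific handling.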

Usage

Use this module when integrating vision capabilities into FastChat's serving infrastructure. The Image class handles the full lifecycle of image data from initial file path or URL input through to the base64 data URI format required by model APIs. Use to_conversation_format() to prepare images for inclusion in multimodal conversation prompts, and to_openai_image_format() when formatting images specifically for OpenAI-compatible vision API calls.
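For readers unfamiliar with the data URI layout, the sketch below builds one with only the standard library. The to_data_uri helper is hypothetical and exists only for illustration; it follows the data:image/{type};base64,{data} pattern described on this page.

```python
import base64


def to_data_uri(filetype: str, image_bytes: bytes) -> str:
    """Build a data URI of the form data:image/{type};base64,{data}."""
    base64_str = base64.b64encode(image_bytes).decode("ascii")
    return f"data:image/{filetype};base64,{base64_str}"


# The 8-byte signature that every PNG file starts with, used here as sample data.
png_signature = b"\x89PNG\r\n\x1a\n"
data_uri = to_data_uri("png", png_signature)
print(data_uri)
```

Since every PNG begins with the same signature, base64-encoded PNG data always starts with the characters iVBORw0KGgo.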

Code Reference

Source Location

Repository: Lm_sys_FastChat
File: fastchat/serve/vision/image.py
Lines: 1-135

Signature

class ImageFormat(IntEnum):
    URL = auto()
    LOCAL_FILEPATH = auto()
    PIL_IMAGE = auto()
    BYTES = auto()
    DEFAULT = auto()

class Image(BaseModel):
    url: str = ""
    filetype: str = ""
    image_format: ImageFormat = ImageFormat.BYTES
    base64_str: str = ""

    def convert_image_to_base64(self) -> str: ...
    def to_openai_image_format(self) -> str: ...
    def resize_image_and_return_image_in_bytes(self, image, max_image_size_mb: float) -> tuple[str, BytesIO]: ...
    def convert_url_to_image_bytes(self, max_image_size_mb: float) -> tuple[str, str]: ...
    def to_conversation_format(self, max_image_size_mb: float) -> "Image": ...

Import

from fastchat.serve.vision.image import Image, ImageFormat

I/O Contract

Inputs

Name	Type	Required	Description
url	str	No (defaults to "")	Image source: file path or URL. Read by convert_url_to_image_bytes() and to_conversation_format(), so it must be set before calling them
filetype	str	No	Image format string (e.g., "png", "jpeg"). Auto-detected during conversion
image_format	ImageFormat	No	Current format state, defaults to ImageFormat.BYTES
base64_str	str	No	Pre-encoded base64 image data, populated during conversion
max_image_size_mb	float	Yes (for resize/convert methods)	Maximum allowed image size in megabytes after resizing (e.g., 3.33 for 5/1.5)
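The source does not state why 5 / 1.5 is used as the example limit. One fact that may be relevant is that base64 encoding inflates data by a factor of about 4/3, so a limit on the resized image translates into a larger encoded payload. The sketch below only works through that arithmetic; the base64_encoded_size helper is hypothetical.

```python
import math


def base64_encoded_size(raw_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of raw_bytes bytes."""
    return 4 * math.ceil(raw_bytes / 3)


max_image_size_mb = 5 / 1.5  # the example limit used on this page
raw_budget = int(max_image_size_mb * 1024 * 1024)
encoded_mb = base64_encoded_size(raw_budget) / (1024 * 1024)
print(f"{max_image_size_mb:.2f} MB raw -> {encoded_mb:.2f} MB as base64")
```

Under this arithmetic, an image at the 3.33 MB limit stays below 5 MB once base64-encoded.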

Outputs

Name	Type	Description
base64_str	str	Base64-encoded image string suitable for embedding in API payloads
data URI	str	Complete data URI string in data:image/{type};base64,{data} format (from to_openai_image_format())
Image (self)	Image	Updated Image instance with populated filetype, image_format, and base64_str (from to_conversation_format())
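(image_format, image_bytes)	tuple[str, BytesIO]	Format string and an in-memory BytesIO buffer holding the resized image (from resize_image_and_return_image_in_bytes())
(image_format, base64_str)	tuple[str, str]	Format string and base64-encoded image string (from convert_url_to_image_bytes())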

Usage Examples

from fastchat.serve.vision.image import Image, ImageFormat

# Load a local image and convert for conversation use
img = Image(url="fastchat/serve/example_images/fridge.jpg")
img = img.to_conversation_format(max_image_size_mb=5 / 1.5)

# Access the base64-encoded result
print(img.base64_str[:50])  # First 50 chars of base64
print(img.filetype)          # "png"
print(img.image_format)      # ImageFormat.BYTES

# Convert to OpenAI-compatible format
openai_url = img.to_openai_image_format()
# Returns: "data:image/png;base64,iVBORw0KGgo..."

# Serialize to JSON for API transmission
json_str = img.model_dump_json()

# Handle an SVG file (requires cairosvg)
svg_img = Image(url="diagram.svg")
fmt, b64 = svg_img.convert_url_to_image_bytes(max_image_size_mb=3.0)
# SVG is converted to PNG via cairosvg before encoding

# Use with a URL-based image
url_img = Image(
    url="https://example.com/photo.jpg",
    image_format=ImageFormat.URL
)
openai_format = url_img.to_openai_image_format()
# Returns the URL directly for URL-type images

Related Pages

Implements: Principle:Lm_sys_FastChat_Vision_Image_Processing
Lm_sys_FastChat_Huggingface_API_Inference - Text inference pipeline in the same serve module
Lm_sys_FastChat_Remote_Logger - Logging infrastructure used alongside vision serving
