Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Lm sys FastChat Vision Image

From Leeroopedia


Knowledge Sources
Domains Vision, Image Processing, Multimodal
Last Updated 2026-02-07 06:00 GMT

Overview

Image format handling and encoding utilities for vision-enabled language models, supporting conversion between URLs, local files, PIL images, and base64-encoded byte representations.

Description

The vision/image module provides a unified Image class built on Pydantic's BaseModel for managing image data across the various formats required by multimodal language model serving. The module defines an ImageFormat enum (IntEnum) with five members: URL, LOCAL_FILEPATH, PIL_IMAGE, BYTES, and DEFAULT, which track the current representation state of an image as it moves through the processing pipeline.

The Image class holds four fields: url (the image source path or URL), filetype (the format string such as "png"), image_format (the current ImageFormat state, defaulting to BYTES), and base64_str (the base64-encoded image data). It provides several methods for format conversion. The convert_image_to_base64() method handles URL, local filepath, and byte inputs to produce base64 strings. The to_openai_image_format() method produces data URIs in the data:image/{type};base64,{data} format expected by the OpenAI API. The resize_image_and_return_image_in_bytes() method constrains images to a maximum of 1024 pixels on the longest edge while preserving aspect ratio, and further downsizes if the result exceeds a configurable maximum file size in megabytes.

The convert_url_to_image_bytes() method supports SVG files by converting them to PNG via cairosvg.svg2png() before processing, while standard raster formats are loaded directly with PIL. The to_conversation_format() method orchestrates the full pipeline: converting a local file path to resized, base64-encoded bytes and updating the Image instance's internal state. This is the primary entry point for preparing images for multimodal conversation contexts.

Usage

Use this module when integrating vision capabilities into FastChat's serving infrastructure. The Image class handles the full lifecycle of image data from initial file path or URL input through to the base64 data URI format required by model APIs. Use to_conversation_format() to prepare images for inclusion in multimodal conversation prompts, and to_openai_image_format() when formatting images specifically for OpenAI-compatible vision API calls.

Code Reference

Source Location

Signature

class ImageFormat(IntEnum):
    URL = auto()
    LOCAL_FILEPATH = auto()
    PIL_IMAGE = auto()
    BYTES = auto()
    DEFAULT = auto()

class Image(BaseModel):
    url: str = ""
    filetype: str = ""
    image_format: ImageFormat = ImageFormat.BYTES
    base64_str: str = ""

    def convert_image_to_base64(self) -> str: ...
    def to_openai_image_format(self) -> str: ...
    def resize_image_and_return_image_in_bytes(self, image, max_image_size_mb: float) -> tuple[str, BytesIO]: ...
    def convert_url_to_image_bytes(self, max_image_size_mb: float) -> tuple[str, str]: ...
    def to_conversation_format(self, max_image_size_mb: float) -> "Image": ...

Import

from fastchat.serve.vision.image import Image, ImageFormat

I/O Contract

Inputs

Name Type Required Description
url str Yes Image source: file path, URL, or empty string. Used by convert_url_to_image_bytes() and to_conversation_format()
filetype str No Image format string (e.g., "png", "jpeg"). Auto-detected during conversion
image_format ImageFormat No Current format state, defaults to ImageFormat.BYTES
base64_str str No Pre-encoded base64 image data, populated during conversion
max_image_size_mb float Yes (for resize/convert methods) Maximum allowed image size in megabytes after resizing (e.g., 3.33 for 5/1.5)

Outputs

Name Type Description
base64_str str Base64-encoded image string suitable for embedding in API payloads
data URI str Complete data URI string in data:image/{type};base64,{data} format (from to_openai_image_format())
Image (self) Image Updated Image instance with populated filetype, image_format, and base64_str (from to_conversation_format())
(image_format, image_bytes) tuple[str, BytesIO] Format string and raw bytes after resizing (from resize_image_and_return_image_in_bytes())

Usage Examples

from fastchat.serve.vision.image import Image, ImageFormat

# Load a local image and convert for conversation use
img = Image(url="fastchat/serve/example_images/fridge.jpg")
img = img.to_conversation_format(max_image_size_mb=5 / 1.5)

# Access the base64-encoded result
print(img.base64_str[:50])  # First 50 chars of base64
print(img.filetype)          # "png"
print(img.image_format)      # ImageFormat.BYTES

# Convert to OpenAI-compatible format
openai_url = img.to_openai_image_format()
# Returns: "data:image/png;base64,iVBORw0KGgo..."

# Serialize to JSON for API transmission
json_str = img.model_dump_json()

# Handle an SVG file (requires cairosvg)
svg_img = Image(url="diagram.svg")
fmt, b64 = svg_img.convert_url_to_image_bytes(max_image_size_mb=3.0)
# SVG is converted to PNG via cairosvg before encoding

# Use with a URL-based image
url_img = Image(
    url="https://example.com/photo.jpg",
    image_format=ImageFormat.URL
)
openai_format = url_img.to_openai_image_format()
# Returns the URL directly for URL-type images

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment