Implementation:Lm sys FastChat Build Single Model Vision UI

Knowledge Sources	Lm_sys_FastChat Chatbot Arena
Domains	Web_UI, Model_Evaluation
Last Updated	2026-02-07 06:00 GMT

Overview

Constructs a single-model vision chat UI with multimodal image upload support for evaluating vision-language models.

Description

The build_single_vision_language_model_ui function creates a Gradio UI component for chatting with a single vision-language model. Users upload images through a MultimodalTextbox input and ask questions about them. An image display panel is shown when an image has been uploaded and hidden otherwise; set_visible_image and set_invisible_image control that visibility.
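The show/hide behaviour can be sketched without Gradio. The snippet below is a minimal illustration, assuming the multimodal textbox value is a dict with a "files" list. The function names mirror the module's, but the bodies and the plain-dict return values are stand-ins invented here for the Gradio component updates the real code returns.

```python
from typing import Any, Dict, Optional

# Stand-ins (assumption) for the Gradio updates that show or hide the image column.
VISIBLE = {"visible": True}
INVISIBLE = {"visible": False}


def set_visible_image(textbox: Dict[str, Any]) -> Dict[str, bool]:
    """Show the image panel only when the textbox carries at least one file."""
    files = textbox.get("files") or []
    return VISIBLE if files else INVISIBLE


def set_invisible_image() -> Dict[str, bool]:
    """Always hide the image panel, for example after the history is cleared."""
    return INVISIBLE


def add_image(textbox: Dict[str, Any]) -> Optional[str]:
    """Return the first uploaded file, or None when nothing was uploaded."""
    files = textbox.get("files") or []
    return files[0] if files else None
```

In this sketch only the first file is surfaced, which keeps the panel logic trivial; whether the real module applies the same rule should be checked against the source file.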

The module handles images through several utility functions. add_image extracts image files from the multimodal textbox input. convert_images_to_conversation_format transforms uploaded images into the internal Image class format expected by the conversation system, supporting both file paths and base64-encoded data via the ImageFormat and Image classes from fastchat.serve.vision.image. _prepare_text_with_image assembles the final prompt by combining text and image references in the conversation state, and it also handles CSAM (child safety) flagging.
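To make the path-versus-base64 distinction concrete, here is a minimal sketch of the conversion step. SketchImageFormat, SketchImage, and convert_images are hypothetical stand-ins written for this page; they are not the ImageFormat and Image classes from fastchat.serve.vision.image, whose fields and methods may differ.

```python
import base64
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Union


class SketchImageFormat(Enum):
    """Hypothetical stand-in for FastChat's ImageFormat."""
    LOCAL_FILEPATH = auto()
    BASE64 = auto()


@dataclass
class SketchImage:
    """Hypothetical stand-in for FastChat's Image."""
    fmt: SketchImageFormat
    data: str


def convert_images(images: List[Union[str, bytes]]) -> List[SketchImage]:
    """Wrap each upload: str is treated as a file path, bytes as raw image data."""
    converted = []
    for item in images:
        if isinstance(item, bytes):
            # Raw bytes are base64-encoded so they can travel as text.
            encoded = base64.b64encode(item).decode("ascii")
            converted.append(SketchImage(SketchImageFormat.BASE64, encoded))
        else:
            # Paths are kept as-is and resolved later by the consumer.
            converted.append(SketchImage(SketchImageFormat.LOCAL_FILEPATH, item))
    return converted
```

Tagging each image with its format lets downstream code decide how to load it without guessing from the payload.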

The moderate_input function performs two layers of content moderation: text moderation via moderation_filter and image moderation via image_moderation_filter. It returns the matching warning message (TEXT_MODERATION_MSG or IMAGE_MODERATION_MSG). The add_text function orchestrates the full input pipeline: validating character limits, running moderation, converting images, and appending the user turn to the conversation state. Optional VQA sample loading is supported through get_vqa_sample, which selects a random visual question-answering example for demonstration.
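The moderation and input pipeline described above can be sketched in plain Python. This is an illustration under stated assumptions, not the module's code: the filters are injected as callables, the constants are placeholders for the values in fastchat.constants, the history is a simple list, and both the precedence of the image warning and the choice to drop flagged input are decisions made for this sketch.

```python
from typing import Callable, List, Tuple

# Placeholder values (assumption); the real constants live in fastchat.constants.
TEXT_MODERATION_MSG = "<text moderation warning>"
IMAGE_MODERATION_MSG = "<image moderation warning>"
INPUT_CHAR_LEN_LIMIT = 100  # illustrative limit only


def moderate_input(
    text: str,
    images: List[str],
    text_filter: Callable[[str], bool],
    image_filter: Callable[[str], bool],
) -> Tuple[bool, bool, str]:
    """Run both checks; return (text_flagged, image_flagged, warning_message)."""
    text_flagged = text_filter(text)
    image_flagged = any(image_filter(img) for img in images)
    if image_flagged:
        return text_flagged, image_flagged, IMAGE_MODERATION_MSG
    if text_flagged:
        return text_flagged, image_flagged, TEXT_MODERATION_MSG
    return False, False, ""


def add_text(history, text, images, text_filter, image_filter):
    """Validate, moderate, then append the user turn to a list-based history."""
    if not text and not images:
        return history, "empty input"
    text = text[:INPUT_CHAR_LEN_LIMIT]  # enforce the limit by truncating (assumption)
    text_flagged, image_flagged, msg = moderate_input(
        text, images, text_filter, image_filter
    )
    if text_flagged or image_flagged:
        return history, msg  # in this sketch, flagged input is not appended
    return history + [("user", text, list(images))], ""
```

Passing the filters in as arguments keeps the sketch testable without a moderation service; the real functions call moderation_filter and image_moderation_filter directly.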

Usage

Use this module when building a single-model vision chat tab in the Chatbot Arena interface. It serves as the foundation for the multimodal direct chat experience and also exports key utility functions (set_visible_image, set_invisible_image, add_image, moderate_input, _prepare_text_with_image, convert_images_to_conversation_format) that are reused by the vision arena modules (both anonymous and named).

Code Reference

Source Location

Repository: Lm_sys_FastChat
File: fastchat/serve/gradio_block_arena_vision.py
Lines: 1-511

Signature

def build_single_vision_language_model_ui(
    context: Context, add_promotion_links=False, random_questions=None
):
    """
    Build a single vision-language model chat UI.

    Args:
        context: Global Context object containing model lists and configuration.
        add_promotion_links: Whether to display blog/paper/social promotion links.
        random_questions: Optional list of VQA sample questions for the random button.

    Returns:
        list: [state, model_selector] Gradio State and Dropdown components.
    """

Import

from fastchat.serve.gradio_block_arena_vision import build_single_vision_language_model_ui

Key Functions

Function	Line	Description
build_single_vision_language_model_ui	298	Main entry point; constructs the single-model vision chat Gradio tab
get_vqa_sample	70	Selects a random VQA sample with question text and image path
set_visible_image	77	Shows the image display column when an image is uploaded
set_invisible_image	89	Hides the image display column
add_image	93	Extracts image files from multimodal textbox input
vote_last_response	101	Logs user vote (upvote/downvote/flag) for single-model evaluation
upvote_last_response	115	Records an upvote for the model response
downvote_last_response	122	Records a downvote for the model response
flag_last_response	129	Flags the model response for review
regenerate	136	Clears last assistant message and regenerates a new response
clear_history	146	Resets conversation state, chatbot display, and image panel
report_csam_image	165	Reports detected CSAM content for safety compliance
_prepare_text_with_image	169	Assembles prompt by combining text and image references into state
convert_images_to_conversation_format	181	Converts uploaded images to internal Image class format
moderate_input	194	Performs text and image content moderation checks
add_text	219	Full input pipeline: validation, moderation, image conversion, state update
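
The voting helpers listed in the table share one logging routine. The sketch below shows one plausible shape for it, writing each vote as a JSON line. The record fields, the TextIO parameter, and the conversation dict are assumptions made for illustration; the real functions take Gradio state and request objects and resolve the log file themselves.

```python
import io
import json
import time
from typing import Any, Dict, TextIO


def vote_last_response(
    log: TextIO, vote_type: str, model: str, conversation: Dict[str, Any]
) -> Dict[str, Any]:
    """Append one vote record to the log as a single JSON line."""
    record = {
        "tstamp": round(time.time(), 4),  # when the vote was cast
        "type": vote_type,                # "upvote", "downvote", or "flag"
        "model": model,
        "state": conversation,            # hypothetical snapshot of the chat
    }
    log.write(json.dumps(record) + "\n")
    return record


def upvote_last_response(log, model, conversation):
    return vote_last_response(log, "upvote", model, conversation)


def downvote_last_response(log, model, conversation):
    return vote_last_response(log, "downvote", model, conversation)


def flag_last_response(log, model, conversation):
    return vote_last_response(log, "flag", model, conversation)


# Usage: an in-memory buffer stands in for the conversation log file.
buffer = io.StringIO()
upvote_last_response(buffer, "llava-v1.5-7b", {"turns": 1})
flag_last_response(buffer, "llava-v1.5-7b", {"turns": 1})
logged = [json.loads(line) for line in buffer.getvalue().splitlines()]
```

One record per line keeps the log append-only and easy to parse incrementally.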

I/O Contract

Inputs

Name	Type	Required	Description
context	Context	Yes	Global state object from fastchat.serve.gradio_global_state containing model lists and configuration
add_promotion_links	bool	No	Whether to display promotional links in the notice markdown (default: False)
random_questions	list	No	Optional list of VQA sample dicts with "question" and "path" keys for the random example button
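
As a rough illustration of how the random_questions entries described in the table might feed the random example button, the sketch below picks one sample and shapes it like a multimodal textbox value. It assumes the samples are dicts with "question" and "path" keys as stated above; the rng parameter and the exact return shape are hypothetical, and the real get_vqa_sample may differ.

```python
import random
from typing import Any, Dict, List, Optional, Tuple


def get_vqa_sample(
    samples: List[Dict[str, str]], rng: Optional[random.Random] = None
) -> Tuple[Dict[str, Any], str]:
    """Pick one sample and return (textbox_value, image_path)."""
    chooser = rng or random  # allow a seeded generator for reproducible picks
    sample = chooser.choice(samples)
    textbox_value = {"text": sample["question"], "files": [sample["path"]]}
    return textbox_value, sample["path"]
```

Accepting an optional seeded generator is a convenience of this sketch so the selection can be made deterministic in tests.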

Outputs

Name	Type	Description
returns	list	List of [state, model_selector] containing a single Gradio State and a Dropdown component

Dependencies

Internal Imports

from fastchat.constants import (
    TEXT_MODERATION_MSG, IMAGE_MODERATION_MSG, MODERATION_MSG,
    CONVERSATION_LIMIT_MSG, INPUT_CHAR_LEN_LIMIT,
    CONVERSATION_TURN_LIMIT, SURVEY_LINK,
)
from fastchat.model.model_adapter import get_conversation_template
from fastchat.serve.gradio_global_state import Context
from fastchat.serve.gradio_web_server import (
    get_model_description_md, acknowledgment_md, bot_response,
    get_ip, disable_btn, State, get_conv_log_filename, get_remote_logger,
)
from fastchat.serve.vision.image import ImageFormat, Image
from fastchat.utils import build_logger, moderation_filter, image_moderation_filter

External Imports

import json
import os
import time
from typing import List, Union
import gradio as gr
from gradio.data_classes import FileData
import numpy as np

Usage Examples

# Building the single vision-language model tab
import gradio as gr
from fastchat.serve.gradio_global_state import Context
from fastchat.serve.gradio_block_arena_vision import (
    build_single_vision_language_model_ui,
)

context = Context()
context.text_models = ["llava-v1.5-7b", "llava-v1.5-13b"]

vqa_samples = [
    {"question": "What is in this image?", "path": "/data/samples/cat.jpg"},
    {"question": "Describe the scene.", "path": "/data/samples/street.jpg"},
]

with gr.Blocks() as demo:
    with gr.Tab("Vision Direct Chat"):
        # The builder returns [state, model_selector]; unpack both for later wiring.
        state, model_selector = build_single_vision_language_model_ui(
            context,
            add_promotion_links=True,
            random_questions=vqa_samples,
        )
