Implementation:OpenGVLab InternVL Streamlit Chat App

Knowledge Sources	OpenGVLab_InternVL
Domains	Web Application, Multimodal Chat, Streamlit UI, InternVL2
Last Updated	2026-02-07 14:00 GMT

Overview

This module implements a Streamlit-based web application providing an interactive bilingual chat interface for InternVL2 multimodal models with image upload, bounding box visualization, and image generation capabilities.

Description

The app.py file is the primary user-facing demo application for InternVL2, built with Streamlit. It provides the following features:

UI Layout:

Bilingual support (English/Chinese) with language selector
Sidebar containing model selector, system prompt editor, and advanced parameter controls (temperature, top_p, repetition_penalty, max_new_tokens, max_input_tiles for image resolution control)
Multi-image upload supporting up to 4 images (PNG, JPG, JPEG, WebP)
Gallery examples with pre-loaded images and captions for quick demonstration
Chat message display with streaming response rendering

Core Functions:

generate_response: Sends conversation messages with base64-encoded images to model workers over streaming HTTP and displays the response as it arrives, with a typing indicator
find_bounding_boxes: Parses <ref>/<box> tags in model responses and renders colored bounding boxes with category labels using PIL ImageDraw
query_image_generation: Detects drawing-instruction code blocks in responses and calls a Stable Diffusion worker to generate images
load_upload_file_and_show: Handles file uploads with MD5 hashing for image caching
save_chat_history: Logs conversations to JSON files with timestamps
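
The tag parsing behind find_bounding_boxes can be illustrated with a minimal sketch. It assumes the model emits tags in the form `<ref>label</ref><box>[[x1, y1, x2, y2]]</box>` with coordinates normalized to a 0-1000 range; neither detail is stated on this page, so treat both as assumptions. The name parse_ref_boxes is hypothetical, and the drawing step with PIL ImageDraw is left out.

```python
import re
from typing import List, Tuple

# Assumed tag layout: <ref>label</ref><box>[[x1, y1, x2, y2]]</box>
REF_BOX_PATTERN = re.compile(r"<ref>(.*?)</ref>\s*<box>(.*?)</box>", re.DOTALL)
# Matches one [x1, y1, x2, y2] group of non-negative integers.
COORDS_PATTERN = re.compile(
    r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]"
)

Box = Tuple[int, int, int, int]


def parse_ref_boxes(
    response: str, width: int, height: int, scale: int = 1000
) -> List[Tuple[str, Box]]:
    """Return (label, pixel_box) pairs found in a model response.

    `scale` is the assumed normalization range of the model's coordinates.
    """
    results: List[Tuple[str, Box]] = []
    for label, box_text in REF_BOX_PATTERN.findall(response):
        for coords in COORDS_PATTERN.findall(box_text):
            x1, y1, x2, y2 = (int(c) for c in coords)
            # Rescale normalized coordinates to pixel coordinates.
            pixel_box = (
                x1 * width // scale,
                y1 * height // scale,
                x2 * width // scale,
                y2 * height // scale,
            )
            results.append((label.strip(), pixel_box))
    return results
```

Each returned box could then be passed to `ImageDraw.rectangle` along with a per-label color to produce the annotated image.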

Post-processing:

LaTeX rendering support by converting \[ \] and \( \) delimiters to $ signs
Phi3-3.8B abnormal character filtering
Alias instruction expansion for object detection shortcuts
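
The LaTeX delimiter conversion listed above can be sketched as plain string replacement. This is an illustration rather than the code in app.py: the function name convert_latex_delimiters is hypothetical, and it assumes both the bracket and the parenthesis delimiter pairs are mapped to a single `$`.

```python
def convert_latex_delimiters(text: str) -> str:
    """Replace \\[ \\] and \\( \\) delimiters with $ so Markdown math renders."""
    for delimiter in ("\\[", "\\]", "\\(", "\\)"):
        text = text.replace(delimiter, "$")
    return text
```

Because every delimiter becomes a single `$`, display-style math is rendered inline in this sketch; mapping the bracket pair to `$$` instead would preserve display mode.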

Usage

Use this application to deploy an interactive web demo for InternVL2 models. It requires a running controller and model worker infrastructure, and optionally a Stable Diffusion worker for image generation.

Code Reference

Source Location

Repository: OpenGVLab_InternVL
File: streamlit_demo/app.py
Lines: 1-476

Signature

def get_model_list() -> list
def generate_response(messages) -> str
def find_bounding_boxes(response) -> Optional[Image]
def query_image_generation(response, sd_worker_url, timeout=15) -> Optional[Image]
def load_upload_file_and_show() -> tuple[list[Image], list[str]]
def save_chat_history() -> None
def clear_chat_history() -> None
def pil_image_to_base64(image) -> str
def show_one_or_multiple_images(message, total_image_num, is_input=True) -> None
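
As an illustration of the MD5-based caching attributed to load_upload_file_and_show, the following sketch keys uploaded file bytes by their MD5 digest so that the same image uploaded twice is stored once. The helper names and the dictionary cache are hypothetical; the actual function may cache differently, for example in Streamlit session state or on disk.

```python
import hashlib
from typing import Dict


def md5_of_bytes(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of the given bytes."""
    return hashlib.md5(data).hexdigest()


def cache_upload(cache: Dict[str, bytes], data: bytes) -> str:
    """Store the upload under its MD5 digest and return that digest.

    Identical uploads hash to the same key, so they are stored only once.
    """
    key = md5_of_bytes(data)
    cache.setdefault(key, data)
    return key
```

MD5 is used here only as a content fingerprint for deduplication, not for security.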

Import

# This module is not imported; run it as a Streamlit application:
streamlit run streamlit_demo/app.py -- --controller_url http://localhost:10075

I/O Contract

Inputs

Name	Type	Required	Description
--controller_url	str	No	URL of the model controller (default: "http://10.140.60.209:10075")
--sd_worker_url	str	No	URL of the Stable Diffusion worker for image generation (default: "http://0.0.0.0:40006")
--max_image_limit	int	No	Maximum number of images per conversation (default: 4)
User text input	str	Yes	Chat message from the user
Uploaded images	PIL.Image	No	Up to 4 uploaded images (PNG, JPG, JPEG, WebP)
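
The three command-line inputs in the table above could be declared with argparse roughly as follows. This is a sketch built from the names, types, and defaults listed in the table; the function name build_parser and the help strings are illustrative, and app.py may structure its argument parsing differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="InternVL2 Streamlit demo")
    parser.add_argument(
        "--controller_url",
        type=str,
        default="http://10.140.60.209:10075",
        help="URL of the model controller",
    )
    parser.add_argument(
        "--sd_worker_url",
        type=str,
        default="http://0.0.0.0:40006",
        help="URL of the Stable Diffusion worker",
    )
    parser.add_argument(
        "--max_image_limit",
        type=int,
        default=4,
        help="Maximum number of images per conversation",
    )
    return parser
```

When launching through `streamlit run`, script arguments go after a bare `--` so that Streamlit does not interpret them as its own options.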

Outputs

Name	Type	Description
Chat response	str	Streamed model response with markdown rendering
Bounding box images	PIL.Image	Images with drawn bounding boxes when model outputs <ref>/<box> tags
Generated images	PIL.Image	Stable Diffusion generated images when model outputs drawing-instruction blocks
Conversation logs	JSON files	Logged conversations with timestamps, model info, and message history
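
A conversation log entry of the kind described in the last row might be written as sketched below. The field names, the one-JSON-object-per-line layout, and the date-based file name are all assumptions made for illustration; the page only states that logs are JSON files containing timestamps, model info, and message history.

```python
import json
import os
from datetime import datetime
from typing import Dict, List, Optional


def build_log_record(
    model_name: str,
    messages: List[Dict[str, str]],
    now: Optional[datetime] = None,
) -> dict:
    """Assemble one log entry; the field names are hypothetical."""
    now = now or datetime.now()
    return {
        "timestamp": now.strftime("%Y-%m-%d %H:%M:%S"),
        "model": model_name,
        "messages": messages,
    }


def append_log(log_dir: str, record: dict) -> str:
    """Append the record as one JSON line to a per-day file; return its path."""
    os.makedirs(log_dir, exist_ok=True)
    day = record["timestamp"][:10]  # the YYYY-MM-DD prefix of the timestamp
    path = os.path.join(log_dir, f"{day}-conv.json")
    with open(path, "a", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese text readable in the log file.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```

Writing with `ensure_ascii=False` suits a bilingual interface, since Chinese messages are stored as readable text rather than escape sequences.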

Usage Examples

Basic Usage

# Launch the Streamlit demo (arguments after the bare "--" are passed to app.py)
streamlit run streamlit_demo/app.py -- \
    --controller_url http://localhost:10075 \
    --sd_worker_url http://localhost:40006 \
    --max_image_limit 4

Related Pages

Principle:OpenGVLab_InternVL_Streamlit_Chat_Interface
