Implementation:OpenGVLab InternVL Streamlit Chat App
| Knowledge Sources | |
|---|---|
| Domains | Web Application, Multimodal Chat, Streamlit UI, InternVL2 |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
This module implements a Streamlit-based web application providing an interactive bilingual chat interface for InternVL2 multimodal models with image upload, bounding box visualization, and image generation capabilities.
Description
The app.py file is the primary user-facing demo application for InternVL2, built with Streamlit. It provides a rich interactive experience with the following features:
UI Layout:
- Bilingual support (English/Chinese) with language selector
- Sidebar containing model selector, system prompt editor, and advanced parameter controls (temperature, top_p, repetition_penalty, max_new_tokens, max_input_tiles for image resolution control)
- Multi-image upload supporting up to 4 images (PNG, JPG, JPEG, WebP)
- Gallery examples with pre-loaded images and captions for quick demonstration
- Chat message display with streaming response rendering
Core Functions:
- generate_response: Sends conversation messages with base64-encoded images to model workers via streaming HTTP, displays responses with a typing indicator
- find_bounding_boxes: Parses <ref>/<box> tags in model responses and renders colored bounding boxes with category labels using PIL ImageDraw
- query_image_generation: Detects drawing-instruction code blocks in responses and calls a Stable Diffusion worker to generate images
- load_upload_file_and_show: Handles file uploads with MD5 hashing for image caching
- save_chat_history: Logs conversations to JSON files with timestamps
Post-processing:
- LaTeX rendering support by converting \[\] and \(\) delimiters to $ signs
- Phi3-3.8B abnormal character filtering
- Alias instruction expansion for object detection shortcuts
Usage
Use this application to deploy an interactive web demo for InternVL2 models. It requires a running controller and model worker infrastructure, and optionally a Stable Diffusion worker for image generation.
Code Reference
Source Location
- Repository: OpenGVLab_InternVL
- File: streamlit_demo/app.py
- Lines: 1-476
Signature
def get_model_list() -> list
def generate_response(messages) -> str
def find_bounding_boxes(response) -> Optional[Image]
def query_image_generation(response, sd_worker_url, timeout=15) -> Optional[Image]
def load_upload_file_and_show() -> tuple[list[Image], list[str]]
def save_chat_history() -> None
def clear_chat_history() -> None
def pil_image_to_base64(image) -> str
def show_one_or_multiple_images(message, total_image_num, is_input=True) -> None
Import
# Run as a Streamlit application:
# streamlit run streamlit_demo/app.py -- --controller_url http://localhost:10075
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --controller_url | str | No | URL of the model controller (default: "http://10.140.60.209:10075") |
| --sd_worker_url | str | No | URL of the Stable Diffusion worker for image generation (default: "http://0.0.0.0:40006") |
| --max_image_limit | int | No | Maximum number of images per conversation (default: 4) |
| User text input | str | Yes | Chat message from the user |
| Uploaded images | PIL.Image | No | Up to 4 uploaded images (PNG, JPG, JPEG, WebP) |
Outputs
| Name | Type | Description |
|---|---|---|
| Chat response | str | Streamed model response with markdown rendering |
| Bounding box images | PIL.Image | Images with drawn bounding boxes when model outputs <ref>/<box> tags |
| Generated images | PIL.Image | Stable Diffusion generated images when model outputs drawing-instruction blocks |
| Conversation logs | JSON files | Logged conversations with timestamps, model info, and message history |
Usage Examples
Basic Usage
# Launch the Streamlit demo
# streamlit run streamlit_demo/app.py -- \
# --controller_url http://localhost:10075 \
# --sd_worker_url http://localhost:40006 \
# --max_image_limit 4