Implementation:OpenGVLab InternVL LLaVA Gradio Web Server
| Knowledge Sources | |
|---|---|
| Domains | Web Serving, Multimodal Chat, Gradio UI |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
This module implements a Gradio-based web server that provides an interactive multimodal chat interface for LLaVA and InternVL models.
Description
The gradio_web_server.py file builds a complete web-based chat application using the Gradio framework. It communicates with a model controller/worker infrastructure to route user requests to appropriate model backends. The server provides image upload, text input, model selection via dropdown, and parameter controls (temperature, top_p, max_new_tokens, max_input_tiles). Key features include:
- Streaming response display via HTTP streaming from worker endpoints
- Conversation template selection based on model name (LLaVA v0/v1, LLaMA-2, MPT, InternVL variants including InternLM2, Hermes-2, Phi3)
- User feedback system with upvote, downvote, and flag buttons that log to JSON files
- Optional content moderation, enabled with the --moderate flag
- MD5 hashing of uploaded images, used to cache them on disk
- Model list management with support for static ("once") or dynamic ("reload") model list refreshing
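The MD5-based image caching mentioned in the list above can be sketched as follows. This is a minimal illustration, not the module's actual code: the function name `cache_image_bytes`, the `.jpg` extension, and the flat cache directory layout are all assumptions made for the example.

```python
import hashlib
import os
import tempfile


def cache_image_bytes(image_bytes: bytes, cache_dir: str) -> str:
    """Store image bytes under an MD5-derived file name and return the path.

    Hypothetical helper: the name, the ".jpg" suffix, and the directory
    layout are illustrative assumptions, not taken from the source file.
    """
    digest = hashlib.md5(image_bytes).hexdigest()
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{digest}.jpg")
    # Identical uploads hash to the same name, so the write happens only once.
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(image_bytes)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = cache_image_bytes(b"fake image payload", tmp)
        second = cache_image_bytes(b"fake image payload", tmp)
        print(first == second)
```

Keying the file name on the content hash means a repeated upload of the same image reuses the existing file instead of creating a duplicate.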
The build_demo function constructs the Gradio Blocks layout with a two-column design: left column for model selector, image upload, examples, and parameters; right column for the chatbot display with action buttons. The http_bot function is the core inference handler that constructs prompts, queries worker addresses from the controller, and streams responses back to the UI.
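The streaming step of http_bot can be illustrated with a small parser. This is a sketch under stated assumptions: it assumes each streamed chunk is a JSON object with a `text` field holding the output so far and an `error_code` field that is 0 on success. That framing, and the helper name `iter_stream_text`, are not confirmed by this page.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield displayable text for each streamed chunk.

    Assumed chunk format (hypothetical): b'{"text": "...", "error_code": 0}'.
    With the requests library, such chunks could come from
    response.iter_lines(delimiter=b"\0") on a stream=True request.
    """
    for chunk in chunks:
        if not chunk:
            # Skip empty keep-alive or trailing delimiter chunks.
            continue
        data = json.loads(chunk.decode("utf-8"))
        if data.get("error_code", 0) != 0:
            # Surface the error once and stop consuming the stream.
            yield f"{data.get('text', '')} (error_code: {data['error_code']})"
            return
        yield data["text"]


if __name__ == "__main__":
    fake_stream = [
        b'{"text": "Hel", "error_code": 0}',
        b"",
        b'{"text": "Hello", "error_code": 0}',
    ]
    for partial in iter_stream_text(fake_stream):
        print(partial)
```

Because it is a generator, a Gradio event handler could re-yield each partial string to update the chatbot display incrementally.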
Usage
Use this module to deploy a web-based demo for interacting with LLaVA and InternVL chat models. It is the primary user-facing interface in the LLaVA serving pipeline and requires a running controller and at least one model worker.
Code Reference
Source Location
- Repository: OpenGVLab_InternVL
- File: internvl_chat_llava/llava/serve/gradio_web_server.py
- Lines: 1-459
Signature
def build_demo(embed_mode) -> gr.Blocks
def http_bot(state, model_selector, temperature, top_p, max_new_tokens, max_input_tiles, request: gr.Request)
def add_text(state, text, image, image_process_mode, request: gr.Request)
def get_model_list() -> list
def load_demo(url_params, request: gr.Request)
Import
from llava.serve.gradio_web_server import build_demo, get_model_list
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --host | str | No | Server host address (default: "0.0.0.0") |
| --port | int | No | Server port number |
| --controller-url | str | No | URL of the model controller (default: "http://localhost:21001") |
| --concurrency-count | int | No | Number of concurrent requests (default: 10) |
| --model-list-mode | str | No | "once" or "reload" for model list fetching strategy |
| --share | bool | No | Whether to create a public Gradio share link |
| --moderate | bool | No | Enable content moderation |
| --embed | bool | No | Enable embed mode (hides title and ToS) |
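The flags in the table above map naturally onto an argparse parser. The sketch below mirrors only what the table states. Where the table lists no default (for example --port and --model-list-mode), none is set here, and the function name `build_arg_parser` is a hypothetical chosen for illustration.

```python
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Gradio web server options (sketch)")
    parser.add_argument("--host", type=str, default="0.0.0.0")
    # No default is documented for --port, so it stays None unless given.
    parser.add_argument("--port", type=int)
    parser.add_argument("--controller-url", type=str, default="http://localhost:21001")
    parser.add_argument("--concurrency-count", type=int, default=10)
    # Only the two documented strategies are accepted.
    parser.add_argument("--model-list-mode", type=str, choices=["once", "reload"])
    parser.add_argument("--share", action="store_true")
    parser.add_argument("--moderate", action="store_true")
    parser.add_argument("--embed", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_arg_parser().parse_args(["--port", "7860", "--model-list-mode", "reload"])
    print(args.host, args.port, args.controller_url, args.model_list_mode)
```

argparse converts dashes to underscores, so --controller-url is read as `args.controller_url` and --concurrency-count as `args.concurrency_count`.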
Outputs
| Name | Type | Description |
|---|---|---|
| Gradio web interface | gr.Blocks | Interactive chat UI with streaming responses |
| Conversation logs | JSON files | Logged conversations with timestamps, model info, and user feedback |
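A conversation log of the kind described in the table could be written as one JSON object per line. The sketch below is an assumption-laden illustration: the field names (`tstamp`, `type`, `model`, `state`), the helper name `append_log_record`, and the one-record-per-line layout are hypothetical and not confirmed by this page.

```python
import json
import time


def append_log_record(path: str, record_type: str, model_name: str, state=None) -> dict:
    """Append one JSON record to a log file and return it.

    Hypothetical schema: record_type would be something like "chat",
    "upvote", "downvote", or "flag"; state holds the conversation snapshot.
    """
    record = {
        "tstamp": round(time.time(), 4),
        "type": record_type,
        "model": model_name,
        "state": state or {},
    }
    # Append mode keeps earlier records intact; one JSON object per line.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "conv.json")
        append_log_record(log_path, "chat", "example-model", {"messages": []})
        append_log_record(log_path, "upvote", "example-model")
        with open(log_path, encoding="utf-8") as f:
            print(sum(1 for _ in f))
```

Writing feedback events and chat turns to the same append-only file keeps them in chronological order, which makes later analysis a matter of reading the file line by line.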
Usage Examples
Basic Usage
# Launch the Gradio web server (a controller and at least one model worker must already be running)
python -m llava.serve.gradio_web_server \
    --controller-url http://localhost:21001 \
    --port 7860 \
    --model-list-mode reload