Implementation:OpenGVLab InternVL LLaVA Gradio Web Server

Knowledge Sources	OpenGVLab_InternVL
Domains	Web Serving, Multimodal Chat, Gradio UI
Last Updated	2026-02-07 14:00 GMT

Overview

This module implements a Gradio-based web server that provides an interactive multimodal chat interface for LLaVA and InternVL models.

Description

The gradio_web_server.py file builds a complete web-based chat application using the Gradio framework. It communicates with a model controller/worker infrastructure to route user requests to appropriate model backends. The server provides image upload, text input, model selection via dropdown, and parameter controls (temperature, top_p, max_new_tokens, max_input_tiles). Key features include:

Streaming response display via HTTP streaming from worker endpoints
Conversation template selection based on model name (LLaVA v0/v1, LLaMA-2, MPT, InternVL variants including InternLM2, Hermes-2, Phi3)
User feedback system with upvote, downvote, and flag buttons that log to JSON files
Optional content moderation of user input, enabled with the --moderate flag
Image processing with MD5 hashing for caching uploaded images to disk
Model list management with support for static ("once") or dynamic ("reload") model list refreshing
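The image-caching item in the list above can be illustrated with a short sketch. This is not the module's actual code: the function name cache_image_bytes, the ".jpg" extension, and the flat directory layout are assumptions made for illustration. Only the idea of using an MD5 digest of the upload as its on-disk cache key comes from the description.

```python
import hashlib
import os


def cache_image_bytes(image_bytes: bytes, cache_dir: str) -> str:
    """Store image bytes under an MD5-derived file name and return the path.

    Hypothetical helper: identical uploads map to the same file, so a
    repeated upload is not written to disk a second time.
    """
    digest = hashlib.md5(image_bytes).hexdigest()
    path = os.path.join(cache_dir, f"{digest}.jpg")  # extension is an assumption
    if not os.path.isfile(path):
        os.makedirs(cache_dir, exist_ok=True)
        with open(path, "wb") as f:
            f.write(image_bytes)
    return path
```

Because the file name depends only on the content, two users uploading the same image share one cached file.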

The build_demo function constructs the Gradio Blocks layout with a two-column design: left column for model selector, image upload, examples, and parameters; right column for the chatbot display with action buttons. The http_bot function is the core inference handler that constructs prompts, queries worker addresses from the controller, and streams responses back to the UI.
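The streaming step of http_bot can be sketched roughly as below. The wire format is an assumption here, not something this page confirms: the sketch supposes that each chunk from the worker is a JSON object with a "text" field holding the prompt plus everything generated so far and an "error_code" field that is 0 on success. The function name iter_stream_text is hypothetical, and the chunks are passed in as plain bytes so the example runs without a worker.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(chunks: Iterable[bytes], prompt: str) -> Iterator[str]:
    """Yield the generated text so far for each chunk of a worker stream.

    Assumed chunk format: {"text": <prompt + output so far>, "error_code": int}.
    """
    for chunk in chunks:
        if not chunk:
            continue  # skip keep-alive or empty segments
        data = json.loads(chunk.decode("utf-8"))
        if data.get("error_code", 0) != 0:
            # Surface the worker's message once and stop streaming.
            yield f"{data.get('text', '')} (error_code: {data['error_code']})"
            return
        text = data["text"]
        if text.startswith(prompt):
            text = text[len(prompt):]  # drop the echoed prompt
        yield text.strip()
```

In a Gradio handler, each yielded string would replace the last chatbot message, which is what produces the incremental typing effect in the UI.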

Usage

Use this module to deploy a web-based demo for interacting with LLaVA and InternVL chat models. It is the primary user-facing interface in the LLaVA serving pipeline and requires a running controller and at least one model worker.

Code Reference

Source Location

Repository: OpenGVLab_InternVL
File: internvl_chat_llava/llava/serve/gradio_web_server.py
Lines: 1-459

Signature

def build_demo(embed_mode) -> gr.Blocks

def http_bot(state, model_selector, temperature, top_p, max_new_tokens, max_input_tiles, request: gr.Request)

def add_text(state, text, image, image_process_mode, request: gr.Request)

def get_model_list() -> list

def load_demo(url_params, request: gr.Request)
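To make the role of get_model_list and the "once" versus "reload" modes more concrete, here is a hedged sketch. The endpoint paths "/refresh_all_workers" and "/list_models", the "models" key in the reply, and both function names are assumptions for illustration. The HTTP call is injected as a post_json callable (hypothetical) so the example runs without a controller.

```python
from typing import Callable, Dict, List


def fetch_model_list(controller_url: str,
                     post_json: Callable[[str], Dict]) -> List[str]:
    """Ask the controller to re-poll its workers, then list registered models."""
    post_json(controller_url + "/refresh_all_workers")  # assumed endpoint
    reply = post_json(controller_url + "/list_models")  # assumed endpoint
    return sorted(reply.get("models", []))


def models_for_page_load(mode: str,
                         startup_models: List[str],
                         controller_url: str,
                         post_json: Callable[[str], Dict]) -> List[str]:
    """Return the model list to show when a page is loaded.

    "once": reuse the list fetched at server start.
    "reload": query the controller again on every page load.
    """
    if mode == "reload":
        return fetch_model_list(controller_url, post_json)
    return list(startup_models)
```

The trade-off is freshness against load: "reload" picks up workers that registered after startup, while "once" avoids a controller round trip per visitor.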

Import

from llava.serve.gradio_web_server import build_demo, get_model_list

I/O Contract

Inputs

Name	Type	Required	Description
--host	str	No	Server host address (default: "0.0.0.0")
--port	int	No	Server port number
--controller-url	str	No	URL of the model controller (default: "http://localhost:21001")
--concurrency-count	int	No	Number of concurrent requests (default: 10)
--model-list-mode	str	No	"once" or "reload" for model list fetching strategy
--share	bool	No	Whether to create a public Gradio share link
--moderate	bool	No	Enable content moderation
--embed	bool	No	Enable embed mode (hides title and ToS)
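The flags in the table above map naturally onto an argparse parser. The sketch below is an illustration built only from the table, not a copy of the module's parser: the function name build_parser is hypothetical, and where the table states no default (--port, --model-list-mode) the sketch leaves the value unset rather than guessing one.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented command-line inputs (illustrative only)."""
    parser = argparse.ArgumentParser(description="Gradio web server (sketch)")
    parser.add_argument("--host", type=str, default="0.0.0.0")
    parser.add_argument("--port", type=int)  # no default stated in the table
    parser.add_argument("--controller-url", type=str,
                        default="http://localhost:21001")
    parser.add_argument("--concurrency-count", type=int, default=10)
    parser.add_argument("--model-list-mode", type=str,
                        choices=["once", "reload"])  # default not stated
    parser.add_argument("--share", action="store_true")
    parser.add_argument("--moderate", action="store_true")
    parser.add_argument("--embed", action="store_true")
    return parser
```

Calling build_parser().parse_args(["--port", "7860", "--model-list-mode", "reload"]) yields a namespace whose controller_url and concurrency_count fall back to the documented defaults.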

Outputs

Name	Type	Description
Gradio web interface	gr.Blocks	Interactive chat UI with streaming responses
Conversation logs	JSON files	Logged conversations with timestamps, model info, and user feedback
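A conversation log entry of the kind described in the table could be written as one JSON object per line. The field names ("tstamp", "type", "model", "state") and the function name append_feedback are assumptions for illustration; the page only states that logs carry timestamps, model info, and user feedback.

```python
import json
import time
from typing import Any, Dict


def append_feedback(log_path: str, vote_type: str, model_name: str,
                    conversation: Dict[str, Any]) -> Dict[str, Any]:
    """Append one feedback record to a JSON-lines log file and return it."""
    record = {
        "tstamp": round(time.time(), 4),  # when the feedback was given
        "type": vote_type,                # e.g. "upvote", "downvote", "flag"
        "model": model_name,              # model selected in the UI
        "state": conversation,            # serialized conversation so far
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Appending one self-contained line per event keeps concurrent writes simple and lets the log be processed line by line later.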

Usage Examples

Basic Usage

# Launch the Gradio web server (requires a running controller and at least one model worker)
python -m llava.serve.gradio_web_server \
    --controller-url http://localhost:21001 \
    --port 7860 \
    --model-list-mode reload

Related Pages

Principle:OpenGVLab_InternVL_Gradio_Chat_Serving
