Implementation:OpenGVLab InternVL LLaVA Gradio Web Server


Knowledge Sources
Domains Web Serving, Multimodal Chat, Gradio UI
Last Updated 2026-02-07 14:00 GMT

Overview

This module implements a Gradio-based web server that provides an interactive multimodal chat interface for LLaVA and InternVL models.

Description

The gradio_web_server.py file builds a complete web-based chat application using the Gradio framework. It communicates with a model controller/worker infrastructure to route user requests to appropriate model backends. The server provides image upload, text input, model selection via dropdown, and parameter controls (temperature, top_p, max_new_tokens, max_input_tiles). Key features include:

  • Streaming response display via HTTP streaming from worker endpoints
  • Conversation template selection based on model name (LLaVA v0/v1, LLaMA-2, MPT, InternVL variants including InternLM2, Hermes-2, Phi3)
  • User feedback system with upvote, downvote, and flag buttons that log to JSON files
  • Optional content moderation, enabled with the --moderate flag
  • Image processing with MD5 hashing for caching uploaded images to disk
  • Model list management with support for static ("once") or dynamic ("reload") model list refreshing
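
The image-caching feature listed above can be illustrated with a small content-addressed store. This is a minimal sketch rather than the module's actual code: the helper name cache_image_bytes, the .jpg suffix, and the flat directory layout are assumptions made for illustration. Only the use of an MD5 digest as the cache key comes from the description.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_image_bytes(data: bytes, cache_dir: Path) -> Path:
    """Write image bytes to cache_dir under an MD5-derived name (hypothetical helper)."""
    digest = hashlib.md5(data).hexdigest()
    path = cache_dir / f"{digest}.jpg"  # assumed naming scheme
    if not path.exists():
        # Identical uploads hash to the same name, so the file is written only once.
        cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = cache_image_bytes(b"fake image bytes", Path(tmp))
        second = cache_image_bytes(b"fake image bytes", Path(tmp))
        print(first.name, first == second)
```

Keying the file name on the digest means a re-uploaded image reuses the existing file instead of creating a duplicate.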

The build_demo function constructs the Gradio Blocks layout with a two-column design: left column for model selector, image upload, examples, and parameters; right column for the chatbot display with action buttons. The http_bot function is the core inference handler that constructs prompts, queries worker addresses from the controller, and streams responses back to the UI.
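
The streaming half of http_bot can be sketched as a generator that turns a byte stream into successive text updates. The wire format used here is an assumption (JSON objects separated by a null byte, each carrying text and error_code fields), and the helper name iter_stream_text is hypothetical. The sketch accepts any iterable of byte chunks, so it runs without a worker.

```python
import json
from typing import Iterable, Iterator

DELIMITER = b"\0"  # assumed message separator


def iter_stream_text(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield the text carried by each complete JSON message in a byte stream."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # A network chunk may hold zero, one, or several complete messages.
        while DELIMITER in buffer:
            raw, buffer = buffer.split(DELIMITER, 1)
            if not raw:
                continue
            message = json.loads(raw.decode("utf-8"))
            if message.get("error_code", 0) != 0:
                raise RuntimeError(f"worker error: {message.get('text', '')}")
            yield message["text"]


if __name__ == "__main__":
    fake_stream = [
        b'{"text": "A cat", "error_code": 0}\0{"text": "A cat on',
        b' a mat", "error_code": 0}\0',
    ]
    for partial in iter_stream_text(fake_stream):
        print(partial)
```

With the requests library, the chunks could come from response.iter_content() on a request made with stream=True. In a Gradio handler, each yielded string would then be written into the chatbot state before yielding the updated state to the UI.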

Usage

Use this module to deploy a web-based demo for interacting with LLaVA and InternVL chat models. It is the primary user-facing interface in the LLaVA serving pipeline and requires a running controller and at least one model worker.

Code Reference

Source Location

Signature

def build_demo(embed_mode) -> gr.Blocks

def http_bot(state, model_selector, temperature, top_p, max_new_tokens, max_input_tiles, request: gr.Request)

def add_text(state, text, image, image_process_mode, request: gr.Request)

def get_model_list() -> list

def load_demo(url_params, request: gr.Request)
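
The difference between the "once" and "reload" model list modes, as it might sit around get_model_list, can be sketched as below. Everything here is illustrative: the /list_models endpoint path, the {"models": [...]} response shape, and the ModelListProvider class are assumptions. The HTTP call is injected as a plain callable so the sketch runs without a controller.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical transport: POSTs to a URL and returns the parsed JSON body.
PostJson = Callable[[str], Dict]


class ModelListProvider:
    """Return the controller's model names, cached ("once") or refetched ("reload")."""

    def __init__(self, post_json: PostJson, controller_url: str, mode: str = "once"):
        if mode not in ("once", "reload"):
            raise ValueError(f"unknown model list mode: {mode!r}")
        self._post_json = post_json
        self._controller_url = controller_url
        self._mode = mode
        self._cached: Optional[List[str]] = None

    def get(self) -> List[str]:
        # "reload" always asks the controller; "once" asks only on the first call.
        if self._mode == "reload" or self._cached is None:
            response = self._post_json(self._controller_url + "/list_models")  # assumed path
            self._cached = sorted(response["models"])
        return list(self._cached)


if __name__ == "__main__":
    def fake_post(url: str) -> Dict:
        return {"models": ["model-b", "model-a"]}

    provider = ModelListProvider(fake_post, "http://localhost:21001", mode="once")
    print(provider.get())
```

A real transport could be a thin wrapper such as lambda url: requests.post(url).json().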

Import

from llava.serve.gradio_web_server import build_demo, get_model_list

I/O Contract

Inputs

Name | Type | Required | Description
--host | str | No | Server host address (default: "0.0.0.0")
--port | int | No | Server port number
--controller-url | str | No | URL of the model controller (default: "http://localhost:21001")
--concurrency-count | int | No | Number of concurrent requests (default: 10)
--model-list-mode | str | No | "once" or "reload" for model list fetching strategy
--share | bool | No | Whether to create a public Gradio share link
--moderate | bool | No | Enable content moderation
--embed | bool | No | Enable embed mode (hides title and ToS)
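
The options above map naturally onto an argparse parser. The following is a sketch reconstructed from the table, not a copy of the module's parser: the function name build_parser is hypothetical, defaults are set only where the table states them, and the boolean options are assumed to be store_true switches.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Gradio web server options (sketch)")
    parser.add_argument("--host", type=str, default="0.0.0.0")
    parser.add_argument("--port", type=int)  # no default is documented
    parser.add_argument("--controller-url", type=str, default="http://localhost:21001")
    parser.add_argument("--concurrency-count", type=int, default=10)
    # The documented values are "once" and "reload"; the default is not stated.
    parser.add_argument("--model-list-mode", type=str, choices=["once", "reload"])
    # Assumed to be on/off switches that default to False.
    parser.add_argument("--share", action="store_true")
    parser.add_argument("--moderate", action="store_true")
    parser.add_argument("--embed", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--port", "7860", "--model-list-mode", "reload"])
    print(args)
```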

Outputs

Name | Type | Description
Gradio web interface | gr.Blocks | Interactive chat UI with streaming responses
Conversation logs | JSON files | Logged conversations with timestamps, model info, and user feedback
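
One plausible shape for the conversation log output is one JSON object per line, appended on each feedback action. The sketch below assumes that layout. The field names (tstamp, type, model, messages) and the helper append_feedback are hypothetical and chosen only to match the description of timestamps, model info, and user feedback.

```python
import json
import tempfile
import time
from pathlib import Path

VOTE_TYPES = ("upvote", "downvote", "flag")


def append_feedback(log_path: Path, vote_type: str, model_name: str, messages: list) -> dict:
    """Append one feedback record as a JSON line and return it (hypothetical helper)."""
    if vote_type not in VOTE_TYPES:
        raise ValueError(f"unknown vote type: {vote_type!r}")
    record = {
        "tstamp": round(time.time(), 4),  # assumed field names throughout
        "type": vote_type,
        "model": model_name,
        "messages": messages,
    }
    # Append mode keeps earlier records intact; one JSON object per line.
    with log_path.open("a", encoding="utf-8") as fout:
        fout.write(json.dumps(record) + "\n")
    return record


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_file = Path(tmp) / "conv.json"
        append_feedback(log_file, "upvote", "example-model", [["user", "Describe the image."]])
        print(log_file.read_text(encoding="utf-8"))
```

Appending line-delimited records lets several feedback events share one file without rewriting it.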

Usage Examples

Basic Usage

# Launch the Gradio web server (a controller and at least one model worker must already be running)
python -m llava.serve.gradio_web_server \
    --controller-url http://localhost:21001 \
    --port 7860 \
    --model-list-mode reload
