Implementation:Mit han lab Llm awq Gradio Web Server
| Knowledge Sources | |
|---|---|
| Domains | Serving, UI |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
The Gradio Web Server provides a browser-based chat interface for TinyChat, enabling interactive multimodal conversations with LLaVA/VILA models served through the distributed worker infrastructure.
Description
This module implements a full-featured Gradio web UI for the TinyChat serving stack. The build_demo(embed_mode) function constructs the Gradio Blocks application, which includes up to IMAGE_BOX_NUM (3) image upload slots, a video upload slot (from which up to 8 frames are extracted), a text input box, a model selection dropdown, prompt style radio buttons ("default" / "no-sys"), and tunable generation parameters (temperature, top_p, max output tokens). BUTTON_LIST_LEN (2) action buttons provide regeneration and history clearing.
The get_model_list() function queries the controller to discover available models, and load_demo() and load_demo_refresh_model_list() initialize the conversation state and model selection on page load.
The http_bot() generator handles the core chat loop: it resolves the conversation template from the model name, asks the controller for a worker address, builds the prompt with image token placeholders, streams the response from the worker, and logs the conversation to a timestamped JSON file named by get_conv_log_filename().
The helper functions add_images(), add_text(), add_text_only(), regenerate(), clear_history(), and change_prompt_style() manage conversation state transitions. Command-line flags enable content moderation and automatic image token padding.
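The streaming step of http_bot() can be illustrated by a small decoder for the worker's response stream. This is a sketch under assumptions, not TinyChat's code: the function name `decode_worker_stream` is hypothetical, and the chunk format (one JSON object per chunk with a cumulative `text` field and an `error_code` field, optionally echoing the prompt as a prefix) is borrowed from the LLaVA/FastChat worker protocol and has not been confirmed for TinyChat's worker.

```python
import json
from typing import Iterable, Iterator


def decode_worker_stream(chunks: Iterable[bytes], prompt: str = "") -> Iterator[str]:
    """Yield the cumulative reply text carried by a worker's streamed chunks.

    Assumed chunk format (LLaVA/FastChat-style, unverified for TinyChat): one JSON
    object per chunk, e.g. {"text": "...", "error_code": 0}, as obtained from
    requests' response.iter_lines(delimiter=...) on a streaming POST.
    """
    for chunk in chunks:
        if not chunk:
            continue  # skip empty pieces between delimiters
        data = json.loads(chunk.decode("utf-8"))
        error_code = data.get("error_code", 0)
        if error_code != 0:
            # Surface the error text in the chat window and stop streaming.
            yield f"{data.get('text', '')} (error_code: {error_code})"
            return
        text = data.get("text", "")
        if prompt and text.startswith(prompt):
            text = text[len(prompt):]  # drop the echoed prompt, if the worker sends it
        yield text.strip()
```

In a Gradio generator handler, each yielded string would overwrite the pending assistant message before the updated chatbot value is yielded back to Gradio, which is what produces token-by-token updates in the UI.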
Usage
Run this module as a standalone Gradio application that connects to a running controller. Users interact with models through a web browser.
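Connecting to the controller mostly means asking it which models its workers serve, which is the job of get_model_list(). The sketch below is hypothetical: `fetch_model_list`, `requests_poster`, and the endpoint names `/refresh_all_workers` and `/list_models` (returning `{"models": [...]}`) are assumptions modeled on the LLaVA/FastChat controller, and the alphabetical sort is a simplification. The HTTP call is injected so it can be replaced by a stub.

```python
from typing import Any, Callable, Dict, List

# A poster sends a JSON payload to a URL and returns the decoded JSON body (or None).
Poster = Callable[[str, Dict[str, Any]], Any]


def fetch_model_list(controller_url: str, post: Poster) -> List[str]:
    """Hypothetical stand-in for get_model_list(); endpoint names are assumptions."""
    base = controller_url.rstrip("/")
    post(base + "/refresh_all_workers", {})  # ask the controller to re-poll its workers
    reply = post(base + "/list_models", {}) or {}
    return sorted(reply.get("models", []))


def requests_poster(url: str, payload: Dict[str, Any]) -> Any:
    """Real HTTP poster built on the third-party `requests` package."""
    import requests  # imported lazily so the sketch itself runs without it

    resp = requests.post(url, json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json() if resp.content else None
```

Against a live controller this would be called as `fetch_model_list("http://localhost:21001", requests_poster)`.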
Code Reference
Source Location
- Repository: Mit_han_lab_Llm_awq
- File: tinychat/serve/gradio_web_server.py
- Lines: 1-1201
Signature
def build_demo(embed_mode: bool) -> gr.Blocks: ...
def get_model_list() -> List[str]: ...
def load_demo(url_params, prompt_style_btn, request: gr.Request) -> Tuple: ...
def load_demo_refresh_model_list(prompt_style_btn, request: gr.Request) -> Tuple: ...
def get_conv_log_filename() -> str: ...
def http_bot(state, model_selector, temperature, top_p, max_new_tokens, prompt_style_btn, request: gr.Request) -> Generator: ...
def add_images(state, imagebox, imagebox_2, imagebox_3, videobox, image_process_mode, request: gr.Request): ...
def add_text(state, text, image, image_process_mode, prompt_style_btn, request: gr.Request) -> Tuple: ...
def add_text_only(state, text, request: gr.Request) -> Tuple: ...
def regenerate(state, image_process_mode, request: gr.Request) -> Tuple: ...
def clear_history(prompt_style_btn, request: gr.Request) -> Tuple: ...
def change_prompt_style(state, prompt_style_btn, request: gr.Request) -> Conversation: ...
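The state-transition helpers can be pictured with a simplified conversation object. This is a hypothetical sketch: `ChatState`, `system_for`, `DEFAULT_SYSTEM`, and the role names are illustrative, the signatures drop the `request`, image, and Gradio output-tuple parts of the real functions, and treating "no-sys" as "no system prompt" is an assumption based on the style's name.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Placeholder system prompt; the real templates come from TinyChat's conversation module.
DEFAULT_SYSTEM = "You are a helpful assistant."


@dataclass
class ChatState:
    """Hypothetical, simplified stand-in for the Conversation object kept in gr.State."""

    system: str = DEFAULT_SYSTEM
    roles: Tuple[str, str] = ("USER", "ASSISTANT")
    messages: List[List[Optional[str]]] = field(default_factory=list)  # [role, text] pairs


def system_for(prompt_style: str) -> str:
    # Assumption: "no-sys" drops the system prompt, "default" keeps it.
    return "" if prompt_style == "no-sys" else DEFAULT_SYSTEM


def add_text(state: ChatState, text: str) -> ChatState:
    """Append the user turn plus an empty assistant slot for http_bot to stream into."""
    text = text.strip()
    if text:
        state.messages.append([state.roles[0], text])
        state.messages.append([state.roles[1], None])
    return state


def regenerate(state: ChatState) -> ChatState:
    """Blank the last assistant reply so the next http_bot call produces a new one."""
    if state.messages and state.messages[-1][0] == state.roles[1]:
        state.messages[-1][1] = None
    return state


def clear_history(prompt_style: str) -> ChatState:
    """Discard all turns and start a fresh conversation in the selected prompt style."""
    return ChatState(system=system_for(prompt_style))


def change_prompt_style(state: ChatState, prompt_style: str) -> ChatState:
    """Switch the system prompt while keeping the existing turns."""
    state.system = system_for(prompt_style)
    return state
```

Leaving an empty assistant slot after each user turn gives the streaming handler a fixed place to write into, and regeneration then only needs to reset that slot.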
Import
# Run as a standalone Gradio server:
# python -m tinychat.serve.gradio_web_server --host 0.0.0.0 --port 7860 --controller-url http://localhost:21001
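The command line above can be mirrored by a minimal argparse sketch of the flags listed in the Inputs table. It is not the module's actual parser: `parse_server_args` is a hypothetical name, and the `--host` and `--port` defaults and the `"once"` default for `--model-list-mode` are assumptions; the `--controller-url` and `--concurrency-count` defaults come from the Inputs table.

```python
import argparse
from typing import List, Optional


def parse_server_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Sketch of the server's CLI surface (not the module's actual parser)."""
    parser = argparse.ArgumentParser(description="TinyChat Gradio web server (sketch)")
    parser.add_argument("--host", type=str, default="0.0.0.0")  # default assumed
    parser.add_argument("--port", type=int, default=7860)  # default assumed
    parser.add_argument("--controller-url", type=str, default="http://localhost:21001")
    # "once": load models at startup; "reload": refresh on each page load (default assumed).
    parser.add_argument("--model-list-mode", type=str, default="once", choices=["once", "reload"])
    parser.add_argument("--concurrency-count", type=int, default=10)
    parser.add_argument("--auto-pad-image-token", action="store_true")
    parser.add_argument("--moderate", action="store_true")
    parser.add_argument("--share", action="store_true")
    return parser.parse_args(argv)
```

argparse converts dashes to underscores, so `--controller-url` is read back as `args.controller_url`.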
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| embed_mode | bool | Yes | Argument to build_demo(); if True, suppresses the title, ToS, and acknowledgement markdown blocks |
| --controller-url | str | No | URL of the controller service (default: http://localhost:21001) |
| --model-list-mode | str | No | "once" to load models at startup, "reload" to refresh on each page load |
| --concurrency-count | int | No | Maximum number of concurrent Gradio requests (default: 10) |
| --auto-pad-image-token | flag | No | Automatically insert <image> token before prompts when images are provided |
| --moderate | flag | No | Enable content moderation on user inputs |
| --share | flag | No | Create a public Gradio share link |
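The `--auto-pad-image-token` behavior can be sketched as a pure function. The helper name `build_user_prompt`, the `"<image>"` literal, and the one-placeholder-per-image, newline-separated layout are assumptions modeled on common LLaVA-style prompting, not confirmed details of this module.

```python
IMAGE_TOKEN = "<image>"  # assumed placeholder literal


def build_user_prompt(text: str, num_images: int, auto_pad: bool) -> str:
    """Hypothetical helper: make sure the prompt carries image placeholders.

    When auto_pad is set and the user typed no placeholder, prepend one
    IMAGE_TOKEN per attached image. Otherwise return the text unchanged and
    rely on the user having positioned the placeholders manually.
    """
    if num_images <= 0 or not auto_pad:
        return text
    if IMAGE_TOKEN in text:
        return text  # respect placeholders the user already placed
    return "\n".join([IMAGE_TOKEN] * num_images) + "\n" + text
```

Skipping text that already contains a placeholder makes this sketch idempotent, so re-running it on a regenerated turn does not stack extra tokens.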
Outputs
| Name | Type | Description |
|---|---|---|
| demo | gr.Blocks | The fully configured Gradio Blocks application |
| conversation_log | JSON file | Timestamped JSON log of each conversation turn, saved to LOGDIR |
| streamed_response | Gradio chatbot updates | Real-time token-by-token updates rendered in the chat interface |
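A minimal sketch of the conversation log, assuming one file per calendar day under LOGDIR and one JSON object appended per line, a convention modeled on LLaVA's web server. The names `conv_log_filename` and `log_turn` and the record fields (`tstamp`, `type`, `model`, `state`) are illustrative and should be checked against get_conv_log_filename() and http_bot() in the source.

```python
import datetime
import json
import os
import time
from typing import Any, Dict, Optional


def conv_log_filename(logdir: str, now: Optional[datetime.datetime] = None) -> str:
    """Hypothetical equivalent of get_conv_log_filename(): one log file per day."""
    t = now or datetime.datetime.now()
    return os.path.join(logdir, f"{t.year}-{t.month:02d}-{t.day:02d}-conv.json")


def log_turn(logdir: str, model: str, state: Dict[str, Any]) -> str:
    """Append one conversation turn as a single JSON line and return the file path."""
    os.makedirs(logdir, exist_ok=True)
    path = conv_log_filename(logdir)
    record = {"tstamp": round(time.time(), 4), "type": "chat", "model": model, "state": state}
    with open(path, "a", encoding="utf-8") as fout:
        fout.write(json.dumps(record) + "\n")
    return path
```

Appending JSON lines, rather than rewriting one JSON document, keeps each write cheap and lets concurrent sessions share a daily file without parsing it first.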
Usage Examples
Launching the Web Server
# Basic launch:
# python -m tinychat.serve.gradio_web_server --controller-url http://localhost:21001
# With auto image token padding and public share link:
# python -m tinychat.serve.gradio_web_server \
# --controller-url http://localhost:21001 \
# --auto-pad-image-token \
# --share
Embedding in Another Application
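# To embed without title/ToS. build_demo() may depend on module-level state (for example the
# model list fetched from the controller) that the module's own entry point normally sets up.
from tinychat.serve.gradio_web_server import build_demo
demo = build_demo(embed_mode=True)
# concurrency_count is a Gradio 3.x queue() argument; Gradio 4.x replaced it with default_concurrency_limit.
demo.queue(concurrency_count=5).launch(server_name="0.0.0.0", server_port=7860)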