Implementation:Mit han lab Llm awq Gradio Web Server

From Leeroopedia
Knowledge Sources
Domains: Serving, UI
Last Updated: 2026-02-15 00:00 GMT

Overview

The Gradio Web Server provides a browser-based chat interface for TinyChat, enabling interactive multimodal conversations with LLaVA/VILA models served through the distributed worker infrastructure.

Description

This module implements a full-featured Gradio web UI for the TinyChat serving stack. The build_demo(embed_mode) function constructs the Gradio Blocks application, which includes up to IMAGE_BOX_NUM (3) image upload slots, a video upload slot (extracting up to 8 frames), a text input box, a model selection dropdown, prompt style radio buttons ("default" / "no-sys"), and tunable generation parameters (temperature, top_p, max output tokens). The UI also provides BUTTON_LIST_LEN (2) action buttons, one for regenerating the last response and one for clearing history.

The get_model_list() function queries the controller to discover available models. The load_demo() and load_demo_refresh_model_list() functions initialize conversation state and model selection on page load.

The http_bot() generator function handles the core chat loop. It resolves the conversation template from the model name, queries the controller for a worker address, constructs the prompt with image token placeholders, streams the response from the worker, and logs the conversation to a timestamped JSON file whose name comes from get_conv_log_filename().

The helper functions add_images(), add_text(), add_text_only(), regenerate(), clear_history(), and change_prompt_style() manage conversation state transitions. Content moderation and automatic image token padding are enabled through command-line flags.
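
The streaming step of http_bot() can be illustrated without a live worker. The sketch below is a minimal, hedged example and not the module's actual code: the helper name iter_stream_text is hypothetical, and the wire format (JSON records separated by a NUL byte, each with a "text" field and an "error_code" that is 0 on success) is an assumption that this page does not confirm.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(raw: Iterable[bytes], delimiter: bytes = b"\0") -> Iterator[str]:
    """Yield the 'text' field of each JSON record in a delimited byte stream.

    Assumption (not confirmed by this page): records are JSON objects
    separated by a NUL byte, with 'error_code' equal to 0 on success.
    """
    buffer = b""
    for piece in raw:
        buffer += piece
        # One network read may hold zero, one, or several complete records.
        while delimiter in buffer:
            record, buffer = buffer.split(delimiter, 1)
            if not record:
                continue
            data = json.loads(record.decode("utf-8"))
            if data.get("error_code", 0) != 0:
                raise RuntimeError(
                    f"worker error {data['error_code']}: {data.get('text', '')}"
                )
            yield data["text"]


# Simulated worker output, deliberately split in the middle of a record.
chunks = [
    b'{"text": "Hel", "error_',
    b'code": 0}\0{"text": "Hello", "error_code": 0}\0',
]
partials = list(iter_stream_text(chunks))
print(partials)
```

A generator like this fits the Gradio streaming pattern, because each yielded partial text can be written into the chatbot state and yielded again to the UI.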

Usage

Run this module as a standalone Gradio application that connects to a running controller. Users interact with models through a web browser.

Code Reference

Source Location

Signature

def build_demo(embed_mode: bool) -> gr.Blocks: ...
def get_model_list() -> List[str]: ...
def load_demo(url_params, prompt_style_btn, request: gr.Request) -> Tuple: ...
def load_demo_refresh_model_list(prompt_style_btn, request: gr.Request) -> Tuple: ...
def get_conv_log_filename() -> str: ...
def http_bot(state, model_selector, temperature, top_p, max_new_tokens, prompt_style_btn, request: gr.Request) -> Generator: ...
def add_images(state, imagebox, imagebox_2, imagebox_3, videobox, image_process_mode, request: gr.Request): ...
def add_text(state, text, image, image_process_mode, prompt_style_btn, request: gr.Request) -> Tuple: ...
def add_text_only(state, text, request: gr.Request) -> Tuple: ...
def regenerate(state, image_process_mode, request: gr.Request) -> Tuple: ...
def clear_history(prompt_style_btn, request: gr.Request) -> Tuple: ...
def change_prompt_style(state, prompt_style_btn, request: gr.Request) -> Conversation: ...
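
add_images() accepts a videobox input, and the Description states that up to 8 frames are extracted from an uploaded video. The page does not say how those frames are chosen, so the sketch below assumes evenly spaced sampling; the function name sample_frame_indices and the midpoint strategy are illustrative, not taken from the module.

```python
from typing import List


def sample_frame_indices(total_frames: int, max_frames: int = 8) -> List[int]:
    """Pick up to max_frames evenly spaced frame indices from a video.

    Assumption: uniform sampling. The actual module may choose frames
    differently.
    """
    if total_frames <= 0 or max_frames <= 0:
        return []
    if total_frames <= max_frames:
        # Short clip: keep every frame.
        return list(range(total_frames))
    # Split the clip into max_frames equal segments and take each midpoint.
    return [int((i + 0.5) * total_frames / max_frames) for i in range(max_frames)]


indices = sample_frame_indices(80)
print(indices)
```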

Import

# Run as a standalone Gradio server:
# python -m tinychat.serve.gradio_web_server --host 0.0.0.0 --port 7860 --controller-url http://localhost:21001
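
The flags named on this page can be mirrored in a small argparse parser, which is useful for checking a launch command before starting the server. This is a sketch only: the flag names and the defaults for --controller-url and --concurrency-count come from this page, while the defaults for --host, --port, and --model-list-mode are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the flags documented on this page."""
    parser = argparse.ArgumentParser(description="Gradio web server flags (sketch)")
    parser.add_argument("--host", type=str, default="0.0.0.0")  # default assumed
    parser.add_argument("--port", type=int, default=7860)  # default assumed
    parser.add_argument("--controller-url", type=str, default="http://localhost:21001")
    parser.add_argument("--concurrency-count", type=int, default=10)
    parser.add_argument(
        "--model-list-mode",
        type=str,
        default="once",  # default assumed
        choices=["once", "reload"],
    )
    parser.add_argument("--auto-pad-image-token", action="store_true")
    parser.add_argument("--moderate", action="store_true")
    parser.add_argument("--share", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--controller-url", "http://localhost:21001", "--auto-pad-image-token", "--share"]
)
print(args.controller_url, args.auto_pad_image_token, args.share)
```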

I/O Contract

Inputs

Name | Type | Required | Description
embed_mode | bool | Yes | Argument to build_demo(). If True, suppresses the title, ToS, and acknowledgement markdown blocks
--controller-url | str | No | URL of the controller service (default: http://localhost:21001)
--model-list-mode | str | No | "once" to load models at startup, "reload" to refresh on each page load
--concurrency-count | int | No | Maximum number of concurrent Gradio requests (default: 10)
--auto-pad-image-token | flag | No | Automatically insert <image> token before prompts when images are provided
--moderate | flag | No | Enable content moderation on user inputs
--share | flag | No | Create a public Gradio share link
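
The effect of --auto-pad-image-token can be sketched as a small string helper. The function name pad_image_tokens is hypothetical, and the exact placement rule (one <image> placeholder per image not already mentioned, each on its own line before the prompt) is an assumption; the page states only that the token is inserted before the prompt when images are provided.

```python
IMAGE_TOKEN = "<image>"


def pad_image_tokens(text: str, num_images: int) -> str:
    """Prepend one <image> placeholder for each image the prompt lacks.

    Assumption: placeholders go on their own lines ahead of the user text.
    """
    missing = max(num_images - text.count(IMAGE_TOKEN), 0)
    if missing == 0:
        return text
    return "\n".join([IMAGE_TOKEN] * missing + [text])


padded = pad_image_tokens("What is in these pictures?", 2)
print(padded)
```

Counting existing placeholders first keeps the helper idempotent, so a prompt that already contains the right number of tokens passes through unchanged.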

Outputs

Name | Type | Description
demo | gr.Blocks | The fully configured Gradio Blocks application
conversation_log | JSON file | Timestamped JSON log of each conversation turn, saved to LOGDIR
streamed_response | Gradio chatbot updates | Real-time token-by-token updates rendered in the chat interface
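
The conversation_log output can be pictured as a file under LOGDIR that receives one record per turn. The sketch below assumes a date-based file name and one JSON object per line; neither detail is stated on this page, and the helper names conv_log_filename and append_turn, along with the record fields, are illustrative.

```python
import datetime
import json
import os
import tempfile


def conv_log_filename(logdir: str, now: datetime.datetime) -> str:
    """Build a date-stamped log path (the naming scheme is an assumption)."""
    return os.path.join(logdir, f"{now:%Y-%m-%d}-conv.json")


def append_turn(path: str, record: dict) -> None:
    """Append one conversation turn as a single JSON line."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "a", encoding="utf-8") as fout:
        fout.write(json.dumps(record) + "\n")


# A temporary directory stands in for LOGDIR in this example.
with tempfile.TemporaryDirectory() as logdir:
    path = conv_log_filename(logdir, datetime.datetime(2026, 2, 15, 9, 30))
    append_turn(path, {"type": "chat", "model": "example-model", "messages": []})
    with open(path, encoding="utf-8") as fin:
        records = [json.loads(line) for line in fin]

print(os.path.basename(path), len(records))
```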

Usage Examples

Launching the Web Server

# Basic launch:
# python -m tinychat.serve.gradio_web_server --controller-url http://localhost:21001

# With auto image token padding and public share link:
# python -m tinychat.serve.gradio_web_server \
#     --controller-url http://localhost:21001 \
#     --auto-pad-image-token \
#     --share

Embedding in Another Application

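# To embed without title/ToS:
from tinychat.serve.gradio_web_server import build_demo

demo = build_demo(embed_mode=True)
# queue(concurrency_count=...) is the Gradio 3.x API; Gradio 4 removed this
# argument in favor of default_concurrency_limit.
demo.queue(concurrency_count=5).launch(server_name="0.0.0.0", server_port=7860)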
