Implementation:Haotian liu LLaVA Controller Class
Overview
Concrete tool for managing distributed model workers via a FastAPI-based controller server. The Controller class provides centralized worker management with HTTP endpoints for registration, heartbeat, model listing, and inference routing.
Source
- File: llava/serve/controller.py
- Lines: L57-171 (Controller class), L239-298 (FastAPI routes + main)
Signature
from typing import List

class Controller:
    def __init__(self, dispatch_method: str):
        # dispatch_method: 'lottery' or 'shortest_queue'
        ...

    def register_worker(self, worker_name: str, check_heart_beat: bool, worker_status: dict) -> bool:
        """Register a worker and optionally start heartbeat monitoring."""

    def get_worker_address(self, model_name: str) -> str:
        """Return the address of a worker serving the given model using the configured dispatch method."""

    def list_models(self) -> List[str]:
        """Aggregate and return all model names served by registered workers."""

    def remove_worker(self, worker_name: str):
        """Remove a worker from the registry (called on heartbeat expiration)."""

    def refresh_all_workers(self):
        """Ping all registered workers and remove unresponsive ones."""
CLI Usage
python -m llava.serve.controller \
    --host localhost \
    --port 21001 \
    --dispatch-method shortest_queue
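The flags in the command above could be parsed along the following lines. This is a minimal sketch, not the repository's actual argument parser: `build_parser` is a hypothetical helper, and the defaults simply reuse the values from the example command, which is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: only the flag names and the two dispatch
    values come from this page; the defaults are assumptions."""
    parser = argparse.ArgumentParser(description="Controller launch flags (sketch)")
    parser.add_argument("--host", type=str, default="localhost")  # assumed default
    parser.add_argument("--port", type=int, default=21001)        # assumed default
    parser.add_argument(
        "--dispatch-method",
        type=str,
        choices=["lottery", "shortest_queue"],
        default="shortest_queue",                                 # assumed default
    )
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--port", "21001", "--dispatch-method", "lottery"])
print(args.host, args.port, args.dispatch_method)
```

Restricting `--dispatch-method` with `choices` rejects an unsupported value at launch rather than at the first dispatch request.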
FastAPI Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /register_worker | POST | Register a new worker or update an existing one |
| /list_models | POST | Return list of all available model names |
| /get_worker_address | POST | Get a worker address for a given model (uses dispatch method) |
| /receive_heart_beat | POST | Receive heartbeat from a worker, reset expiration timer |
| /worker_generate_stream | POST | Proxy a streaming generation request to the selected worker |
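A client might call two of these endpoints roughly as sketched below. The request and response field names (`model`, `models`, `address`) are assumptions not stated on this page, so check them against `llava/serve/controller.py` before relying on them. The helper names, the worker address, and the model name are illustrative only. The transport is injectable, so the logic can be exercised without a running controller.

```python
import json
import urllib.request
from typing import Callable, List

# A transport takes (url, json_payload) and returns the decoded JSON response.
Transport = Callable[[str, dict], dict]


def http_post(url: str, payload: dict) -> dict:
    """Send a JSON POST using only the standard library."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def list_models(controller_url: str, post: Transport = http_post) -> List[str]:
    # Assumed response shape: {"models": [...]}
    return post(controller_url + "/list_models", {})["models"]


def get_worker_address(controller_url: str, model_name: str,
                       post: Transport = http_post) -> str:
    # Assumed request shape {"model": name} and response shape {"address": "..."}
    return post(controller_url + "/get_worker_address", {"model": model_name})["address"]


def fake_post(url: str, payload: dict) -> dict:
    """Stand-in transport that mimics the assumed response shapes."""
    if url.endswith("/list_models"):
        return {"models": ["example-model"]}
    return {"address": "http://localhost:40000"}


print(list_models("http://localhost:21001", post=fake_post))
print(get_worker_address("http://localhost:21001", "example-model", post=fake_post))
```

Against a live controller, drop the `post=fake_post` argument so that the default `http_post` transport is used.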
Import
from llava.serve.controller import Controller
Inputs
None (standalone server). The controller is configured via CLI arguments at launch time.
Outputs
A running HTTP server at {host}:{port} that handles worker registration, dispatch, and inference proxying.
Description
The Controller class is the central coordinator in LLaVA's distributed serving architecture. It uses uvicorn as the ASGI server and exposes a FastAPI application.
Key behaviors:
- On /register_worker, the controller stores the worker address, its status (including speed and model names), and optionally starts a heartbeat monitoring thread.
- The heartbeat thread checks every 30 seconds whether the worker has sent a heartbeat within the last 90 seconds. If not, the worker is removed.
- On /get_worker_address, the controller selects a worker based on the configured dispatch method:
  - lottery -- Random selection weighted by worker speed.
  - shortest_queue -- Selects the worker with the smallest queue length.
- On /worker_generate_stream, the controller resolves a worker address and proxies the streaming inference request.
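The registration, heartbeat expiry, and dispatch behaviors listed above can be sketched as a small in-memory registry. This is an illustration under stated assumptions, not the code in `llava/serve/controller.py`: `MiniController`, `WorkerInfo`, and `remove_stale_workers` are hypothetical names, the 90-second window is taken from the description above, and timestamps are passed in explicitly so the example is deterministic. In a real server, a background thread would presumably call the expiry check on the 30-second interval.

```python
import random
import time
from dataclasses import dataclass
from typing import Dict, List, Optional

HEART_BEAT_EXPIRATION = 90  # seconds, per the description above


@dataclass
class WorkerInfo:
    model_names: List[str]
    speed: float
    queue_length: int
    check_heart_beat: bool
    last_heart_beat: float


class MiniController:
    """Hypothetical stand-in for the Controller class (sketch only)."""

    def __init__(self, dispatch_method: str, rng: Optional[random.Random] = None):
        if dispatch_method not in ("lottery", "shortest_queue"):
            raise ValueError(f"unknown dispatch method: {dispatch_method}")
        self.dispatch_method = dispatch_method
        self.workers: Dict[str, WorkerInfo] = {}
        self.rng = rng or random.Random()

    def register_worker(self, name: str, model_names: List[str], speed: float = 1.0,
                        queue_length: int = 0, check_heart_beat: bool = True,
                        now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.workers[name] = WorkerInfo(list(model_names), speed, queue_length,
                                        check_heart_beat, now)

    def receive_heart_beat(self, name: str, queue_length: int,
                           now: Optional[float] = None) -> bool:
        info = self.workers.get(name)
        if info is None:
            return False  # unknown worker; it would need to register again
        info.queue_length = queue_length
        info.last_heart_beat = time.time() if now is None else now
        return True

    def remove_stale_workers(self, now: Optional[float] = None) -> List[str]:
        now = time.time() if now is None else now
        stale = [name for name, w in self.workers.items()
                 if w.check_heart_beat and now - w.last_heart_beat > HEART_BEAT_EXPIRATION]
        for name in stale:
            del self.workers[name]
        return stale

    def get_worker_address(self, model_name: str) -> str:
        candidates = [(n, w) for n, w in self.workers.items() if model_name in w.model_names]
        if not candidates:
            return ""
        if self.dispatch_method == "lottery":
            # Weighted random choice; assumes every speed is positive.
            names = [n for n, _ in candidates]
            weights = [w.speed for _, w in candidates]
            return self.rng.choices(names, weights=weights, k=1)[0]
        # shortest_queue: pick the worker with the smallest reported queue.
        return min(candidates, key=lambda item: item[1].queue_length)[0]


ctrl = MiniController("shortest_queue")
ctrl.register_worker("http://w1", ["m"], queue_length=3, now=0.0)
ctrl.register_worker("http://w2", ["m"], queue_length=1, now=0.0)
print(ctrl.get_worker_address("m"))            # http://w2 has the shorter queue
ctrl.receive_heart_beat("http://w2", queue_length=1, now=60.0)
print(ctrl.remove_stale_workers(now=100.0))    # http://w1 has been silent for 100 s
```

Passing `now` explicitly keeps the expiry logic testable without sleeping, and the injectable `rng` does the same for the lottery branch.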
Metadata
| Field | Value |
|---|---|
| Knowledge Sources | Repo - LLaVA - https://github.com/haotian-liu/LLaVA |
| Domains | Distributed_Systems, Model_Serving |
| Last Updated | 2026-02-13 14:00 GMT |