Implementation:Haotian liu LLaVA Controller Class

Overview

Concrete tool for managing distributed model workers via a FastAPI-based controller server. The Controller class provides centralized worker management with HTTP endpoints for registration, heartbeat, model listing, and inference routing.

Source

File: llava/serve/controller.py
Lines: L57-171 (Controller class), L239-298 (FastAPI routes + main)

Signature

class Controller:
    def __init__(self, dispatch_method: str):
        """Create a controller; dispatch_method is 'lottery' or 'shortest_queue'."""

    def register_worker(self, worker_name: str, check_heart_beat: bool, worker_status: dict) -> bool:
        """Register a worker and optionally start heartbeat monitoring."""

    def get_worker_address(self, model_name: str) -> str:
        """Return the address of a worker serving the given model using the configured dispatch method."""

    def list_models(self) -> List[str]:
        """Aggregate and return all model names served by registered workers."""

    def remove_worker(self, worker_name: str):
        """Remove a worker from the registry (called on heartbeat expiration)."""

    def refresh_all_workers(self):
        """Ping all registered workers and remove unresponsive ones."""

CLI Usage

python -m llava.serve.controller \
    --host localhost \
    --port 21001 \
    --dispatch-method shortest_queue

FastAPI Endpoints

Endpoint	Method	Description
`/register_worker`	POST	Register a new worker or update an existing one
`/list_models`	POST	Return list of all available model names
`/get_worker_address`	POST	Get a worker address for a given model (uses dispatch method)
`/receive_heart_beat`	POST	Receive heartbeat from a worker, reset expiration timer
`/worker_generate_stream`	POST	Proxy a streaming generation request to the selected worker
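
As an illustration of how a client might call two of these endpoints, here is a minimal sketch using only the standard library. The JSON field names (`model`, `models`, `address`), the `CONTROLLER_URL` constant, and the helper names are assumptions made for this example, not details confirmed by this page; check them against `llava/serve/controller.py` before relying on them.

```python
import json
from typing import Callable, Dict, List
from urllib import request

# Assumed address, matching the CLI example (localhost:21001).
CONTROLLER_URL = "http://localhost:21001"


def post_json(url: str, payload: Dict) -> Dict:
    """POST a JSON body and decode the JSON reply."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def list_models(post: Callable[[str, Dict], Dict] = post_json) -> List[str]:
    # Assumption: the reply is a JSON object holding a "models" list.
    reply = post(CONTROLLER_URL + "/list_models", {})
    return reply["models"]


def get_worker_address(model: str,
                       post: Callable[[str, Dict], Dict] = post_json) -> str:
    # Assumption: the request key is "model" and the reply key is "address".
    reply = post(CONTROLLER_URL + "/get_worker_address", {"model": model})
    return reply["address"]
```

The `post` parameter is injectable so the helpers can be exercised without a running controller, for example by passing a function that returns canned replies.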

Import

from llava.serve.controller import Controller

Inputs

None (standalone server). The controller is configured via CLI arguments at launch time.

Outputs

Running HTTP server on {host}:{port} that manages worker registration, dispatch, and inference proxying.

Description

The Controller class is the central coordinator in LLaVA's distributed serving architecture. It uses uvicorn as the ASGI server and exposes a FastAPI application.

Key behaviors:

- On /register_worker, the controller stores the worker address, its status (including speed and model names), and optionally starts a heartbeat monitoring thread.
- The heartbeat thread checks every 30 seconds whether the worker has sent a heartbeat within the last 90 seconds. If not, the worker is removed.
- On /get_worker_address, the controller selects a worker based on the configured dispatch method:
  - lottery: Random selection weighted by worker speed.
  - shortest_queue: Selects the worker with the smallest queue length.
- On /worker_generate_stream, the controller resolves a worker address and proxies the streaming inference request.
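
The heartbeat expiry and the two dispatch methods described above can be sketched roughly as follows. This is a simplified illustration, not the code in `llava/serve/controller.py`: the `WorkerInfo` record, the function names, and the constant names are hypothetical, the timing values are copied from the description above, and the real implementation may weigh queue length or speed differently.

```python
import random
import time
from dataclasses import dataclass
from typing import Dict, List, Optional

# Values taken from the description above; the real constants may differ.
CHECK_INTERVAL_S = 30          # how often the monitor thread would wake up
HEART_BEAT_EXPIRATION_S = 90   # maximum allowed silence before removal


@dataclass
class WorkerInfo:  # hypothetical record, not the actual LLaVA class
    model_names: List[str]
    speed: float
    queue_length: int
    last_heart_beat: float


def remove_stale_workers(workers: Dict[str, WorkerInfo],
                         now: Optional[float] = None) -> List[str]:
    """Delete workers whose last heartbeat is older than the expiration window."""
    now = time.time() if now is None else now
    stale = [name for name, info in workers.items()
             if now - info.last_heart_beat > HEART_BEAT_EXPIRATION_S]
    for name in stale:
        del workers[name]
    return stale


def pick_worker(workers: Dict[str, WorkerInfo], model_name: str,
                dispatch_method: str, rng=random) -> str:
    """Return the address of a worker serving model_name, or "" if none does."""
    names = [n for n, info in workers.items() if model_name in info.model_names]
    if not names:
        return ""
    if dispatch_method == "lottery":
        # Random choice weighted by speed; faster workers are picked more often.
        weights = [workers[n].speed for n in names]
        return rng.choices(names, weights=weights, k=1)[0]
    if dispatch_method == "shortest_queue":
        # Pick the worker with the fewest queued requests.
        return min(names, key=lambda n: workers[n].queue_length)
    raise ValueError(f"Unknown dispatch method: {dispatch_method}")
```

In a running controller, something like `remove_stale_workers` would be called from a background thread that sleeps for `CHECK_INTERVAL_S` between passes, and `pick_worker` would back the /get_worker_address endpoint.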

Metadata

Field	Value
Knowledge Sources	Repo - LLaVA - https://github.com/haotian-liu/LLaVA
Domains	Distributed_Systems, Model_Serving
Last Updated	2026-02-13 14:00 GMT

Related Pages

implements Principle:Haotian_liu_LLaVA_Distributed_Worker_Control
