Implementation:Lm_sys_FastChat_Controller_Dispatch
| Field | Value |
|---|---|
| Page Type | Implementation (API Doc) |
| Repository | lm-sys/FastChat |
| Domain | Distributed Systems, Load Balancing, Service Orchestration |
| Knowledge Sources | Source code analysis of fastchat/serve/controller.py |
| Last Updated | 2026-02-07 14:00 GMT |
| Implements | Principle:Lm_sys_FastChat_Worker_Dispatch_Control |
Overview
This page documents the Controller class and associated components that implement centralized worker dispatch in the FastChat distributed model serving architecture. The Controller maintains a registry of model workers, monitors their health via heartbeats, and routes inference requests to appropriate workers using configurable dispatch strategies.
Description
The Controller is a FastAPI-based HTTP service that acts as the central coordination point for all model workers. It tracks worker status, handles registration and deregistration, and resolves worker addresses for API servers and other clients. The dispatch logic supports two strategies: lottery (weighted random selection by worker speed) and shortest_queue (selecting the worker with the minimal queue_length/speed ratio).
The Controller is typically started as a standalone process and listens on port 21001 by default. Model workers register with it on startup and send periodic heartbeats. API servers query the controller to find a worker for a given model before forwarding inference requests.
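The registration, heartbeat, and expiration flow described above can be sketched as a minimal in-memory registry. This is a simplified model for illustration, not FastChat's actual implementation; the `Registry` class name and the 90-second expiration constant are assumptions made for the sketch.

```python
import time
from dataclasses import dataclass
from typing import Dict, List

HEART_BEAT_EXPIRATION = 90  # seconds; illustrative value, not taken from FastChat

@dataclass
class WorkerInfo:
    model_names: List[str]
    speed: int
    queue_length: int
    check_heart_beat: bool
    last_heart_beat: float

class Registry:
    """Minimal sketch of the controller's worker registry."""

    def __init__(self) -> None:
        self.workers: Dict[str, WorkerInfo] = {}

    def register(self, name: str, status: dict, check_heart_beat: bool = True) -> bool:
        # Registration (or re-registration) overwrites any existing entry.
        self.workers[name] = WorkerInfo(
            model_names=status["model_names"],
            speed=status["speed"],
            queue_length=status["queue_length"],
            check_heart_beat=check_heart_beat,
            last_heart_beat=time.time(),
        )
        return True

    def receive_heart_beat(self, name: str, queue_length: int) -> bool:
        # An unknown worker gets False back, signalling it should re-register.
        if name not in self.workers:
            return False
        info = self.workers[name]
        info.queue_length = queue_length
        info.last_heart_beat = time.time()
        return True

    def remove_stale(self, now: float) -> None:
        # Drop workers whose last heartbeat is older than the expiration window.
        expired = [
            n for n, w in self.workers.items()
            if w.check_heart_beat and w.last_heart_beat + HEART_BEAT_EXPIRATION < now
        ]
        for n in expired:
            del self.workers[n]
```

The key design point this captures: heartbeats carry the worker's current queue length, so the same message both proves liveness and refreshes the load information used for dispatch.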
Usage
Start the controller from the command line:
```shell
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001 --dispatch-method shortest_queue
```
Use programmatically:
```python
from fastchat.serve.controller import Controller, create_controller

# Via factory function (parses CLI args)
args, controller = create_controller()

# Direct instantiation
controller = Controller(dispatch_method="shortest_queue")
```
Code Reference
Source Location
| Component | File | Lines |
|---|---|---|
| Controller class | fastchat/serve/controller.py | L64-283 |
| DispatchMethod enum | fastchat/serve/controller.py | L34-45 |
| WorkerInfo dataclass | fastchat/serve/controller.py | L48-55 |
| create_controller factory | fastchat/serve/controller.py | L353-374 |
| FastAPI endpoints | fastchat/serve/controller.py | L285-350 |
Signature
```python
class DispatchMethod(Enum):
    LOTTERY = auto()
    SHORTEST_QUEUE = auto()

    @classmethod
    def from_str(cls, name: str) -> "DispatchMethod": ...


@dataclasses.dataclass
class WorkerInfo:
    model_names: List[str]
    speed: int
    queue_length: int
    check_heart_beat: bool
    last_heart_beat: str
    multimodal: bool


class Controller:
    def __init__(self, dispatch_method: str) -> None: ...
    def register_worker(
        self,
        worker_name: str,
        check_heart_beat: bool,
        worker_status: dict,
        multimodal: bool,
    ) -> bool: ...
    def remove_worker(self, worker_name: str) -> None: ...
    def refresh_all_workers(self) -> None: ...
    def list_models(self) -> List[str]: ...
    def list_multimodal_models(self) -> List[str]: ...
    def list_language_models(self) -> List[str]: ...
    def get_worker_address(self, model_name: str) -> str: ...
    def receive_heart_beat(self, worker_name: str, queue_length: int) -> bool: ...
    def remove_stale_workers_by_expiration(self) -> None: ...
    def worker_api_get_status(self) -> dict: ...
    def worker_api_generate_stream(self, params: dict) -> Generator[bytes, None, None]: ...


def create_controller() -> Tuple[argparse.Namespace, Controller]: ...
```
Import
```python
from fastchat.serve.controller import Controller, create_controller
from fastchat.serve.controller import DispatchMethod
```
I/O Contract
CLI Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --host | str | "localhost" | Host address to bind the controller server |
| --port | int | 21001 | Port number for the controller server |
| --dispatch-method | str | "shortest_queue" | Worker selection strategy: "lottery" or "shortest_queue" |
| --ssl | flag | False | Enable SSL (requires SSL_KEYFILE and SSL_CERTFILE environment variables) |
FastAPI Endpoints
| Method | Endpoint | Request Body | Response |
|---|---|---|---|
| POST | /register_worker | {"worker_name": str, "check_heart_beat": bool, "worker_status": dict, "multimodal": bool} | None |
| POST | /refresh_all_workers | None | None |
| POST | /list_models | None | {"models": List[str]} |
| POST | /list_multimodal_models | None | {"models": List[str]} |
| POST | /list_language_models | None | {"models": List[str]} |
| POST | /get_worker_address | {"model": str} | {"address": str} |
| POST | /receive_heart_beat | {"worker_name": str, "queue_length": int} | {"exist": bool} |
| POST | /worker_generate_stream | {"model": str, ...gen_params} | StreamingResponse (chunked bytes with \0 delimiter) |
| POST | /worker_get_status | None | {"model_names": List[str], "speed": int, "queue_length": int} |
| GET | /test_connection | None | "success" |
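The /worker_get_status response is a controller-level aggregate over all registered workers. A plausible aggregation consistent with the response shape in the table (the `aggregate_status` helper is hypothetical, written for illustration rather than taken from FastChat):

```python
from typing import Dict

def aggregate_status(workers: Dict[str, dict]) -> dict:
    """Sketch of a /worker_get_status-style aggregation: union of model
    names, summed speed, and summed queue length across workers."""
    model_names = set()
    speed = 0
    queue_length = 0
    for status in workers.values():
        model_names.update(status["model_names"])
        speed += status["speed"]
        queue_length += status["queue_length"]
    return {
        "model_names": sorted(model_names),
        "speed": speed,
        "queue_length": queue_length,
    }
```

This lets a client treat the whole worker fleet as one logical worker: total serving capacity (speed) and total backlog (queue_length) in a single call.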
Dispatch Behavior
| Method | Selection Logic | Side Effects |
|---|---|---|
| LOTTERY | Weighted random selection by worker speed: P(worker) = speed / sum(speeds) | None |
| SHORTEST_QUEUE | Select worker with minimum queue_length / speed ratio | Increments selected worker's queue_length by 1 |
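The two selection rules in the table can be sketched as small standalone functions. This is a simplified model assuming workers are represented as plain dicts with `speed` and `queue_length` keys; the function names are invented for the sketch and do not appear in FastChat.

```python
import random
from typing import Dict

def choose_lottery(candidates: Dict[str, dict], rng: random.Random) -> str:
    """Weighted random pick: P(worker) = speed / sum(speeds)."""
    names = list(candidates)
    weights = [candidates[n]["speed"] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

def choose_shortest_queue(candidates: Dict[str, dict]) -> str:
    """Pick the worker minimizing queue_length / speed, then bump its
    queue so concurrent requests spread out before the next heartbeat."""
    name = min(
        candidates,
        key=lambda n: candidates[n]["queue_length"] / candidates[n]["speed"],
    )
    candidates[name]["queue_length"] += 1
    return name
```

The queue-length increment is the notable side effect: between heartbeats the controller's view of a worker's queue is only an estimate, and bumping it locally prevents a burst of requests from all landing on the same momentarily-idle worker.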
Error Handling
- If no worker is available for the requested model, `get_worker_address` returns an empty string `""`.
- If a worker times out during stream proxying, the controller returns a JSON error with `ErrorCode.CONTROLLER_WORKER_TIMEOUT`.
- If no worker is found during `worker_api_generate_stream`, it yields a JSON error with `ErrorCode.CONTROLLER_NO_WORKER`.
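On the streaming path, errors are framed the same way as normal output: a JSON object followed by the `\0` delimiter, so clients need no special error channel. A hedged sketch of that framing (the payload keys match FastChat's `{"text": ..., "error_code": ...}` convention, but the numeric error-code value below is illustrative, not taken from `fastchat.constants`):

```python
import json

NO_WORKER_CODE = 50004  # illustrative value; see fastchat.constants.ErrorCode

def error_chunk(message: str, code: int) -> bytes:
    """Frame an error exactly like a streamed chunk: JSON + NUL delimiter."""
    return json.dumps({"text": message, "error_code": code}).encode() + b"\0"

def parse_chunks(stream: bytes):
    """Split a \\0-delimited byte stream back into JSON payloads,
    skipping the empty trailing segment after the final delimiter."""
    for chunk in stream.split(b"\0"):
        if chunk:
            yield json.loads(chunk)
```

A client that already parses `\0`-delimited chunks can therefore detect failures by checking each payload for a non-zero `error_code` field.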
Usage Examples
Starting the Controller
```shell
# Start with default settings (shortest_queue dispatch on localhost:21001)
python3 -m fastchat.serve.controller

# Start with lottery dispatch on all interfaces
python3 -m fastchat.serve.controller --host 0.0.0.0 --dispatch-method lottery

# Start with SSL
SSL_KEYFILE=/path/to/key.pem SSL_CERTFILE=/path/to/cert.pem \
    python3 -m fastchat.serve.controller --ssl
```
Querying the Controller Programmatically
```python
import requests

CONTROLLER_URL = "http://localhost:21001"

# List available models
response = requests.post(f"{CONTROLLER_URL}/list_models")
models = response.json()["models"]
print(f"Available models: {models}")

# Get a worker address for a specific model
response = requests.post(
    f"{CONTROLLER_URL}/get_worker_address",
    json={"model": "vicuna-7b-v1.5"},
)
worker_addr = response.json()["address"]
print(f"Worker address: {worker_addr}")

# Refresh all workers (re-probe and remove stale)
requests.post(f"{CONTROLLER_URL}/refresh_all_workers")
```
Related Pages
- Principle:Lm_sys_FastChat_Worker_Dispatch_Control -- The principle this implementation realizes
- Implementation:Lm_sys_FastChat_ModelWorker_Load_And_Generate -- Model worker that registers with this controller
- Implementation:Lm_sys_FastChat_OpenAI_API_Server -- API server that queries this controller for worker addresses
- Environment:Lm_sys_FastChat_GPU_CUDA_Inference