Implementation:Lm_sys_FastChat_Controller_Dispatch
| Field | Value |
|---|---|
| Page Type | Implementation (API Doc) |
| Repository | lm-sys/FastChat |
| Domain | Distributed Systems, Load Balancing, Service Orchestration |
| Knowledge Sources | Source code analysis of fastchat/serve/controller.py |
| Last Updated | 2026-02-07 14:00 GMT |
| Implements | Principle:Lm_sys_FastChat_Worker_Dispatch_Control |
Overview
This page documents the Controller class and associated components that implement centralized worker dispatch in the FastChat distributed model serving architecture. The Controller maintains a registry of model workers, monitors their health via heartbeats, and routes inference requests to appropriate workers using configurable dispatch strategies.
Description
The Controller is a FastAPI-based HTTP service that acts as the central coordination point for all model workers. It tracks worker status, handles registration and deregistration, and resolves worker addresses for API servers and other clients. The dispatch logic supports two strategies: lottery (weighted random selection by worker speed) and shortest_queue (selecting the worker with the minimal queue_length/speed ratio).
The Controller is typically started as a standalone process and listens on port 21001 by default. Model workers register with it on startup and send periodic heartbeats. API servers query the controller to find a worker for a given model before forwarding inference requests.
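The registration, heartbeat, and expiration flow described above can be sketched as a minimal in-memory registry. This is a simplified model for illustration, not FastChat's actual implementation; the `Registry` class name and the 90-second expiration constant are assumptions made for the sketch.

```python
import time
from dataclasses import dataclass
from typing import Dict, List

HEART_BEAT_EXPIRATION = 90  # seconds; illustrative value, not taken from FastChat

@dataclass
class WorkerInfo:
    model_names: List[str]
    speed: int
    queue_length: int
    check_heart_beat: bool
    last_heart_beat: float

class Registry:
    """Minimal sketch of the controller's worker registry."""

    def __init__(self) -> None:
        self.workers: Dict[str, WorkerInfo] = {}

    def register(self, name: str, status: dict, check_heart_beat: bool = True) -> bool:
        # Registration (or re-registration) overwrites any existing entry.
        self.workers[name] = WorkerInfo(
            model_names=status["model_names"],
            speed=status["speed"],
            queue_length=status["queue_length"],
            check_heart_beat=check_heart_beat,
            last_heart_beat=time.time(),
        )
        return True

    def receive_heart_beat(self, name: str, queue_length: int) -> bool:
        # An unknown worker gets False back, signalling it should re-register.
        if name not in self.workers:
            return False
        info = self.workers[name]
        info.queue_length = queue_length
        info.last_heart_beat = time.time()
        return True

    def remove_stale(self, now: float) -> None:
        # Drop workers whose last heartbeat is older than the expiration window.
        expired = [
            n for n, w in self.workers.items()
            if w.check_heart_beat and w.last_heart_beat + HEART_BEAT_EXPIRATION < now
        ]
        for n in expired:
            del self.workers[n]
```

The key design point this captures: heartbeats carry the worker's current queue length, so the same message both proves liveness and refreshes the load information used for dispatch.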
Usage
Start the controller from the command line:
```shell
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001 --dispatch-method shortest_queue
```
Use programmatically:
```python
from fastchat.serve.controller import Controller, create_controller

# Via factory function (parses CLI args)
args, controller = create_controller()

# Direct instantiation
controller = Controller(dispatch_method="shortest_queue")
```
Code Reference
Source Location
| Component | File | Lines |
|---|---|---|
| Controller class | fastchat/serve/controller.py | L64-283 |
| DispatchMethod enum | fastchat/serve/controller.py | L34-45 |
| WorkerInfo dataclass | fastchat/serve/controller.py | L48-55 |
| create_controller factory | fastchat/serve/controller.py | L353-374 |
| FastAPI endpoints | fastchat/serve/controller.py | L285-350 |
Signature
```python
class DispatchMethod(Enum):
    LOTTERY = auto()
    SHORTEST_QUEUE = auto()

    @classmethod
    def from_str(cls, name: str) -> "DispatchMethod": ...


@dataclasses.dataclass
class WorkerInfo:
    model_names: List[str]
    speed: int
    queue_length: int
    check_heart_beat: bool
    last_heart_beat: str
    multimodal: bool


class Controller:
    def __init__(self, dispatch_method: str) -> None: ...
    def register_worker(
        self,
        worker_name: str,
        check_heart_beat: bool,
        worker_status: dict,
        multimodal: bool,
    ) -> bool: ...
    def remove_worker(self, worker_name: str) -> None: ...
    def refresh_all_workers(self) -> None: ...
    def list_models(self) -> List[str]: ...
    def list_multimodal_models(self) -> List[str]: ...
    def list_language_models(self) -> List[str]: ...
    def get_worker_address(self, model_name: str) -> str: ...
    def receive_heart_beat(self, worker_name: str, queue_length: int) -> bool: ...
    def remove_stale_workers_by_expiration(self) -> None: ...
    def worker_api_get_status(self) -> dict: ...
    def worker_api_generate_stream(self, params: dict) -> Generator[bytes, None, None]: ...


def create_controller() -> Tuple[argparse.Namespace, Controller]: ...
```
Import
```python
from fastchat.serve.controller import Controller, create_controller
from fastchat.serve.controller import DispatchMethod
```
I/O Contract
CLI Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| --host | str | "localhost" | Host address to bind the controller server |
| --port | int | 21001 | Port number for the controller server |
| --dispatch-method | str | "shortest_queue" | Worker selection strategy: "lottery" or "shortest_queue" |
| --ssl | flag | False | Enable SSL (requires SSL_KEYFILE and SSL_CERTFILE environment variables) |
FastAPI Endpoints
| Method | Endpoint | Request Body | Response |
|---|---|---|---|
| POST | /register_worker | {"worker_name": str, "check_heart_beat": bool, "worker_status": dict, "multimodal": bool} | None |
| POST | /refresh_all_workers | None | None |
| POST | /list_models | None | {"models": List[str]} |
| POST | /list_multimodal_models | None | {"models": List[str]} |
| POST | /list_language_models | None | {"models": List[str]} |
| POST | /get_worker_address | {"model": str} | {"address": str} |
| POST | /receive_heart_beat | {"worker_name": str, "queue_length": int} | {"exist": bool} |
| POST | /worker_generate_stream | {"model": str, ...gen_params} | StreamingResponse (chunked bytes with \0 delimiter) |
| POST | /worker_get_status | None | {"model_names": List[str], "speed": int, "queue_length": int} |
| GET | /test_connection | None | "success" |
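The /worker_get_status response is a controller-level aggregate over all registered workers. A plausible aggregation consistent with the response shape in the table (the `aggregate_status` helper is hypothetical, written for illustration rather than taken from FastChat):

```python
from typing import Dict

def aggregate_status(workers: Dict[str, dict]) -> dict:
    """Sketch of a /worker_get_status-style aggregation: union of model
    names, summed speed, and summed queue length across workers."""
    model_names = set()
    speed = 0
    queue_length = 0
    for status in workers.values():
        model_names.update(status["model_names"])
        speed += status["speed"]
        queue_length += status["queue_length"]
    return {
        "model_names": sorted(model_names),
        "speed": speed,
        "queue_length": queue_length,
    }
```

This lets a client treat the whole worker fleet as one logical worker: total serving capacity (speed) and total backlog (queue_length) in a single call.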
Dispatch Behavior
| Method | Selection Logic | Side Effects |
|---|---|---|
| LOTTERY | Weighted random selection by worker speed: P(worker) = speed / sum(speeds) | None |
| SHORTEST_QUEUE | Select worker with minimum queue_length / speed ratio | Increments selected worker's queue_length by 1 |
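The two selection rules in the table can be sketched as small standalone functions. This is a simplified model assuming workers are represented as plain dicts with `speed` and `queue_length` keys; the function names are invented for the sketch and do not appear in FastChat.

```python
import random
from typing import Dict

def choose_lottery(candidates: Dict[str, dict], rng: random.Random) -> str:
    """Weighted random pick: P(worker) = speed / sum(speeds)."""
    names = list(candidates)
    weights = [candidates[n]["speed"] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

def choose_shortest_queue(candidates: Dict[str, dict]) -> str:
    """Pick the worker minimizing queue_length / speed, then bump its
    queue so concurrent requests spread out before the next heartbeat."""
    name = min(
        candidates,
        key=lambda n: candidates[n]["queue_length"] / candidates[n]["speed"],
    )
    candidates[name]["queue_length"] += 1
    return name
```

The queue-length increment is the notable side effect: between heartbeats the controller's view of a worker's queue is only an estimate, and bumping it locally prevents a burst of requests from all landing on the same momentarily-idle worker.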
Error Handling
- If no worker is available for the requested model, `get_worker_address` returns an empty string `""`.
- If a worker times out during stream proxying, the controller returns a JSON error with `ErrorCode.CONTROLLER_WORKER_TIMEOUT`.
- If no worker is found during `worker_api_generate_stream`, it yields a JSON error with `ErrorCode.CONTROLLER_NO_WORKER`.
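On the streaming path, errors are framed the same way as normal output: a JSON object followed by the `\0` delimiter, so clients need no special error channel. A hedged sketch of that framing (the payload keys match FastChat's `{"text": ..., "error_code": ...}` convention, but the numeric error-code value below is illustrative, not taken from `fastchat.constants`):

```python
import json

NO_WORKER_CODE = 50004  # illustrative value; see fastchat.constants.ErrorCode

def error_chunk(message: str, code: int) -> bytes:
    """Frame an error exactly like a streamed chunk: JSON + NUL delimiter."""
    return json.dumps({"text": message, "error_code": code}).encode() + b"\0"

def parse_chunks(stream: bytes):
    """Split a \\0-delimited byte stream back into JSON payloads,
    skipping the empty trailing segment after the final delimiter."""
    for chunk in stream.split(b"\0"):
        if chunk:
            yield json.loads(chunk)
```

A client that already parses `\0`-delimited chunks can therefore detect failures by checking each payload for a non-zero `error_code` field.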
Usage Examples
Starting the Controller
```shell
# Start with default settings (shortest_queue dispatch on localhost:21001)
python3 -m fastchat.serve.controller

# Start with lottery dispatch on all interfaces
python3 -m fastchat.serve.controller --host 0.0.0.0 --dispatch-method lottery

# Start with SSL
SSL_KEYFILE=/path/to/key.pem SSL_CERTFILE=/path/to/cert.pem \
    python3 -m fastchat.serve.controller --ssl
```
Querying the Controller Programmatically
```python
import requests

CONTROLLER_URL = "http://localhost:21001"

# List available models
response = requests.post(f"{CONTROLLER_URL}/list_models")
models = response.json()["models"]
print(f"Available models: {models}")

# Get a worker address for a specific model
response = requests.post(
    f"{CONTROLLER_URL}/get_worker_address",
    json={"model": "vicuna-7b-v1.5"},
)
worker_addr = response.json()["address"]
print(f"Worker address: {worker_addr}")

# Refresh all workers (re-probe and remove stale)
requests.post(f"{CONTROLLER_URL}/refresh_all_workers")
```
Related Pages
- Principle:Lm_sys_FastChat_Worker_Dispatch_Control -- The principle this implementation realizes
- Implementation:Lm_sys_FastChat_ModelWorker_Load_And_Generate -- Model worker that registers with this controller
- Implementation:Lm_sys_FastChat_OpenAI_API_Server -- API server that queries this controller for worker addresses
- Environment:Lm_sys_FastChat_GPU_CUDA_Inference