Implementation:Lm sys FastChat Controller Dispatch

From Leeroopedia


Field Value
Page Type Implementation (API Doc)
Repository lm-sys/FastChat
Domain Distributed Systems, Load Balancing, Service Orchestration
Knowledge Sources Source code analysis of fastchat/serve/controller.py
Last Updated 2026-02-07 14:00 GMT
Implements Principle:Lm_sys_FastChat_Worker_Dispatch_Control

Overview

This page documents the Controller class and associated components that implement centralized worker dispatch in the FastChat distributed model serving architecture. The Controller maintains a registry of model workers, monitors their health via heartbeats, and routes inference requests to appropriate workers using configurable dispatch strategies.

Description

The Controller is a FastAPI-based HTTP service that acts as the central coordination point for all model workers. It tracks worker status, handles registration and deregistration, and provides worker address resolution for API servers and other clients. The dispatch logic supports two strategies: "lottery" (weighted random selection proportional to worker speed) and "shortest_queue" (pick the worker with the minimal queue_length/speed ratio).

The Controller is typically started as a standalone process and listens on port 21001 by default. Model workers register with it on startup and send periodic heartbeats. API servers query the controller to find a worker for a given model before forwarding inference requests.
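The registration-and-heartbeat lifecycle described above can be sketched from a worker's point of view. This is a minimal illustration assuming only the endpoint shapes documented in the I/O Contract below; the helper names (`build_register_payload`, `register_and_heartbeat`) are hypothetical and not part of FastChat.

```python
import requests

CONTROLLER_URL = "http://localhost:21001"  # default controller address


def build_register_payload(worker_addr, model_names, speed=1,
                           queue_length=0, multimodal=False):
    """Assemble the JSON body for POST /register_worker.

    The nested worker_status dict mirrors the fields the controller
    stores per worker (model_names, speed, queue_length).
    """
    return {
        "worker_name": worker_addr,
        "check_heart_beat": True,
        "worker_status": {
            "model_names": model_names,
            "speed": speed,
            "queue_length": queue_length,
        },
        "multimodal": multimodal,
    }


def register_and_heartbeat(worker_addr, model_names):
    # Register once at startup...
    payload = build_register_payload(worker_addr, model_names)
    requests.post(f"{CONTROLLER_URL}/register_worker", json=payload)
    # ...then report queue length periodically; the controller replies
    # with {"exist": bool} and expires workers whose heartbeats go stale.
    requests.post(
        f"{CONTROLLER_URL}/receive_heart_beat",
        json={"worker_name": worker_addr, "queue_length": 0},
    )
```

In practice the heartbeat call runs on a timer in the worker process, not once as shown here.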

Usage

Start the controller from the command line:

python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001 --dispatch-method shortest_queue

Use programmatically:

from fastchat.serve.controller import Controller, create_controller

# Via factory function (parses CLI args)
args, controller = create_controller()

# Direct instantiation
controller = Controller(dispatch_method="shortest_queue")

Code Reference

Source Location

Component File Lines
Controller class fastchat/serve/controller.py L64-283
DispatchMethod enum fastchat/serve/controller.py L34-45
WorkerInfo dataclass fastchat/serve/controller.py L48-55
create_controller factory fastchat/serve/controller.py L353-374
FastAPI endpoints fastchat/serve/controller.py L285-350

Signature

class DispatchMethod(Enum):
    LOTTERY = auto()
    SHORTEST_QUEUE = auto()

    @classmethod
    def from_str(cls, name: str) -> "DispatchMethod": ...

@dataclasses.dataclass
class WorkerInfo:
    model_names: List[str]
    speed: int
    queue_length: int
    check_heart_beat: bool
    last_heart_beat: str
    multimodal: bool

class Controller:
    def __init__(self, dispatch_method: str) -> None: ...

    def register_worker(
        self,
        worker_name: str,
        check_heart_beat: bool,
        worker_status: dict,
        multimodal: bool,
    ) -> bool: ...

    def remove_worker(self, worker_name: str) -> None: ...
    def refresh_all_workers(self) -> None: ...
    def list_models(self) -> List[str]: ...
    def list_multimodal_models(self) -> List[str]: ...
    def list_language_models(self) -> List[str]: ...
    def get_worker_address(self, model_name: str) -> str: ...
    def receive_heart_beat(self, worker_name: str, queue_length: int) -> bool: ...
    def remove_stale_workers_by_expiration(self) -> None: ...
    def worker_api_get_status(self) -> dict: ...
    def worker_api_generate_stream(self, params: dict) -> Generator[bytes, None, None]: ...

def create_controller() -> Tuple[argparse.Namespace, Controller]: ...

Import

from fastchat.serve.controller import Controller, create_controller
from fastchat.serve.controller import DispatchMethod

I/O Contract

CLI Parameters

Parameter Type Default Description
--host str "localhost" Host address to bind the controller server
--port int 21001 Port number for the controller server
--dispatch-method str "shortest_queue" Worker selection strategy: "lottery" or "shortest_queue"
--ssl flag False Enable SSL (requires SSL_KEYFILE and SSL_CERTFILE environment variables)

FastAPI Endpoints

Method Endpoint Request Body Response
POST /register_worker {"worker_name": str, "check_heart_beat": bool, "worker_status": dict, "multimodal": bool} None
POST /refresh_all_workers None None
POST /list_models None {"models": List[str]}
POST /list_multimodal_models None {"models": List[str]}
POST /list_language_models None {"models": List[str]}
POST /get_worker_address {"model": str} {"address": str}
POST /receive_heart_beat {"worker_name": str, "queue_length": int} {"exist": bool}
POST /worker_generate_stream {"model": str, ...gen_params} StreamingResponse (chunked bytes with \0 delimiter)
POST /worker_get_status None {"model_names": List[str], "speed": int, "queue_length": int}
GET /test_connection None "success"
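The /worker_generate_stream row above notes that the response is a chunked byte stream with a \0 delimiter. A client-side sketch of that framing, assuming only the delimiter convention stated here (the function name `iter_stream_chunks` is illustrative, not a FastChat API):

```python
import json


def iter_stream_chunks(raw_chunks):
    """Split a NUL-delimited byte stream into decoded JSON objects.

    Each message is a JSON blob terminated by a \0 byte; messages may
    be split or merged across transport chunks, so we buffer bytes
    until a full delimiter-terminated message is available.
    """
    buffer = b""
    for chunk in raw_chunks:
        buffer += chunk
        while b"\0" in buffer:
            message, buffer = buffer.split(b"\0", 1)
            if message:
                yield json.loads(message)
```

With `requests`, the same logic applies to `response.iter_content()` chunks when streaming from the endpoint.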

Dispatch Behavior

Method Selection Logic Side Effects
LOTTERY Weighted random selection by worker speed: P(worker) = speed / sum(speeds) None
SHORTEST_QUEUE Select worker with minimum queue_length / speed ratio Increments selected worker's queue_length by 1
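The two selection rules in the table can be sketched as standalone functions. This is a simplified illustration of the documented behavior, not FastChat's actual implementation; the `WorkerStub` type is invented for the example.

```python
import random
from dataclasses import dataclass
from typing import Dict


@dataclass
class WorkerStub:
    speed: int
    queue_length: int


def lottery_pick(workers: Dict[str, WorkerStub], rng=random) -> str:
    # Weighted random selection: P(worker) = speed / sum(speeds).
    names = list(workers)
    weights = [workers[n].speed for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


def shortest_queue_pick(workers: Dict[str, WorkerStub]) -> str:
    # Pick the minimal queue_length / speed ratio, then bump the
    # winner's queue_length to account for the request being routed.
    name = min(workers,
               key=lambda n: workers[n].queue_length / workers[n].speed)
    workers[name].queue_length += 1
    return name
```

Note the side effect: shortest_queue mutates the selected worker's queue length immediately, so concurrent dispatches spread across workers even before the next heartbeat refreshes the true queue lengths.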

Error Handling

  • If no worker is available for the requested model, get_worker_address returns an empty string "".
  • If a worker times out during stream proxying, the controller returns a JSON error with ErrorCode.CONTROLLER_WORKER_TIMEOUT.
  • If no worker is found during worker_api_generate_stream, it yields a JSON error with ErrorCode.CONTROLLER_NO_WORKER.
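Because "no worker" is signaled by an empty address string rather than an HTTP error, clients should check for it explicitly. A small defensive helper (the name `resolve_worker` is illustrative, not part of FastChat):

```python
def resolve_worker(response_json: dict, model_name: str) -> str:
    """Interpret a /get_worker_address response.

    The controller returns {"address": ""} when no worker serves the
    requested model, so an empty string must be treated as an error
    by the caller.
    """
    address = response_json.get("address", "")
    if not address:
        raise RuntimeError(f"no worker available for model {model_name!r}")
    return address
```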

Usage Examples

Starting the Controller

# Start with default settings (shortest_queue dispatch on localhost:21001)
python3 -m fastchat.serve.controller

# Start with lottery dispatch on all interfaces
python3 -m fastchat.serve.controller --host 0.0.0.0 --dispatch-method lottery

# Start with SSL
SSL_KEYFILE=/path/to/key.pem SSL_CERTFILE=/path/to/cert.pem \
    python3 -m fastchat.serve.controller --ssl

Querying the Controller Programmatically

import requests

CONTROLLER_URL = "http://localhost:21001"

# List available models
response = requests.post(f"{CONTROLLER_URL}/list_models")
models = response.json()["models"]
print(f"Available models: {models}")

# Get a worker address for a specific model
response = requests.post(
    f"{CONTROLLER_URL}/get_worker_address",
    json={"model": "vicuna-7b-v1.5"}
)
worker_addr = response.json()["address"]
print(f"Worker address: {worker_addr}")

# Refresh all workers (re-probe and remove stale)
requests.post(f"{CONTROLLER_URL}/refresh_all_workers")

Related Pages

Principle:Lm_sys_FastChat_Worker_Dispatch_Control