Principle:OpenGVLab InternVL Distributed Worker Management
| Knowledge Sources | |
|---|---|
| Domains | Serving, Distributed Systems, Infrastructure |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
The distributed worker management principle defines a controller-worker architecture for horizontally scaling model inference across multiple GPU workers, with health monitoring, load balancing, and request proxying.
Description
This principle establishes a controller-worker pattern for distributed model serving:
- Controller: A central FastAPI server that maintains a registry of workers, monitors their health through periodic heartbeats, and routes client requests to available workers. It supports two load-balancing strategies: lottery (a weighted random choice, with each worker's reported speed as its weight) and shortest_queue (routes to the worker with the lowest queue-to-speed ratio).
- Workers: Individual model serving processes that load a model, register with the controller, send periodic heartbeats reporting queue length and speed, and process inference requests with streaming responses.
- Health monitoring: Workers send heartbeats at regular intervals. Workers that miss the heartbeat deadline are automatically removed from the registry. If a worker discovers it has been deregistered, it re-registers.
- Request proxying: The controller proxies streaming generation requests to the appropriate worker, handling errors gracefully and returning standardized error responses.
- Hierarchical management: The controller can itself act as a worker, enabling multi-level hierarchies that connect isolated sub-networks.
This architecture enables horizontal scaling of model inference without modifying the client interface.
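The two load-balancing strategies described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual controller code: the `WorkerInfo` record and the `pick_lottery` and `pick_shortest_queue` function names are hypothetical, and the sketch assumes every worker reports a positive speed.

```python
import random
from dataclasses import dataclass


@dataclass
class WorkerInfo:
    # Hypothetical record of what one worker's heartbeat reports.
    speed: float       # relative throughput weight, assumed > 0
    queue_length: int  # requests currently waiting on the worker


def pick_lottery(workers, rng=random):
    """Weighted random choice: faster workers are drawn more often."""
    names = list(workers)
    if not names:
        return None
    weights = [workers[name].speed for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def pick_shortest_queue(workers):
    """Choose the worker with the lowest queue-to-speed ratio."""
    if not workers:
        return None
    return min(
        workers,
        key=lambda name: workers[name].queue_length / workers[name].speed,
    )
```

For example, with worker `a` at speed 1.0 and 4 queued requests and worker `b` at speed 2.0 and 6 queued requests, `pick_shortest_queue` selects `b`, because its ratio (3.0) is lower than that of `a` (4.0) even though its raw queue is longer.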
Usage
Apply this principle when deploying models across multiple GPUs or machines, where a centralized coordination service is needed for load balancing and health monitoring.
Theoretical Basis
This principle follows three standard distributed-systems patterns: service registry, health checking, and load balancing. The controller acts as a reverse proxy with built-in service discovery, similar to the patterns used in microservice architectures.
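The service-registry and health-checking patterns can be illustrated with a small in-memory registry that expires workers whose heartbeats stop arriving. This is a sketch under stated assumptions, not the actual controller: the `WorkerRegistry` class, its method names, and the 90-second default deadline are all hypothetical, chosen only to show the mechanism.

```python
import time


class WorkerRegistry:
    """Hypothetical in-memory registry that expires silent workers."""

    def __init__(self, expiration_s=90.0, clock=time.monotonic):
        # The 90 s deadline is an assumed value, not taken from the source.
        self.expiration_s = expiration_s
        self.clock = clock
        self.last_seen = {}  # worker name -> time of its last heartbeat

    def register(self, name):
        """Add a worker (or refresh it) with the current time."""
        self.last_seen[name] = self.clock()

    def heartbeat(self, name):
        """Return True if the worker is known; False tells it to re-register."""
        if name not in self.last_seen:
            return False
        self.last_seen[name] = self.clock()
        return True

    def remove_stale(self):
        """Drop every worker whose last heartbeat is older than the deadline."""
        cutoff = self.clock() - self.expiration_s
        stale = [name for name, seen in self.last_seen.items() if seen < cutoff]
        for name in stale:
            del self.last_seen[name]
        return stale
```

A worker that receives `False` from `heartbeat` knows it has been removed and can call `register` again, which mirrors the re-registration behavior described earlier. Injecting the clock as a parameter keeps the expiry logic testable without real waiting.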