Principle:OpenGVLab InternVL Distributed Worker Management

Knowledge Sources	OpenGVLab_InternVL
Domains	Serving, Distributed Systems, Infrastructure
Last Updated	2026-02-07 14:00 GMT

Overview

The distributed worker management principle defines a controller-worker architecture for horizontally scaling model inference across multiple GPU workers, with health monitoring, load balancing, and request proxying.

Description

This principle establishes a controller-worker pattern for distributed model serving:

Controller: A central FastAPI server that maintains a registry of workers, monitors their health through periodic heartbeats, and routes client requests to available workers. It supports two load-balancing strategies: lottery (a random choice weighted by each worker's speed) and shortest_queue (routes each request to the worker with the lowest queue-to-speed ratio).
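The two dispatch strategies can be sketched in Python as follows. This is a minimal illustration, not InternVL's code: the `WorkerInfo` record, the function names, and the worker addresses are hypothetical, and the sketch assumes every worker reports a positive `speed`.

```python
import random
from dataclasses import dataclass


@dataclass
class WorkerInfo:
    # Hypothetical registry record; field names are assumptions for this sketch.
    address: str
    speed: float       # relative throughput reported by the worker (assumed > 0)
    queue_length: int  # requests currently waiting on the worker


def dispatch_lottery(workers: list[WorkerInfo], rng: random.Random) -> str:
    """Pick a worker at random with probability proportional to its speed."""
    weights = [w.speed for w in workers]
    if sum(weights) <= 0:
        raise ValueError("no worker with positive speed")
    # random.choices normalizes the relative weights internally.
    return rng.choices(workers, weights=weights, k=1)[0].address


def dispatch_shortest_queue(workers: list[WorkerInfo]) -> str:
    """Pick the worker with the lowest queue_length / speed ratio."""
    if not workers:
        raise ValueError("no registered workers")
    return min(workers, key=lambda w: w.queue_length / w.speed).address


if __name__ == "__main__":
    # Hypothetical addresses for illustration only.
    workers = [
        WorkerInfo("http://worker-0:8000", speed=1.0, queue_length=4),
        WorkerInfo("http://worker-1:8000", speed=2.0, queue_length=4),
    ]
    print(dispatch_shortest_queue(workers))  # -> http://worker-1:8000
    print(dispatch_lottery(workers, random.Random(0)))
```

With equal queues, shortest_queue prefers the faster worker because its ratio (4 / 2.0 = 2.0) is lower than the slower worker's (4 / 1.0 = 4.0). Lottery spreads load statistically without looking at queues, while shortest_queue reacts to the current backlog but depends on queue lengths being fresh.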

Workers: Individual model-serving processes. Each worker loads a model, registers with the controller, sends periodic heartbeats reporting its queue length and speed, and serves inference requests with streaming responses.
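The worker side of this protocol can be sketched as below, under stated assumptions: `ControllerClient` is a hypothetical stand-in for the HTTP calls a real worker would make, the 15-second default heartbeat interval is an assumed value, and queue length is approximated as the number of in-flight requests.

```python
import threading
from contextlib import contextmanager
from typing import Iterator, Protocol


class ControllerClient(Protocol):
    # Hypothetical interface standing in for HTTP calls to the controller.
    def register(self, address: str, speed: float, queue_length: int) -> None: ...

    def heartbeat(self, address: str, speed: float, queue_length: int) -> bool: ...


class Worker:
    def __init__(self, address: str, controller: ControllerClient,
                 speed: float = 1.0, interval_s: float = 15.0) -> None:
        self.address = address
        self.controller = controller
        self.speed = speed
        self.interval_s = interval_s  # assumed heartbeat period
        self._in_flight = 0           # requests accepted but not yet finished
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def queue_length(self) -> int:
        with self._lock:
            return self._in_flight

    @contextmanager
    def track_request(self) -> Iterator[None]:
        """Count a request as queued for as long as it is being served."""
        with self._lock:
            self._in_flight += 1
        try:
            yield
        finally:
            with self._lock:
                self._in_flight -= 1

    def register(self) -> None:
        self.controller.register(self.address, self.speed, self.queue_length())

    def heartbeat_once(self) -> None:
        # A False reply means the controller no longer lists this worker,
        # so the worker registers again.
        if not self.controller.heartbeat(self.address, self.speed, self.queue_length()):
            self.register()

    def _loop(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.interval_s):
            self.heartbeat_once()

    def start(self) -> threading.Thread:
        self.register()
        thread = threading.Thread(target=self._loop, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._stop.set()
```

Running the heartbeat on a daemon thread keeps it independent of request handling, so a worker that is busy streaming a long response still reports its (growing) queue length to the controller.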

Health monitoring: Workers send heartbeats at regular intervals, and the controller automatically removes any worker whose last heartbeat is older than the expiration deadline. If a worker discovers that it has been deregistered, it registers again.
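On the controller side, the expiration rule can be sketched as a small thread-safe registry. The class, its method names, and the 30-second default deadline are assumptions for illustration; the injectable `clock` exists only so the behavior can be tested deterministically.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class WorkerRecord:
    speed: float
    queue_length: int
    last_heartbeat: float


class WorkerRegistry:
    """Hypothetical controller-side registry with heartbeat expiration."""

    def __init__(self, expiration_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.expiration_s = expiration_s  # assumed deadline
        self.clock = clock
        self._workers: dict[str, WorkerRecord] = {}
        self._lock = threading.Lock()

    def register(self, address: str, speed: float, queue_length: int) -> None:
        with self._lock:
            self._workers[address] = WorkerRecord(speed, queue_length, self.clock())

    def heartbeat(self, address: str, speed: float, queue_length: int) -> bool:
        """Refresh a worker; return False if it is unknown so it re-registers."""
        with self._lock:
            record = self._workers.get(address)
            if record is None:
                return False
            record.speed = speed
            record.queue_length = queue_length
            record.last_heartbeat = self.clock()
            return True

    def remove_stale(self) -> list[str]:
        """Drop workers whose last heartbeat is older than the deadline."""
        cutoff = self.clock() - self.expiration_s
        with self._lock:
            stale = [a for a, r in self._workers.items() if r.last_heartbeat < cutoff]
            for address in stale:
                del self._workers[address]
        return stale

    def addresses(self) -> list[str]:
        with self._lock:
            return sorted(self._workers)
```

A running controller would call `remove_stale()` from a periodic background task. Because `heartbeat()` returns `False` for unknown addresses, a worker that was expired but is still alive learns that it must register again.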

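Request proxying: The controller forwards each streaming generation request to the worker selected by the load balancer and relays the streamed output back to the client. If no worker is available or the selected worker cannot be reached, the controller returns a standardized error response instead of failing the request.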
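The proxying behavior can be sketched as a generator that relays chunks from the selected worker and converts failures into one standardized error chunk. The null-byte-delimited JSON framing, the `text`/`error_code` fields, and the error-code values are assumptions, and `open_stream` is a hypothetical stand-in for whatever HTTP streaming client the controller uses.

```python
import json
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical error codes for this sketch.
ERR_NO_WORKER = 1
ERR_WORKER_UNREACHABLE = 2


def _error_chunk(message: str, code: int) -> bytes:
    # Assumed framing: one JSON object per chunk, terminated by a null byte.
    return json.dumps({"text": message, "error_code": code}).encode() + b"\0"


def proxy_stream(
    pick_worker: Callable[[], Optional[str]],
    open_stream: Callable[[str, dict], Iterable[bytes]],
    params: dict,
) -> Iterator[bytes]:
    """Relay a worker's streamed chunks to the client, reporting failures in-band."""
    address = pick_worker()
    if address is None:
        yield _error_chunk("no worker available", ERR_NO_WORKER)
        return
    try:
        for chunk in open_stream(address, params):
            yield chunk
    except OSError as exc:
        # ConnectionError and TimeoutError are OSError subclasses. A failure
        # mid-stream still yields the chunks already relayed, then the error.
        yield _error_chunk(f"worker {address} unreachable: {exc}", ERR_WORKER_UNREACHABLE)
```

Reporting errors in-band keeps the client-facing stream format the same whether generation succeeds or fails, so clients need only one parsing path.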

Hierarchical management: The controller can itself register as a worker with a higher-level controller, enabling multi-level hierarchies that connect otherwise isolated sub-networks.
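To act as a worker, a controller has to describe its whole sub-network to the upstream controller as a single worker. A minimal sketch, assuming the aggregate is the union of model names plus the sums of speeds and queue lengths (an assumed rule); `WorkerStatus`, `aggregate_status`, and the model names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkerStatus:
    # Hypothetical status record a worker (or sub-controller) reports upstream.
    model_names: list[str]
    speed: float
    queue_length: int


def aggregate_status(children: list[WorkerStatus]) -> WorkerStatus:
    """Summarize a sub-controller's workers as one logical worker (assumed rule)."""
    names = sorted({name for child in children for name in child.model_names})
    return WorkerStatus(
        model_names=names,
        speed=sum(child.speed for child in children),
        queue_length=sum(child.queue_length for child in children),
    )


if __name__ == "__main__":
    status = aggregate_status([
        WorkerStatus(["internvl-chat"], speed=1.0, queue_length=3),
        WorkerStatus(["internvl-chat", "other-model"], speed=2.0, queue_length=1),
    ])
    print(status)
```

With these numbers the upstream controller sees one worker with speed 3.0 and queue length 4, so under shortest_queue the whole sub-network is scored as 4 / 3.0 ≈ 1.33, just like any single worker.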

This architecture enables horizontal scaling of model inference without modifying the client interface.

Usage

Apply this principle when deploying models across multiple GPUs or machines, where a centralized coordination service is needed for load balancing and health monitoring.

Theoretical Basis

This follows standard distributed systems patterns: service registry, health checking, and load balancing. The controller acts as a reverse proxy with built-in service discovery, similar to patterns used in microservice architectures.
