Principle:Predibase Lorax Multi Process Server Launch
| Knowledge Sources | |
|---|---|
| Domains | Systems_Architecture, Model_Serving |
| Last Updated | 2026-02-08 02:00 GMT |
Overview
A multi-process orchestration pattern where a parent launcher process manages Python model shards (one per GPU) communicating via gRPC, fronted by a Rust HTTP router for client requests.
Description
Multi-Process Server Launch addresses the challenge of coordinating a heterogeneous inference system. The problem: Python is needed for PyTorch model execution, but Rust provides better performance for HTTP routing and request scheduling. The solution is a three-tier process architecture:
- Launcher (Rust): Top-level orchestrator that downloads model weights, spawns shard processes, and launches the router. Handles graceful shutdown.
- Shard(s) (Python): One gRPC server per GPU running PyTorch model inference. Communicates with the router over Unix domain sockets.
- Router (Rust): HTTP/REST server that accepts client requests, manages continuous batching, and dispatches to shards via gRPC.
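The hand-off between shards and router can be illustrated with a small readiness check. This is a minimal sketch, not the launcher's actual mechanism: it assumes each shard binds a Unix domain socket at a path derived from a shared prefix and its shard index, and that the socket file's existence is a good enough readiness signal. The helper names, the path scheme, and the timeout values are all hypothetical.

```python
import os
import time
from typing import Iterable, List


def shard_socket_paths(uds_prefix: str, num_shard: int) -> List[str]:
    """Return one socket path per shard (hypothetical naming: <prefix>-<shard_id>)."""
    return [f"{uds_prefix}-{shard_id}" for shard_id in range(num_shard)]


def wait_for_shards_ready(
    paths: Iterable[str], timeout_s: float = 60.0, poll_s: float = 0.1
) -> bool:
    """Poll until every socket path exists; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout_s
    pending = set(paths)
    while pending:
        # Keep only the paths that have not appeared on disk yet.
        pending = {p for p in pending if not os.path.exists(p)}
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True
```

Starting the router only after this check returns `True` ensures that it never tries to connect to a shard that has not finished loading its weights.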
Usage
Use this principle when deploying LoRAX. The launcher binary (lorax-launcher) is the user-facing entry point that orchestrates the entire system. Users configure model, quantization, and scaling parameters via CLI arguments.
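As a rough illustration of driving the launcher from a script, the sketch below assembles an argument list for `lorax-launcher`. The flag names `--model-id`, `--num-shard`, and `--quantize` are assumptions that should be checked against `lorax-launcher --help` for the installed version; the helper name and the model identifier in the usage note are hypothetical.

```python
from typing import List, Optional


def build_launcher_command(
    model_id: str, num_shard: int = 1, quantize: Optional[str] = None
) -> List[str]:
    """Assemble an argv list for the launcher (flag names are assumed, not verified)."""
    if num_shard < 1:
        raise ValueError("num_shard must be at least 1")
    cmd = ["lorax-launcher", "--model-id", model_id, "--num-shard", str(num_shard)]
    if quantize is not None:
        # Only pass the quantization flag when a scheme was requested.
        cmd += ["--quantize", quantize]
    return cmd
```

For example, `build_launcher_command("my-org/my-model", num_shard=2)` produces a list that can be passed to `subprocess.run`. Building the command as a list rather than a single string avoids shell quoting problems.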
Theoretical Basis
Pseudo-code:
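# Orchestration flow
def main(args):
    # Fetch the weights once, before any shard starts
    download_and_convert_model(args.model_id)

    # Start one Python shard per GPU
    shard_processes = []
    for shard_id in range(args.num_shard):
        proc = spawn_python_shard(shard_id, args)
        shard_processes.append(proc)

    # Launch the router only after every shard is serving
    wait_for_shards_ready()
    router = spawn_rust_router(args)

    # Supervise all children until shutdown
    monitor_all_processes(shard_processes, router)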
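The monitoring step in the pseudo-code can be sketched with the standard library. This is an illustrative sketch under stated assumptions, not the Rust launcher's implementation: it assumes that the exit of any child should bring down the whole group, and it uses throwaway Python snippets as stand-ins for the shard and router binaries. The `spawn` and `shutdown` helpers and the grace period are hypothetical.

```python
import subprocess
import sys
import time
from typing import List


def spawn(code: str) -> subprocess.Popen:
    """Start a stand-in child process that runs a Python snippet."""
    return subprocess.Popen([sys.executable, "-c", code])


def shutdown(procs: List[subprocess.Popen], grace_s: float = 5.0) -> None:
    """Ask every live child to terminate, then kill any that outlast the grace period."""
    for p in procs:
        if p.poll() is None:
            p.terminate()
    for p in procs:
        try:
            p.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            p.kill()
            p.wait()


def monitor_all_processes(procs: List[subprocess.Popen], poll_s: float = 0.05) -> int:
    """Block until any child exits, stop the rest, and return the index of the first to exit."""
    while True:
        for i, p in enumerate(procs):
            if p.poll() is not None:
                # Assumed policy: one failure stops the whole group.
                shutdown(procs)
                return i
        time.sleep(poll_s)
```

Terminating first and killing only after a grace period gives each child a chance to release GPU memory and close its sockets cleanly.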