Principle:Predibase Lorax Multi Process Server Launch

Knowledge Sources	LoRAX Launcher Reference, LoRAX Architecture
Domains	Systems_Architecture, Model_Serving
Last Updated	2026-02-08 02:00 GMT

Overview

Multi-Process Server Launch is an orchestration pattern in which a parent launcher process manages Python model shards (one per GPU) and a Rust HTTP router. The router accepts client requests and communicates with the shards over gRPC.

Description

Multi-Process Server Launch addresses the challenge of coordinating a heterogeneous inference system. The problem: Python is needed for PyTorch model execution, but Rust provides better performance for HTTP routing and request scheduling. The solution is a three-tier process architecture:

Launcher (Rust): Top-level orchestrator that downloads model weights, spawns shard processes, and launches the router. Handles graceful shutdown.
Shard(s) (Python): One gRPC server per GPU running PyTorch model inference. Communicates with the router over Unix domain sockets.
Router (Rust): HTTP/REST server that accepts client requests, manages continuous batching, and dispatches to shards via gRPC.
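
The three tiers above can be pictured as a small process plan. The following is a minimal sketch, not LoRAX code: ProcessSpec, plan_topology, the socket prefix /tmp/example-shard, and the port 8080 are all hypothetical names and values chosen for illustration. It only shows that one launcher, N shards (one per GPU), and one router result from a given shard count.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProcessSpec:
    """One process in the three-tier layout (illustrative only)."""
    role: str        # "launcher", "shard", or "router"
    runtime: str     # "rust" or "python"
    listens_on: str  # human-readable endpoint description


def plan_topology(
    num_shard: int,
    uds_prefix: str = "/tmp/example-shard",  # hypothetical socket prefix
    http_port: int = 8080,                   # hypothetical HTTP port
) -> List[ProcessSpec]:
    """Return the processes implied by the layout for num_shard GPUs."""
    if num_shard < 1:
        raise ValueError("num_shard must be at least 1")
    # The launcher is the parent process and serves no endpoint itself.
    specs = [ProcessSpec("launcher", "rust", "none (parent process)")]
    # One Python gRPC shard per GPU, each on its own Unix domain socket.
    for rank in range(num_shard):
        specs.append(ProcessSpec("shard", "python", f"unix://{uds_prefix}-{rank}"))
    # A single Rust router fronts all shards over HTTP.
    specs.append(ProcessSpec("router", "rust", f"http://0.0.0.0:{http_port}"))
    return specs


if __name__ == "__main__":
    for spec in plan_topology(num_shard=2):
        print(f"{spec.role:<8} {spec.runtime:<6} {spec.listens_on}")
```

The process count grows linearly with the number of GPUs (N + 2 in total), while the client-facing surface stays a single HTTP endpoint.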

Usage

Use this principle when deploying LoRAX. The launcher binary (lorax-launcher) is the user-facing entry point that orchestrates the entire system. Users configure model, quantization, and scaling parameters via CLI arguments.
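
As a rough illustration of configuring the launcher through CLI arguments, the sketch below assembles an argument list for lorax-launcher. The helper build_launcher_command is hypothetical, the model ID is a placeholder, and the flag names (--model-id, --num-shard, --quantize) are assumed to mirror the launcher's argument names; confirm them against lorax-launcher --help before relying on them.

```python
import shlex
from typing import List, Optional


def build_launcher_command(
    model_id: str,
    num_shard: int = 1,
    quantize: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for the launcher (flag names are assumptions)."""
    if num_shard < 1:
        raise ValueError("num_shard must be at least 1")
    cmd = ["lorax-launcher", "--model-id", model_id, "--num-shard", str(num_shard)]
    # Quantization is optional; omit the flag entirely when it is not requested.
    if quantize is not None:
        cmd += ["--quantize", quantize]
    return cmd


if __name__ == "__main__":
    # "example-org/example-model" is a placeholder, not a real model ID.
    cmd = build_launcher_command("example-org/example-model", num_shard=2)
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems if it is later passed to subprocess.run.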

Theoretical Basis

Pseudo-code:

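# Orchestration flow of the launcher (parent) process
def main(args):
    # Fetch the weights once, before any shard starts
    download_and_convert_model(args.model_id)
    # Spawn one Python gRPC shard per GPU
    shard_processes = []
    for shard_id in range(args.num_shard):
        proc = spawn_python_shard(shard_id, args)
        shard_processes.append(proc)
    # Block until every shard reports that it is ready to serve
    wait_for_shards_ready(shard_processes)
    # Start the Rust HTTP router only after the shards are up
    router = spawn_rust_router(args)
    # Watch all children; shut everything down if any of them exits
    monitor_all_processes(shard_processes, router)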
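
The flow above can be exercised end to end with stand-in child processes. This is a sketch under stated assumptions, not the launcher's implementation: the real launcher is written in Rust, and here each "shard" and the "router" are simulated by a tiny Python child that prints ready and then idles. The names spawn_child, wait_until_ready, shutdown, and run are hypothetical.

```python
import subprocess
import sys
import time
from typing import List

# Stand-in child: announce readiness, then idle until terminated.
CHILD_CODE = "import time; print('ready', flush=True); time.sleep(60)"


def spawn_child() -> subprocess.Popen:
    """Start one stand-in child process with a readable stdout pipe."""
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE], stdout=subprocess.PIPE, text=True
    )


def wait_until_ready(proc: subprocess.Popen) -> None:
    """Block until the child prints 'ready'; fail if it exits first."""
    line = proc.stdout.readline()  # returns '' if the child exited
    if line.strip() != "ready":
        raise RuntimeError("child exited before becoming ready")


def shutdown(procs: List[subprocess.Popen]) -> None:
    """Ask every live child to stop, then force-kill stragglers."""
    for proc in procs:
        if proc.poll() is None:
            proc.terminate()
    for proc in procs:
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
        proc.stdout.close()


def run(num_shard: int, monitor_seconds: float = 0.5) -> List[int]:
    """Spawn shards, wait for readiness, spawn the router, monitor, shut down."""
    procs: List[subprocess.Popen] = []
    try:
        for _ in range(num_shard):
            procs.append(spawn_child())      # one shard per GPU
        for shard in procs:
            wait_until_ready(shard)
        router = spawn_child()               # router starts after the shards
        procs.append(router)
        wait_until_ready(router)
        deadline = time.monotonic() + monitor_seconds
        while time.monotonic() < deadline:   # monitoring loop
            if any(proc.poll() is not None for proc in procs):
                raise RuntimeError("a child process exited unexpectedly")
            time.sleep(0.05)
    finally:
        shutdown(procs)                      # runs on success and on failure
    return [proc.returncode for proc in procs]


if __name__ == "__main__":
    print(run(num_shard=2))
```

The try/finally block is the important design choice: any failure during startup or monitoring still tears down every child that was already spawned, which is what graceful shutdown requires of the parent process.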

Related Pages

Implemented By

Implementation:Predibase_Lorax_Lorax_Launcher_Main
