Implementation: Predibase LoRAX Launcher Main
| Knowledge Sources | |
|---|---|
| Domains | Systems_Architecture, Model_Serving |
| Last Updated | 2026-02-08 02:00 GMT |
Overview
Concrete tool, implemented as the Rust launcher binary, for orchestrating the multi-process LoRAX inference server.
Description
The lorax-launcher binary (launcher/src/main.rs) is the top-level orchestrator for the LoRAX system. It parses CLI arguments into an Args struct, downloads/converts model weights, spawns Python gRPC shard processes (one per GPU), and launches the Rust HTTP router. It monitors all child processes and handles graceful shutdown via signal trapping.
The launcher uses the clap crate for CLI parsing and nix for Unix process management.
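The spawn-and-monitor flow described above can be sketched in simplified form. This is a minimal illustration, not the actual launcher code: the real binary parses clap-derived Args, builds per-GPU Python gRPC shard commands, starts the router, and traps signals via nix, all of which are omitted here. The `spawn_shard` function and its `sh -c` placeholder command are hypothetical stand-ins.

```rust
use std::process::{Child, Command};

// Simplified stand-in for the launcher's spawn-and-monitor loop.
// The real launcher spawns one Python gRPC shard per GPU plus the
// Rust HTTP router; here a trivial shell command plays the "shard".
fn spawn_shard(rank: usize) -> std::io::Result<Child> {
    // Placeholder command; the real shard invocation passes rank,
    // world size, and CUDA device environment to the Python server.
    Command::new("sh")
        .arg("-c")
        .arg(format!("echo shard {rank} ready"))
        .spawn()
}

fn main() -> std::io::Result<()> {
    let num_shard = 2; // real value comes from --num-shard or GPU detection
    let mut children: Vec<Child> = (0..num_shard)
        .map(spawn_shard)
        .collect::<Result<_, _>>()?;

    // Monitor children: if any shard exits abnormally, tear down the rest.
    for child in &mut children {
        let status = child.wait()?;
        if !status.success() {
            eprintln!("shard failed: {status}");
            // the real launcher would signal remaining children here
        }
    }
    Ok(())
}
```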
Usage
This is the user-facing entry point for LoRAX. Run lorax-launcher with CLI arguments to start the full inference stack.
Code Reference
Source Location
- Repository: LoRAX
- File: launcher/src/main.rs
- Lines: 275-1875
Signature
/// Args for the LoRAX launcher
#[derive(Parser, Debug)]
#[clap(author, version, about, long_about = None)]
struct Args {
/// Model ID (HuggingFace hub ID or local path)
#[clap(default_value = "bigscience/bloom-560m", long, env)]
model_id: String,
/// Number of shards (default: num GPUs)
#[clap(long, env)]
num_shard: Option<usize>,
/// HTTP port (default: 3000)
#[clap(default_value = "3000", long, short, env)]
port: u16,
/// Maximum input token length
#[clap(long, env)]
max_input_length: Option<usize>,
/// Maximum total tokens (input + output)
#[clap(long, env)]
max_total_tokens: Option<usize>,
/// Quantization method
#[clap(long, env)]
quantize: Option<Quantization>,
// ... 30+ additional parameters
}
Import
# Binary invocation, not a library import
lorax-launcher --model-id mistralai/Mistral-7B-v0.1 --port 3000
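Because the Args fields carry clap's `env` attribute, each flag can also be supplied through an environment variable; clap derives the variable name from the field name (e.g. `MODEL_ID` for `--model-id`). A hedged equivalent of the flag-based invocation above (verify the exact names against `lorax-launcher --help`):

```shell
# Environment-variable form of the same launch; variable names are
# clap's defaults derived from the Args field names.
export MODEL_ID=mistralai/Mistral-7B-v0.1
export PORT=3000
lorax-launcher
```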
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_id | String | Yes | HuggingFace model ID or local path |
| num_shard | Option&lt;usize&gt; | No | Number of GPU shards (auto-detected) |
| port | u16 | No | HTTP port (default 3000) |
| max_input_length | Option&lt;usize&gt; | No | Max prompt tokens |
| max_total_tokens | Option&lt;usize&gt; | No | Max total tokens (prompt + generation) |
| quantize | Option&lt;Quantization&gt; | No | Quantization method |
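Since max_total_tokens budgets prompt and generated tokens together, the prompt length must stay strictly below the total budget. A sketch of that relationship (the `validate` function is illustrative; the launcher's actual validation lives in main.rs):

```rust
// Sketch of the input/total token relationship implied by the table:
// max_total_tokens covers prompt + generated tokens, so the prompt
// alone must fit strictly inside it.
fn validate(max_input_length: usize, max_total_tokens: usize) -> Result<(), String> {
    if max_input_length >= max_total_tokens {
        return Err(format!(
            "max_input_length ({max_input_length}) must be < max_total_tokens ({max_total_tokens})"
        ));
    }
    Ok(())
}
```

For example, the Basic Launch invocation below (1024 input, 2048 total) leaves up to 1024 tokens for generation.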
Outputs
| Name | Type | Description |
|---|---|---|
| Running server | Process tree | Launcher + Python shard(s) + Rust router |
| HTTP endpoint | TCP socket | HTTP API bound to specified port |
Usage Examples
Basic Launch
# Launch with Mistral-7B on a single GPU
lorax-launcher \
--model-id mistralai/Mistral-7B-Instruct-v0.1 \
--port 3000 \
--max-input-length 1024 \
--max-total-tokens 2048
Multi-GPU with Quantization
# Launch with 2 GPU shards and GPTQ quantization
lorax-launcher \
--model-id TheBloke/Llama-2-70B-GPTQ \
--num-shard 2 \
--quantize gptq \
--port 8080