Implementation: Predibase LoRAX Launcher Main
| Knowledge Sources | |
|---|---|
| Domains | Systems_Architecture, Model_Serving |
| Last Updated | 2026-02-08 02:00 GMT |
Overview
Concrete tool, implemented as the Rust launcher binary, for orchestrating the multi-process LoRAX inference server.
Description
The lorax-launcher binary (launcher/src/main.rs) is the top-level orchestrator for the LoRAX system. It parses CLI arguments into an Args struct, downloads/converts model weights, spawns Python gRPC shard processes (one per GPU), and launches the Rust HTTP router. It monitors all child processes and handles graceful shutdown via signal trapping.
The launcher uses the clap crate for CLI parsing and nix for Unix process management.
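The spawn-and-monitor flow described above can be sketched in simplified form. This is a minimal illustration, not the actual launcher code: the real binary parses clap-derived Args, builds per-GPU Python gRPC shard commands, starts the router, and traps signals via nix, all of which are omitted here. The `spawn_shard` function and its `sh -c` placeholder command are hypothetical stand-ins.

```rust
use std::process::{Child, Command};

// Simplified stand-in for the launcher's spawn-and-monitor loop.
// The real launcher spawns one Python gRPC shard per GPU plus the
// Rust HTTP router; here a trivial shell command plays the "shard".
fn spawn_shard(rank: usize) -> std::io::Result<Child> {
    // Placeholder command; the real shard invocation passes rank,
    // world size, and CUDA device environment to the Python server.
    Command::new("sh")
        .arg("-c")
        .arg(format!("echo shard {rank} ready"))
        .spawn()
}

fn main() -> std::io::Result<()> {
    let num_shard = 2; // real value comes from --num-shard or GPU detection
    let mut children: Vec<Child> = (0..num_shard)
        .map(spawn_shard)
        .collect::<Result<_, _>>()?;

    // Monitor children: if any shard exits abnormally, tear down the rest.
    for child in &mut children {
        let status = child.wait()?;
        if !status.success() {
            eprintln!("shard failed: {status}");
            // the real launcher would signal remaining children here
        }
    }
    Ok(())
}
```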
Usage
This is the user-facing entry point for LoRAX. Run lorax-launcher with CLI arguments to start the full inference stack.
Code Reference
Source Location
- Repository: LoRAX
- File: launcher/src/main.rs
- Lines: 275-1875
Signature
/// Args for the LoRAX launcher
#[derive(Parser, Debug)]
#[clap(author, version, about, long_about = None)]
struct Args {
/// Model ID (HuggingFace hub ID or local path)
#[clap(default_value = "bigscience/bloom-560m", long, env)]
model_id: String,
/// Number of shards (default: num GPUs)
#[clap(long, env)]
num_shard: Option<usize>,
/// HTTP port (default: 3000)
#[clap(default_value = "3000", long, short, env)]
port: u16,
/// Maximum input token length
#[clap(long, env)]
max_input_length: Option<usize>,
/// Maximum total tokens (input + output)
#[clap(long, env)]
max_total_tokens: Option<usize>,
/// Quantization method
#[clap(long, env)]
quantize: Option<Quantization>,
// ... 30+ additional parameters
}
Import
# Binary invocation, not a library import
lorax-launcher --model-id mistralai/Mistral-7B-v0.1 --port 3000
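Because the Args fields carry clap's `env` attribute, each flag can also be supplied through an environment variable; clap derives the variable name from the field name (e.g. `MODEL_ID` for `--model-id`). A hedged equivalent of the flag-based invocation above (verify the exact names against `lorax-launcher --help`):

```shell
# Environment-variable form of the same launch; variable names are
# clap's defaults derived from the Args field names.
export MODEL_ID=mistralai/Mistral-7B-v0.1
export PORT=3000
lorax-launcher
```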
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_id | String | Yes | HuggingFace model ID or local path |
| num_shard | Option&lt;usize&gt; | No | Number of GPU shards (auto-detected) |
| port | u16 | No | HTTP port (default 3000) |
| max_input_length | Option&lt;usize&gt; | No | Max prompt tokens |
| max_total_tokens | Option&lt;usize&gt; | No | Max total tokens (prompt + generation) |
| quantize | Option&lt;Quantization&gt; | No | Quantization method |
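Since max_total_tokens budgets prompt and generated tokens together, the prompt length must stay strictly below the total budget. A sketch of that relationship (the `validate` function is illustrative; the launcher's actual validation lives in main.rs):

```rust
// Sketch of the input/total token relationship implied by the table:
// max_total_tokens covers prompt + generated tokens, so the prompt
// alone must fit strictly inside it.
fn validate(max_input_length: usize, max_total_tokens: usize) -> Result<(), String> {
    if max_input_length >= max_total_tokens {
        return Err(format!(
            "max_input_length ({max_input_length}) must be < max_total_tokens ({max_total_tokens})"
        ));
    }
    Ok(())
}
```

For example, the Basic Launch invocation below (1024 input, 2048 total) leaves up to 1024 tokens for generation.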
Outputs
| Name | Type | Description |
|---|---|---|
| Running server | Process tree | Launcher + Python shard(s) + Rust router |
| HTTP endpoint | TCP socket | HTTP API bound to specified port |
Usage Examples
Basic Launch
# Launch with Mistral-7B on a single GPU
lorax-launcher \
--model-id mistralai/Mistral-7B-Instruct-v0.1 \
--port 3000 \
--max-input-length 1024 \
--max-total-tokens 2048
Multi-GPU with Quantization
# Launch with 2 GPU shards and GPTQ quantization
lorax-launcher \
--model-id TheBloke/Llama-2-70B-GPTQ \
--num-shard 2 \
--quantize gptq \
--port 8080