
Implementation:Predibase Lorax Lorax Launcher Main

From Leeroopedia


Knowledge Sources
Domains: Systems_Architecture, Model_Serving
Last Updated: 2026-02-08 02:00 GMT

Overview

A concrete tool, provided by the Rust launcher binary, for orchestrating the multi-process LoRAX inference server.

Description

The lorax-launcher binary (launcher/src/main.rs) is the top-level orchestrator for the LoRAX system. It parses CLI arguments into an Args struct, downloads/converts model weights, spawns Python gRPC shard processes (one per GPU), and launches the Rust HTTP router. It monitors all child processes and handles graceful shutdown via signal trapping.

The launcher uses the clap crate for CLI parsing and nix for Unix process management.
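The supervision pattern described above (spawn children, watch for exits, trap signals, tear down) can be illustrated with a minimal Python sketch. This is not the launcher's actual Rust code; process names, polling interval, and exit codes are illustrative only.

```python
import signal
import subprocess
import sys
import time

def supervise(commands):
    """Spawn one child per command, poll until any child exits or a
    termination signal arrives, then shut all remaining children down."""
    children = [subprocess.Popen(cmd) for cmd in commands]
    stopping = False

    def request_stop(signum, frame):
        nonlocal stopping
        stopping = True

    # Trap SIGINT/SIGTERM, mirroring the launcher's graceful-shutdown path.
    signal.signal(signal.SIGINT, request_stop)
    signal.signal(signal.SIGTERM, request_stop)

    exit_code = 0
    while not stopping:
        if any(c.poll() is not None for c in children):
            # One child died: tear the whole tree down, as the launcher does
            # when a shard or the router exits unexpectedly.
            exit_code = 1
            break
        time.sleep(0.1)

    for c in children:
        if c.poll() is None:
            c.terminate()
    for c in children:
        c.wait()
    return exit_code

if __name__ == "__main__":
    # Two short-lived stand-ins for "shard" processes; their exit
    # triggers the teardown branch above.
    code = supervise([[sys.executable, "-c", "pass"]] * 2)
    print("exit:", code)
```

The real launcher applies the same rule: if any shard or the router dies, everything is torn down rather than left in a half-alive state.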

Usage

This is the user-facing entry point for LoRAX. Run lorax-launcher with CLI arguments to start the full inference stack.

Code Reference

Source Location

  • Repository: LoRAX
  • File: launcher/src/main.rs
  • Lines: 275-1875

Signature

/// Args for the LoRAX launcher
#[derive(Parser, Debug)]
#[clap(author, version, about, long_about = None)]
struct Args {
    /// Model ID (HuggingFace hub ID or local path)
    #[clap(default_value = "bigscience/bloom-560m", long, env)]
    model_id: String,

    /// Number of shards (default: num GPUs)
    #[clap(long, env)]
    num_shard: Option<usize>,

    /// HTTP port (default: 3000)
    #[clap(default_value = "3000", long, short, env)]
    port: u16,

    /// Maximum input token length
    #[clap(long, env)]
    max_input_length: Option<usize>,

    /// Maximum total tokens (input + output)
    #[clap(long, env)]
    max_total_tokens: Option<usize>,

    /// Quantization method
    #[clap(long, env)]
    quantize: Option<Quantization>,
    // ... 30+ additional parameters
}
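Because each field carries clap's `env` attribute, every flag can also be supplied through an upper-cased environment variable (e.g. `MODEL_ID` for `--model-id`), with an explicit CLI flag taking precedence over the environment, which in turn takes precedence over the default. A minimal Python sketch of that resolution order (illustrative, not clap's implementation):

```python
import os

def resolve(cli_value, env_var, default):
    """Resolve a flag the way clap's `env` attribute does:
    explicit CLI value wins, then the environment variable, then the default."""
    if cli_value is not None:
        return cli_value
    return os.environ.get(env_var, default)

# Default applies when neither the CLI nor the environment supplies a value.
assert resolve(None, "MODEL_ID", "bigscience/bloom-560m") == "bigscience/bloom-560m"

os.environ["MODEL_ID"] = "mistralai/Mistral-7B-v0.1"
# The environment variable overrides the default...
assert resolve(None, "MODEL_ID", "bigscience/bloom-560m") == "mistralai/Mistral-7B-v0.1"
# ...but an explicit CLI flag overrides both.
assert resolve("/local/model", "MODEL_ID", "bigscience/bloom-560m") == "/local/model"
```

This is why containerized deployments can configure the launcher entirely through environment variables without touching the command line.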

Import

# Binary invocation, not a library import
lorax-launcher --model-id mistralai/Mistral-7B-v0.1 --port 3000

I/O Contract

Inputs

Name              Type                  Required  Description
model_id          String                No        HuggingFace model ID or local path (default: bigscience/bloom-560m)
num_shard         Option[usize]         No        Number of GPU shards (auto-detected)
port              u16                   No        HTTP port (default: 3000)
max_input_length  Option[usize]         No        Max prompt tokens
max_total_tokens  Option[usize]         No        Max total tokens (prompt + generation)
quantize          Option[Quantization]  No        Quantization method

Outputs

Name            Type          Description
Running server  Process tree  Launcher + Python shard(s) + Rust router
HTTP endpoint   TCP socket    HTTP API bound to the specified port

Usage Examples

Basic Launch

# Launch with Mistral-7B on a single GPU
lorax-launcher \
  --model-id mistralai/Mistral-7B-Instruct-v0.1 \
  --port 3000 \
  --max-input-length 1024 \
  --max-total-tokens 2048

Multi-GPU with Quantization

# Launch with 2 GPU shards and GPTQ quantization
lorax-launcher \
  --model-id TheBloke/Llama-2-70B-GPTQ \
  --num-shard 2 \
  --quantize gptq \
  --port 8080
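Once the launcher is up, the router accepts generation requests over HTTP. The sketch below builds a request body for the text-generation-inference-compatible `/generate` route; the adapter ID is a hypothetical placeholder, and the generation limits must respect the `--max-input-length` / `--max-total-tokens` values passed to the launcher.

```python
import json

# Hypothetical request body for a server started as in the examples above.
payload = {
    "inputs": "What is LoRAX?",
    "parameters": {
        # Prompt tokens + max_new_tokens must fit within --max-total-tokens.
        "max_new_tokens": 64,
        # LoRAX-specific: target a LoRA adapter; this hub ID is a placeholder.
        "adapter_id": "some-org/some-lora-adapter",
    },
}
body = json.dumps(payload)
print(body)
# Send with e.g. urllib.request, or: curl -X POST -d "$body" \
#   -H 'Content-Type: application/json' http://localhost:8080/generate
```

The `adapter_id` parameter is what distinguishes LoRAX from a plain TGI deployment: the same base model serves many adapters, selected per request.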

Related Pages

Implements Principle

Requires Environment
