
Environment: Predibase LoRAX Docker Container Runtime

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Containers
Last Updated: 2026-02-08 02:30 GMT

Overview

Docker container runtime with NVIDIA GPU support, based on `nvidia/cuda:12.4.0-base-ubuntu22.04`, providing the complete LoRAX inference server with all pre-compiled CUDA kernels.

Description

The official LoRAX Docker image packages the entire inference stack including the Rust router/launcher, Python gRPC server, all pre-compiled CUDA kernels (ExLLaMA, Punica, vLLM, EETQ), and the Python dependency tree. The multi-stage Dockerfile uses `cargo-chef` for Rust build caching and a CUDA devel image for kernel compilation, producing a slim runtime image based on `nvidia/cuda:12.4.0-base-ubuntu22.04`.
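The stage layout can be sketched as follows (stage names and comments here are illustrative, not copied from the real Dockerfile):

```dockerfile
# Illustrative multi-stage layout; stage names are assumptions, not verbatim.
FROM lukemathwalker/cargo-chef:latest-rust-1.83 AS chef
# cargo chef prepare/cook caches the Rust dependency build, then the
# lorax-router and lorax-launcher binaries are compiled.

FROM nvidia/cuda:12.4.0-devel-ubuntu22.04 AS kernel-builder
# nvcc compiles the ExLLaMA, Punica, vLLM, and EETQ kernels, and the
# Python dependency tree is installed.

FROM nvidia/cuda:12.4.0-base-ubuntu22.04
# Slim runtime: only the built binaries, kernels, and Python environment
# are copied in from the earlier stages.
```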

The container entrypoint (`container-entrypoint.sh`) handles:

  • Trapping SIGTERM (and nominally SIGKILL, which can never actually be caught) for graceful shutdown and model upload
  • Optional S3 model sync before server launch
  • Launching `lorax-launcher` which orchestrates the router and server shards
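A runnable sketch of that shutdown pattern (the function body and the `sleep` stand-in are placeholders; the real script's `upload` hook syncs model state to S3 before exit):

```shell
#!/usr/bin/env bash
# Sketch of the entrypoint's signal handling, not the real script.
cleanup() {
    echo "cleanup: syncing model state"   # placeholder for the S3 upload hook
}
# The real entrypoint also names SIGKILL here, though SIGKILL cannot be trapped.
trap cleanup TERM

# Stand-in for launching lorax-launcher: run the long-lived process in the
# background and wait on it, so the shell stays responsive to signals.
sleep 30 &
wait $!
```

Backgrounding the child and calling `wait` matters: bash defers traps while a foreground child runs, but delivers them immediately during `wait`.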

Usage

Use this environment for production deployment and local development of LoRAX. The Docker image is the recommended deployment method as it includes all pre-compiled CUDA kernels that are difficult to build manually. Alternative deployment targets include Kubernetes (via Helm chart) and SageMaker (via `sagemaker-entrypoint.sh`).

System Requirements

  • Docker: 20.10+, with BuildKit support
  • NVIDIA Container Toolkit: nvidia-docker2 or nvidia-container-toolkit, required for GPU passthrough
  • NVIDIA Driver: 550+, compatible with the CUDA 12.4 runtime
  • Disk: 20GB+, since the Docker image is ~15GB with all CUDA kernels
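The Docker minimum can be gated in a script. The helper below is our own sketch, not part of any tooling; on a real host the version string would come from `docker version --format '{{.Server.Version}}'`:

```shell
# Sketch: gate on the Docker 20.10+ requirement from a detected version string.
docker_version_ok() {
    local ver=$1 major minor
    major=${ver%%.*}
    minor=${ver#*.}; minor=${minor%%.*}
    [ "$major" -gt 20 ] || { [ "$major" -eq 20 ] && [ "$minor" -ge 10 ]; }
}

# Typical use on a host with the Docker CLI installed:
#   ver=$(docker version --format '{{.Server.Version}}')
#   docker_version_ok "$ver" || echo "Docker too old; need 20.10+"
docker_version_ok "24.0" && echo "24.0 ok"
```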

Dependencies

Base Images

  • `lukemathwalker/cargo-chef:latest-rust-1.83` (Rust build stage)
  • `nvidia/cuda:12.4.0-devel-ubuntu22.04` (Python/kernel build stage)
  • `nvidia/cuda:12.4.0-base-ubuntu22.04` (Runtime stage)

Build Tools (compile stage only)

  • Rust 1.83 toolchain
  • `protoc` (Protocol Buffer compiler)
  • `ninja-build`, `cmake` >= 3.30.0
  • Python 3.10 build environment

Runtime Tools

  • `aws-cli` (for S3 model sync)
  • `lorax-launcher` (Rust binary, process orchestrator)
  • `lorax-router` (Rust binary, HTTP/gRPC router)
  • `lorax-server` (Python gRPC server)

Credentials

The following environment variables are set at container runtime; which ones are required depends on the model and adapter source:

  • `HF_TOKEN` or `HUGGING_FACE_HUB_TOKEN`: HuggingFace API token for downloading gated models
  • `PREDIBASE_API_TOKEN`: Predibase platform API token (if using Predibase-hosted adapters)
  • `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`: For S3 model source (if loading from S3)
  • `PREDIBASE_MODEL_BUCKET`: S3 bucket for Predibase model storage
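These are normally forwarded with `docker run -e`. The snippet below assembles the flags from whatever is set on the host and prints the resulting command as a dry run (nothing is executed):

```shell
# Sketch: forward whichever credentials are set in the host environment.
ENV_ARGS=()
for var in HF_TOKEN PREDIBASE_API_TOKEN AWS_ACCESS_KEY_ID \
           AWS_SECRET_ACCESS_KEY PREDIBASE_MODEL_BUCKET; do
    if [ -n "${!var:-}" ]; then
        ENV_ARGS+=(-e "$var=${!var}")
    fi
done

# Dry run: print the command instead of executing it.
echo docker run --gpus all -p 8080:80 "${ENV_ARGS[@]}" ghcr.io/predibase/lorax:latest
```

Skipping unset variables keeps empty `-e VAR=` entries out of the container environment.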

Quick Install

# Pull the official image
docker pull ghcr.io/predibase/lorax:latest

# Run with GPU access
docker run --gpus all -p 8080:80 \
  -e MODEL_ID=meta-llama/Llama-2-7b-hf \
  -e HF_TOKEN=$HF_TOKEN \
  ghcr.io/predibase/lorax:latest

# Build from source
DOCKER_BUILDKIT=1 docker build -t lorax:custom .
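Once the container is up, it can be smoke-tested over HTTP against the `/generate` endpoint. The prompt, token budget, and 8080 port are just the values from the example above; adding `"adapter_id"` inside `parameters` targets a specific LoRA adapter:

```shell
# One generation request against a locally running container.
payload='{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 32}}'

curl http://localhost:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d "$payload"
```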

Code Evidence

Container entrypoint signal handling from `container-entrypoint.sh:3-7`:

#!/bin/bash
# Trap SIGTERM and SIGKILL for model upload
trap upload SIGTERM SIGKILL

function upload() {
    # Handle graceful shutdown with model state preservation
}

Launcher token injection from `launcher/src/main.rs:907`:

// Set HF token as env var for server shards
envs.push(("HUGGING_FACE_HUB_TOKEN".into(), api_token.into()));

CUDA architecture targets in Dockerfile:

# ExLLaMA kernels: Ampere+
ENV TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX"

# vLLM kernels: Volta through Hopper
ENV TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 8.9 9.0+PTX"

Common Errors

  • `nvidia-container-cli: initialization error`: the NVIDIA Container Toolkit is not installed; install the nvidia-container-toolkit package.
  • `CUDA error: no kernel image is available for execution on the device`: the GPU architecture is not in `TORCH_CUDA_ARCH_LIST`; rebuild the image for your GPU architecture or use a supported GPU.
  • `lorax-launcher: command not found`: the binary is not on PATH; verify the image built correctly and check `/usr/local/bin/`.
  • S3 sync timeout: a network issue or missing AWS credentials; check `AWS_ACCESS_KEY_ID` and network connectivity.

Compatibility Notes

  • GPU Architecture: Pre-built image supports SM 7.0 through SM 9.0+ (Volta through Hopper). Older GPUs (Pascal, SM 6.x) are not supported.
  • ARM64: Not officially supported. Dockerfile targets x86_64.
  • Kubernetes: Helm chart available at `charts/lorax/` for K8s deployment.
  • SageMaker: Separate entrypoint at `sagemaker-entrypoint.sh` for AWS SageMaker deployment.
  • Image Size: ~15GB due to CUDA toolkit and pre-compiled kernels.
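Whether a card's compute capability falls inside the prebuilt list can be checked mechanically. The helper below is a sketch (`arch_supported` is our own name); on a real host, `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` reports the value to feed it. The exact-match check is conservative, since the `+PTX` suffix on 9.0 also lets newer architectures JIT-compile:

```shell
# Sketch: check a compute capability against the prebuilt vLLM arch list.
ARCH_LIST="7.0 7.5 8.0 8.6 8.9 9.0+PTX"

arch_supported() {
    case " $ARCH_LIST " in
        *" $1 "* | *" $1+PTX "*) return 0 ;;
        *) return 1 ;;
    esac
}

arch_supported "8.6" && echo "8.6 (Ampere) supported"
arch_supported "6.1" || echo "6.1 (Pascal) not supported"
```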
