Environment: Predibase LoRAX Docker Container Runtime
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Containers |
| Last Updated | 2026-02-08 02:30 GMT |
Overview
Docker container runtime with NVIDIA GPU support, based on `nvidia/cuda:12.4.0-base-ubuntu22.04`, providing the complete LoRAX inference server with all pre-compiled CUDA kernels.
Description
The official LoRAX Docker image packages the entire inference stack including the Rust router/launcher, Python gRPC server, all pre-compiled CUDA kernels (ExLLaMA, Punica, vLLM, EETQ), and the Python dependency tree. The multi-stage Dockerfile uses `cargo-chef` for Rust build caching and a CUDA devel image for kernel compilation, producing a slim runtime image based on `nvidia/cuda:12.4.0-base-ubuntu22.04`.
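The stage layout described above can be sketched roughly as follows. This is an illustrative outline only, not the real Dockerfile: the stage names, `WORKDIR`, and `COPY` paths are assumptions, and the kernel-compilation steps are elided.

```dockerfile
# Stage 1: Rust build with cargo-chef layer caching
FROM lukemathwalker/cargo-chef:latest-rust-1.83 AS chef
WORKDIR /usr/src
COPY . .
RUN cargo build --release

# Stage 2: CUDA devel image for compiling ExLLaMA / Punica / vLLM / EETQ kernels
FROM nvidia/cuda:12.4.0-devel-ubuntu22.04 AS kernel-builder
# ... kernel compilation steps elided ...

# Stage 3: slim runtime image with only the built artifacts
FROM nvidia/cuda:12.4.0-base-ubuntu22.04 AS runtime
COPY --from=chef /usr/src/target/release/lorax-launcher /usr/local/bin/
COPY --from=chef /usr/src/target/release/lorax-router /usr/local/bin/
ENTRYPOINT ["/usr/local/bin/container-entrypoint.sh"]
```

The point of the split is that only the final `base` stage ships in the image; the heavy Rust toolchain and CUDA devel toolkit stay in the build stages.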
The container entrypoint (`container-entrypoint.sh`) handles:
- Trapping SIGTERM for graceful shutdown and model upload (the script also lists SIGKILL in the trap, but SIGKILL cannot actually be caught)
- Optional S3 model sync before server launch
- Launching `lorax-launcher` which orchestrates the router and server shards
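The entrypoint steps above can be sketched as a minimal script. This is illustrative only: `MODEL_DIR`, `S3_MODEL_URI`, and the body of `upload()` are assumptions, not the real `container-entrypoint.sh`.

```shell
#!/bin/bash
# Minimal sketch of a LoRAX-style container entrypoint (illustrative).
set -euo pipefail

MODEL_DIR="${MODEL_DIR:-/data/model}"

upload() {
  # Preserve model state on shutdown, e.g. sync it back to S3.
  echo "caught SIGTERM, uploading model state from $MODEL_DIR"
}

# Only SIGTERM can actually be caught; SIGKILL is untrappable by design.
trap upload SIGTERM

# Optional S3 sync before launch (assumes aws-cli is on PATH).
if [ -n "${S3_MODEL_URI:-}" ]; then
  aws s3 sync "$S3_MODEL_URI" "$MODEL_DIR"
fi

# Hand off to the launcher, which orchestrates router + server shards.
if command -v lorax-launcher >/dev/null 2>&1; then
  exec lorax-launcher "$@"
else
  echo "lorax-launcher not on PATH (sketch mode)"
fi
```

Using `exec` for the final hand-off matters: it makes the launcher PID 1's direct child, so Docker's SIGTERM on `docker stop` reaches it rather than a forgotten shell wrapper.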
Usage
Use this environment for production deployment and local development of LoRAX. The Docker image is the recommended deployment method as it includes all pre-compiled CUDA kernels that are difficult to build manually. Alternative deployment targets include Kubernetes (via Helm chart) and SageMaker (via `sagemaker-entrypoint.sh`).
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Docker | Docker 20.10+ | With BuildKit support |
| NVIDIA Container Toolkit | nvidia-docker2 or nvidia-container-toolkit | Required for GPU passthrough |
| NVIDIA Driver | 550+ | Compatible with CUDA 12.4 runtime |
| Disk | 20GB+ | Docker image is ~15GB with all CUDA kernels |
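A quick preflight along the lines of the table above can save a 15GB pull. This is a hedged sketch; the exact binary names (`nvidia-ctk` vs. the older `nvidia-container-toolkit`) vary by install method and distro.

```shell
# Preflight sketch: check host prerequisites before pulling the image.
preflight() {
  if command -v docker >/dev/null 2>&1; then
    echo "docker: $(docker --version)"
  else
    echo "docker: MISSING"
  fi
  if command -v nvidia-ctk >/dev/null 2>&1 || command -v nvidia-container-toolkit >/dev/null 2>&1; then
    echo "toolkit: ok"
  else
    echo "toolkit: MISSING (install nvidia-container-toolkit)"
  fi
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "driver: $(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
  else
    echo "driver: MISSING (need 550+ for CUDA 12.4)"
  fi
}
preflight
```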
Dependencies
Base Images
- `lukemathwalker/cargo-chef:latest-rust-1.83` (Rust build stage)
- `nvidia/cuda:12.4.0-devel-ubuntu22.04` (Python/kernel build stage)
- `nvidia/cuda:12.4.0-base-ubuntu22.04` (Runtime stage)
Build Tools (compile stage only)
- Rust 1.83 toolchain
- `protoc` (Protocol Buffer compiler)
- `ninja-build`, `cmake` >= 3.30.0
- Python 3.10 build environment
Runtime Tools
- `aws-cli` (for S3 model sync)
- `lorax-launcher` (Rust binary, process orchestrator)
- `lorax-router` (Rust binary, HTTP/gRPC router)
- `lorax-server` (Python gRPC server)
Credentials
The following environment variables may need to be set at container runtime, depending on where models and adapters are loaded from:
- `HF_TOKEN` or `HUGGING_FACE_HUB_TOKEN`: HuggingFace API token for downloading gated models
- `PREDIBASE_API_TOKEN`: Predibase platform API token (if using Predibase-hosted adapters)
- `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`: For S3 model source (if loading from S3)
- `PREDIBASE_MODEL_BUCKET`: S3 bucket for Predibase model storage
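Since which of these variables is required depends on the model source, a fail-fast check at startup is useful. `require_env` below is a hypothetical helper, not part of LoRAX; the variable names come from the list above.

```shell
# Sketch: fail fast if the credentials for your model source are missing.
# require_env is a hypothetical helper, not part of LoRAX.
require_env() {
  for v in "$@"; do
    if [ -z "${!v:-}" ]; then
      echo "missing required env var: $v" >&2
      return 1
    fi
  done
  return 0
}

# Example: a gated Hub model only needs the HF token.
if require_env HF_TOKEN 2>/dev/null; then
  echo "credentials ok"
else
  echo "set HF_TOKEN before starting the container"
fi
```

An S3-backed deployment would instead check `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `PREDIBASE_MODEL_BUCKET` together.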
Quick Install
```shell
# Pull the official image
docker pull ghcr.io/predibase/lorax:latest

# Run with GPU access
docker run --gpus all -p 8080:80 \
    -e MODEL_ID=meta-llama/Llama-2-7b-hf \
    -e HF_TOKEN=$HF_TOKEN \
    ghcr.io/predibase/lorax:latest

# Build from source
DOCKER_BUILDKIT=1 docker build -t lorax:custom .
```
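Once the container is up, it can be probed over HTTP. A hedged sketch: `/health` and `/generate` follow the TGI-style API that LoRAX exposes, and port 8080 matches the `-p 8080:80` mapping above; adjust `BASE_URL` for your setup.

```shell
# Sketch: probe a locally running LoRAX container.
BASE_URL="${BASE_URL:-http://127.0.0.1:8080}"
if curl -fsS -m 2 "$BASE_URL/health" >/dev/null 2>&1; then
  # Server is up: send a small generation request.
  curl -fsS "$BASE_URL/generate" \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "Hello", "parameters": {"max_new_tokens": 16}}'
else
  echo "server not reachable at $BASE_URL (is the container running?)"
fi
```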
Code Evidence
Container entrypoint signal handling from `container-entrypoint.sh:3-7`:
```bash
#!/bin/bash
# Trap SIGTERM and SIGKILL for model upload
trap upload SIGTERM SIGKILL
function upload() {
    # Handle graceful shutdown with model state preservation
}
```
Launcher token injection from `launcher/src/main.rs:907`:
```rust
// Set HF token as env var for server shards
envs.push(("HUGGING_FACE_HUB_TOKEN".into(), api_token.into()));
```
CUDA architecture targets in the Dockerfile (each kernel is built with its own `TORCH_CUDA_ARCH_LIST`, so the two values do not conflict):

```dockerfile
# ExLLaMA kernels: Ampere+
ENV TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX"
# vLLM kernels: Volta through Hopper
ENV TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 8.9 9.0+PTX"
```
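To check whether a given host GPU is covered by these prebuilt targets, its compute capability can be compared against the list. A sketch (the `compute_cap` query requires a reasonably recent driver, and `+PTX` also allows newer architectures to run via JIT, which this simple check ignores):

```shell
# Sketch: is the local GPU's compute capability in the prebuilt arch list?
ARCH_LIST="7.0 7.5 8.0 8.6 8.9 9.0"
if command -v nvidia-smi >/dev/null 2>&1; then
  cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)"
  case " $ARCH_LIST " in
    *" $cap "*) echo "GPU (SM $cap) covered by prebuilt kernels" ;;
    *)          echo "GPU (SM $cap) not in arch list; consider rebuilding the image" ;;
  esac
else
  echo "nvidia-smi not found; cannot determine compute capability"
fi
```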
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `nvidia-container-cli: initialization error` | NVIDIA Container Toolkit not installed | Install nvidia-container-toolkit package |
| `CUDA error: no kernel image is available for execution on the device` | GPU architecture not in TORCH_CUDA_ARCH_LIST | Rebuild Docker image with your GPU arch or use compatible GPU |
| `lorax-launcher: command not found` | Binary not in PATH | Ensure container was built correctly; check /usr/local/bin/ |
| S3 sync timeout | Network issue or missing AWS credentials | Check AWS_ACCESS_KEY_ID and network connectivity |
Compatibility Notes
- GPU Architecture: Pre-built image supports SM 7.0 through SM 9.0+ (Volta through Hopper). Older GPUs (Pascal, SM 6.x) are not supported.
- ARM64: Not officially supported. Dockerfile targets x86_64.
- Kubernetes: Helm chart available at `charts/lorax/` for K8s deployment.
- SageMaker: Separate entrypoint at `sagemaker-entrypoint.sh` for AWS SageMaker deployment.
- Image Size: ~15GB due to CUDA toolkit and pre-compiled kernels.