Environment: Predibase LoRAX Docker Container Runtime
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Containers |
| Last Updated | 2026-02-08 02:30 GMT |
Overview
Docker container runtime with NVIDIA GPU support, based on `nvidia/cuda:12.4.0-base-ubuntu22.04`, providing the complete LoRAX inference server with all pre-compiled CUDA kernels.
Description
The official LoRAX Docker image packages the entire inference stack including the Rust router/launcher, Python gRPC server, all pre-compiled CUDA kernels (ExLLaMA, Punica, vLLM, EETQ), and the Python dependency tree. The multi-stage Dockerfile uses `cargo-chef` for Rust build caching and a CUDA devel image for kernel compilation, producing a slim runtime image based on `nvidia/cuda:12.4.0-base-ubuntu22.04`.
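The stage layout described above can be sketched roughly as follows. This is an illustrative outline only, not the real Dockerfile: the stage names, `WORKDIR`, and `COPY` paths are assumptions, and the kernel-compilation steps are elided.

```dockerfile
# Stage 1: Rust build with cargo-chef layer caching
FROM lukemathwalker/cargo-chef:latest-rust-1.83 AS chef
WORKDIR /usr/src
COPY . .
RUN cargo build --release

# Stage 2: CUDA devel image for compiling ExLLaMA / Punica / vLLM / EETQ kernels
FROM nvidia/cuda:12.4.0-devel-ubuntu22.04 AS kernel-builder
# ... kernel compilation steps elided ...

# Stage 3: slim runtime image with only the built artifacts
FROM nvidia/cuda:12.4.0-base-ubuntu22.04 AS runtime
COPY --from=chef /usr/src/target/release/lorax-launcher /usr/local/bin/
COPY --from=chef /usr/src/target/release/lorax-router /usr/local/bin/
ENTRYPOINT ["/usr/local/bin/container-entrypoint.sh"]
```

The point of the split is that only the final `base` stage ships in the image; the heavy Rust toolchain and CUDA devel toolkit stay in the build stages.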
The container entrypoint (`container-entrypoint.sh`) handles:
- Trapping SIGTERM for graceful shutdown and model upload (the script also lists SIGKILL in the trap, but SIGKILL cannot actually be caught)
- Optional S3 model sync before server launch
- Launching `lorax-launcher` which orchestrates the router and server shards
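The entrypoint steps above can be sketched as a minimal script. This is illustrative only: `MODEL_DIR`, `S3_MODEL_URI`, and the body of `upload()` are assumptions, not the real `container-entrypoint.sh`.

```shell
#!/bin/bash
# Minimal sketch of a LoRAX-style container entrypoint (illustrative).
set -euo pipefail

MODEL_DIR="${MODEL_DIR:-/data/model}"

upload() {
  # Preserve model state on shutdown, e.g. sync it back to S3.
  echo "caught SIGTERM, uploading model state from $MODEL_DIR"
}

# Only SIGTERM can actually be caught; SIGKILL is untrappable by design.
trap upload SIGTERM

# Optional S3 sync before launch (assumes aws-cli is on PATH).
if [ -n "${S3_MODEL_URI:-}" ]; then
  aws s3 sync "$S3_MODEL_URI" "$MODEL_DIR"
fi

# Hand off to the launcher, which orchestrates router + server shards.
if command -v lorax-launcher >/dev/null 2>&1; then
  exec lorax-launcher "$@"
else
  echo "lorax-launcher not on PATH (sketch mode)"
fi
```

Using `exec` for the final hand-off matters: it makes the launcher PID 1's direct child, so Docker's SIGTERM on `docker stop` reaches it rather than a forgotten shell wrapper.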
Usage
Use this environment for production deployment and local development of LoRAX. The Docker image is the recommended deployment method as it includes all pre-compiled CUDA kernels that are difficult to build manually. Alternative deployment targets include Kubernetes (via Helm chart) and SageMaker (via `sagemaker-entrypoint.sh`).
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Docker | Docker 20.10+ | With BuildKit support |
| NVIDIA Container Toolkit | nvidia-docker2 or nvidia-container-toolkit | Required for GPU passthrough |
| NVIDIA Driver | 550+ | Compatible with CUDA 12.4 runtime |
| Disk | 20GB+ | Docker image is ~15GB with all CUDA kernels |
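A quick preflight along the lines of the table above can save a 15GB pull. This is a hedged sketch; the exact binary names (`nvidia-ctk` vs. the older `nvidia-container-toolkit`) vary by install method and distro.

```shell
# Preflight sketch: check host prerequisites before pulling the image.
preflight() {
  if command -v docker >/dev/null 2>&1; then
    echo "docker: $(docker --version)"
  else
    echo "docker: MISSING"
  fi
  if command -v nvidia-ctk >/dev/null 2>&1 || command -v nvidia-container-toolkit >/dev/null 2>&1; then
    echo "toolkit: ok"
  else
    echo "toolkit: MISSING (install nvidia-container-toolkit)"
  fi
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "driver: $(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
  else
    echo "driver: MISSING (need 550+ for CUDA 12.4)"
  fi
}
preflight
```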
Dependencies
Base Images
- `lukemathwalker/cargo-chef:latest-rust-1.83` (Rust build stage)
- `nvidia/cuda:12.4.0-devel-ubuntu22.04` (Python/kernel build stage)
- `nvidia/cuda:12.4.0-base-ubuntu22.04` (Runtime stage)
Build Tools (compile stage only)
- Rust 1.83 toolchain
- `protoc` (Protocol Buffer compiler)
- `ninja-build`, `cmake` >= 3.30.0
- Python 3.10 build environment
Runtime Tools
- `aws-cli` (for S3 model sync)
- `lorax-launcher` (Rust binary, process orchestrator)
- `lorax-router` (Rust binary, HTTP/gRPC router)
- `lorax-server` (Python gRPC server)
Credentials
The following environment variables may need to be set at container runtime, depending on where models and adapters are loaded from:
- `HF_TOKEN` or `HUGGING_FACE_HUB_TOKEN`: HuggingFace API token for downloading gated models
- `PREDIBASE_API_TOKEN`: Predibase platform API token (if using Predibase-hosted adapters)
- `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`: For S3 model source (if loading from S3)
- `PREDIBASE_MODEL_BUCKET`: S3 bucket for Predibase model storage
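Since which of these variables is required depends on the model source, a fail-fast check at startup is useful. `require_env` below is a hypothetical helper, not part of LoRAX; the variable names come from the list above.

```shell
# Sketch: fail fast if the credentials for your model source are missing.
# require_env is a hypothetical helper, not part of LoRAX.
require_env() {
  for v in "$@"; do
    if [ -z "${!v:-}" ]; then
      echo "missing required env var: $v" >&2
      return 1
    fi
  done
  return 0
}

# Example: a gated Hub model only needs the HF token.
if require_env HF_TOKEN 2>/dev/null; then
  echo "credentials ok"
else
  echo "set HF_TOKEN before starting the container"
fi
```

An S3-backed deployment would instead check `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `PREDIBASE_MODEL_BUCKET` together.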
Quick Install
```shell
# Pull the official image
docker pull ghcr.io/predibase/lorax:latest

# Run with GPU access
docker run --gpus all -p 8080:80 \
    -e MODEL_ID=meta-llama/Llama-2-7b-hf \
    -e HF_TOKEN=$HF_TOKEN \
    ghcr.io/predibase/lorax:latest

# Build from source
DOCKER_BUILDKIT=1 docker build -t lorax:custom .
```
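Once the container is up, it can be probed over HTTP. A hedged sketch: `/health` and `/generate` follow the TGI-style API that LoRAX exposes, and port 8080 matches the `-p 8080:80` mapping above; adjust `BASE_URL` for your setup.

```shell
# Sketch: probe a locally running LoRAX container.
BASE_URL="${BASE_URL:-http://127.0.0.1:8080}"
if curl -fsS -m 2 "$BASE_URL/health" >/dev/null 2>&1; then
  # Server is up: send a small generation request.
  curl -fsS "$BASE_URL/generate" \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "Hello", "parameters": {"max_new_tokens": 16}}'
else
  echo "server not reachable at $BASE_URL (is the container running?)"
fi
```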
Code Evidence
Container entrypoint signal handling from `container-entrypoint.sh:3-7`:
```bash
#!/bin/bash
# Trap SIGTERM and SIGKILL for model upload
trap upload SIGTERM SIGKILL
function upload() {
    # Handle graceful shutdown with model state preservation
}
```
Launcher token injection from `launcher/src/main.rs:907`:
```rust
// Set HF token as env var for server shards
envs.push(("HUGGING_FACE_HUB_TOKEN".into(), api_token.into()));
```
CUDA architecture targets in the Dockerfile (each kernel is built with its own `TORCH_CUDA_ARCH_LIST`, so the two values do not conflict):

```dockerfile
# ExLLaMA kernels: Ampere+
ENV TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX"
# vLLM kernels: Volta through Hopper
ENV TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 8.9 9.0+PTX"
```
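To check whether a given host GPU is covered by these prebuilt targets, its compute capability can be compared against the list. A sketch (the `compute_cap` query requires a reasonably recent driver, and `+PTX` also allows newer architectures to run via JIT, which this simple check ignores):

```shell
# Sketch: is the local GPU's compute capability in the prebuilt arch list?
ARCH_LIST="7.0 7.5 8.0 8.6 8.9 9.0"
if command -v nvidia-smi >/dev/null 2>&1; then
  cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)"
  case " $ARCH_LIST " in
    *" $cap "*) echo "GPU (SM $cap) covered by prebuilt kernels" ;;
    *)          echo "GPU (SM $cap) not in arch list; consider rebuilding the image" ;;
  esac
else
  echo "nvidia-smi not found; cannot determine compute capability"
fi
```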
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `nvidia-container-cli: initialization error` | NVIDIA Container Toolkit not installed | Install nvidia-container-toolkit package |
| `CUDA error: no kernel image is available for execution on the device` | GPU architecture not in TORCH_CUDA_ARCH_LIST | Rebuild Docker image with your GPU arch or use compatible GPU |
| `lorax-launcher: command not found` | Binary not in PATH | Ensure container was built correctly; check /usr/local/bin/ |
| S3 sync timeout | Network issue or missing AWS credentials | Check AWS_ACCESS_KEY_ID and network connectivity |
Compatibility Notes
- GPU Architecture: Pre-built image supports SM 7.0 through SM 9.0+ (Volta through Hopper). Older GPUs (Pascal, SM 6.x) are not supported.
- ARM64: Not officially supported. Dockerfile targets x86_64.
- Kubernetes: Helm chart available at `charts/lorax/` for K8s deployment.
- SageMaker: Separate entrypoint at `sagemaker-entrypoint.sh` for AWS SageMaker deployment.
- Image Size: ~15GB due to CUDA toolkit and pre-compiled kernels.