

Environment:Predibase Lorax Model Source Credentials

From Leeroopedia


Knowledge Sources
Domains Infrastructure, Authentication
Last Updated 2026-02-08 02:30 GMT

Overview

The authentication credentials and environment variables required to download model weights from HuggingFace Hub, Amazon S3, and Predibase model sources.

Description

LoRAX supports loading model weights and LoRA adapters from multiple sources: HuggingFace Hub, Amazon S3, and local filesystem. Each source requires specific authentication credentials and configuration environment variables. The model source factory in `server/lorax_server/utils/sources/__init__.py` dispatches to the appropriate source handler based on the model ID format (e.g., `s3://bucket/model` for S3, `org/model` for HuggingFace).
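As a rough illustration of that dispatch, the classification by model ID format can be sketched as follows (a simplified sketch, not the actual factory code; the function name `detect_source` is hypothetical, and the real factory returns handler objects rather than strings):

```python
import os

def detect_source(model_id: str) -> str:
    """Classify a model ID by its format, mirroring the dispatch
    described above (simplified illustration only)."""
    if model_id.startswith("s3://"):
        return "s3"      # e.g. s3://bucket/model
    if os.path.isdir(model_id):
        return "local"   # an existing local directory
    return "hub"         # e.g. org/model on HuggingFace Hub

print(detect_source("s3://my-bucket/llama-7b"))   # s3
print(detect_source("meta-llama/Llama-2-7b-hf"))  # hub
```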

The credential system supports per-request tokens (passed via API) and global fallback tokens (set via environment variables), allowing multi-tenant deployments where different users may have access to different gated models.

Usage

This environment is required whenever LoRAX needs to download model weights or adapters from remote sources. It applies to:

  • Initial model loading at server startup
  • Dynamic LoRA adapter loading during inference
  • Adapter prefetching and caching

System Requirements

| Category | Requirement | Notes |
|----------|-------------|-------|
| Network | Internet access | For downloading from HuggingFace Hub and S3 |
| Disk | Sufficient space for the model cache | Models are cached under `HUGGINGFACE_HUB_CACHE` (default: `~/.cache/huggingface/hub`) |

Dependencies

Python Packages

  • `huggingface-hub` >= 0.12 (HuggingFace Hub client)
  • `boto3` (AWS S3 client)

Credentials

WARNING: Never store actual secret values in wiki pages. Only document variable names and their purpose.

HuggingFace Hub

  • `HF_TOKEN`: HuggingFace API token passed at container startup
  • `HUGGING_FACE_HUB_TOKEN`: Global HuggingFace token set as env var in server shards
  • `LORAX_USE_GLOBAL_HF_TOKEN`: Set to `1` to fall back to the global HF token when the per-request token is empty
  • `HF_HUB_OFFLINE`: Set to `true`/`1`/`yes` to disable remote downloads (use cached models only)
  • `WEIGHTS_CACHE_OVERRIDE`: Override default cache directory for model weights

Amazon S3

  • `AWS_ACCESS_KEY_ID`: AWS access key for S3 model source
  • `AWS_SECRET_ACCESS_KEY`: AWS secret key for S3 model source
  • `S3_ENDPOINT_URL`: Custom S3-compatible endpoint URL (for MinIO, R2, etc.)
  • `R2_ACCOUNT_ID`: Cloudflare R2 account ID (auto-constructs endpoint URL)
  • `PREDIBASE_MODEL_BUCKET`: Default S3 bucket for Predibase model storage
  • `PREDIBASE_MODEL_FOLDER`: Folder prefix within the model bucket
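How the endpoint variables above might combine when constructing the S3 client can be sketched as follows (an illustrative resolver, not LoRAX's code; the R2 endpoint format `https://<account_id>.r2.cloudflarestorage.com` is Cloudflare's documented convention):

```python
import os
from typing import Optional

def resolve_s3_endpoint() -> Optional[str]:
    """Pick the S3 endpoint URL: an explicit S3_ENDPOINT_URL override first,
    then a Cloudflare R2 endpoint derived from R2_ACCOUNT_ID, else None
    (boto3 then falls back to the default AWS endpoint)."""
    endpoint = os.getenv("S3_ENDPOINT_URL")
    if endpoint:
        return endpoint
    account_id = os.getenv("R2_ACCOUNT_ID")
    if account_id:
        return f"https://{account_id}.r2.cloudflarestorage.com"
    return None
```

The result would be passed as `boto3.client("s3", endpoint_url=resolve_s3_endpoint())`; a `None` endpoint leaves boto3 on the standard AWS endpoint.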

Predibase Platform

  • `PREDIBASE_API_TOKEN`: API token for Predibase-hosted adapters

Quick Install

# Set HuggingFace token for gated models
export HF_TOKEN="hf_your_token_here"

# Set S3 credentials for S3 model source
export AWS_ACCESS_KEY_ID="your_access_key"
export AWS_SECRET_ACCESS_KEY="your_secret_key"
export PREDIBASE_MODEL_BUCKET="your-bucket"

# Enable global HF token fallback
export LORAX_USE_GLOBAL_HF_TOKEN=1

# Run LoRAX with credentials
docker run --gpus all -p 8080:80 \
  -e HF_TOKEN=$HF_TOKEN \
  -e MODEL_ID=meta-llama/Llama-2-7b-hf \
  ghcr.io/predibase/lorax:latest

Code Evidence

HuggingFace Hub global token fallback from `server/lorax_server/utils/sources/hub.py:276-280`:

def get_hub_api(token: Optional[str] = None) -> HfApi:
    if token == "" and bool(int(os.environ.get("LORAX_USE_GLOBAL_HF_TOKEN", "0"))):
        # User initialized LoRAX to fallback to global HF token if request token is empty
        token = os.environ.get("HUGGING_FACE_HUB_TOKEN")
    return HfApi(token=token)

S3 bucket resolution from `server/lorax_server/utils/sources/s3.py:26-40`:

def _get_bucket_and_model_id(model_id: str) -> Tuple[str, str]:
    if model_id.startswith(S3_PREFIX):
        model_id_no_protocol = model_id[len(S3_PREFIX):]
        bucket_name, model_id = model_id_no_protocol.split("/", 1)
        return bucket_name, model_id

    bucket = os.getenv("PREDIBASE_MODEL_BUCKET")
    folder = os.getenv("PREDIBASE_MODEL_FOLDER", "")
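For illustration, the prefix branch above resolves an `s3://` model ID like this (a minimal re-implementation of just that branch, assuming `S3_PREFIX` is the literal `"s3://"`):

```python
S3_PREFIX = "s3://"  # assumed value of the prefix constant

def split_s3_model_id(model_id: str) -> tuple:
    """Split 's3://bucket/path/to/model' into (bucket, key),
    as in the first branch of _get_bucket_and_model_id above."""
    no_protocol = model_id[len(S3_PREFIX):]
    bucket, key = no_protocol.split("/", 1)
    return bucket, key

print(split_s3_model_id("s3://my-bucket/models/llama-7b"))
# ('my-bucket', 'models/llama-7b')
```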

Offline mode detection from `server/lorax_server/utils/sources/hub.py:18`:

HF_HUB_OFFLINE = os.environ.get("HF_HUB_OFFLINE", "0").lower() in ["true", "1", "yes"]

Common Errors

| Error Message | Cause | Solution |
|---------------|-------|----------|
| `401 Client Error: Unauthorized` | Invalid or missing HuggingFace token | Set a valid `HF_TOKEN` with read access to the model repo |
| `EntryNotFoundError` | Model file not found on HuggingFace Hub | Verify the model ID and revision; check whether the model is gated |
| `botocore.exceptions.ClientError: AccessDenied` | Invalid AWS credentials or bucket permissions | Verify `AWS_ACCESS_KEY_ID` and the bucket access policies |
| `LocalEntryNotFoundError` | Model not in cache and `HF_HUB_OFFLINE=true` | Download the model first or set `HF_HUB_OFFLINE=0` |
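Several of these errors can be caught before startup with a simple preflight check of the variables documented above (an illustrative helper, not part of LoRAX; which variables count as required depends on your deployment):

```python
import os

def check_credentials(source: str) -> list:
    """Return the names of required credential variables that are unset
    for a given model source ('hub' or 's3')."""
    required = {
        "hub": ["HF_TOKEN"],  # needed for gated repos
        "s3": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    }
    return [name for name in required.get(source, []) if not os.getenv(name)]

missing = check_credentials("s3")
if missing:
    print("Missing credentials: " + ", ".join(missing))
```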

Compatibility Notes

  • Multi-tenant: Per-request tokens take priority over global tokens. Set `LORAX_USE_GLOBAL_HF_TOKEN=1` to enable fallback.
  • Cloudflare R2: Set `R2_ACCOUNT_ID` to auto-construct S3-compatible endpoint URL.
  • Offline mode: Set `HF_HUB_OFFLINE=true` for air-gapped environments. All models must be pre-cached.
  • Custom S3 endpoints: Set `S3_ENDPOINT_URL` for MinIO, Ceph, or other S3-compatible stores.
