Environment:Ucbepic Docetl Docker Deployment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Deployment |
| Last Updated | 2026-02-08 01:00 GMT |
Overview
Multi-stage Docker deployment with Python 3.11, Node.js 20, FastAPI backend on port 8000, and Next.js frontend on port 3000.
Description
The DocETL Docker image is a multi-stage build combining:
- Stage 1 (Python Builder): `python:3.11-slim` with `uv` package manager, installs all Python extras
- Stage 2 (Node.js Builder): `node:20-alpine`, builds the Next.js frontend
- Stage 3 (Runtime): `python:3.11-slim` with Node.js 20, combines both builds
The `docker-compose.yml` defines two services:
- docetl: Main service with health checks, persistent volume, and configurable ports
- docetl-aws: Optional AWS Bedrock profile with mounted AWS credentials
Usage
Use this environment for self-hosted Docker deployment of the full DocETL stack (backend API + DocWrangler frontend). It is the prerequisite for the Docker Compose launch implementation.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Any Docker-supported OS | Linux recommended for production |
| Hardware | 2+ CPU cores, 4GB+ RAM | For concurrent LLM API calls |
| Docker | Docker Engine 20+ | With Docker Compose v2 |
| Disk | 5GB+ | For Docker image and `/docetl-data` volume |
| Network | Internet access | For LLM API calls and npm package resolution |
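The Docker and Disk rows above can be checked before launching. The following is a minimal sketch, not part of DocETL: the helper names (`has_free_space`, `docker_on_path`, `preflight`) are hypothetical, the 5 GiB threshold is taken from the Disk row, and the check only confirms that a `docker` executable is on `PATH`, not its version.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def has_free_space(path: str, min_gib: float) -> bool:
    """Return True if the filesystem holding `path` has at least `min_gib` GiB free."""
    return shutil.disk_usage(path).free >= min_gib * GIB


def docker_on_path() -> bool:
    """Return True if a `docker` executable can be found on PATH."""
    return shutil.which("docker") is not None


def preflight(path: str = ".", min_gib: float = 5) -> list[str]:
    """Collect human-readable problems; an empty list means both checks passed."""
    problems = []
    if not docker_on_path():
        problems.append("docker executable not found on PATH")
    if not has_free_space(path, min_gib):
        problems.append(f"less than {min_gib} GiB free at {path}")
    return problems
```

Calling `preflight()` from the repository root returns an empty list when both checks pass. CPU and RAM are not checked here because the standard library has no portable call for total memory.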
Dependencies
Container Base Images
- `python:3.11-slim` (builder and runtime)
- `node:20-alpine` (frontend builder)
System Packages (Runtime)
- `curl` (health checks)
- `libgl1` (for OpenCV/image processing in parsing)
- `libglib2.0-0` (GLib runtime)
- Node.js 20.x (via NodeSource)
Exposed Ports
- 3000 (Frontend, Next.js)
- 8000 (Backend, FastAPI/uvicorn)
Credentials
The container reads the following environment variables. Only `OPENAI_API_KEY` has no default and must be set; the others fall back to the defaults shown:
- `OPENAI_API_KEY`: Required for LLM operations
- `BACKEND_ALLOW_ORIGINS`: CORS origins (default: `http://localhost:3000`)
- `BACKEND_HOST`: Backend bind address (default: `0.0.0.0` in Docker)
- `BACKEND_PORT`: Backend port (default: `8000`)
- `BACKEND_RELOAD`: Hot reload (default: `False`)
- `FRONTEND_HOST`: Frontend bind address (default: `0.0.0.0`)
- `FRONTEND_PORT`: Frontend port (default: `3000`)
- `FRONTEND_DOCKER_COMPOSE_PORT`: Host-mapped frontend port (default: `3031`)
- `BACKEND_DOCKER_COMPOSE_PORT`: Host-mapped backend port (default: `8081`)
- `TEXT_FILE_ENCODINGS`: Text file encoding list (default: `utf-8,latin1,cp1252,iso-8859-1`)
Optional for AWS Bedrock:
- `AWS_PROFILE`: AWS profile name (default: `default`)
- `AWS_REGION`: AWS region (default: `us-west-2`)
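How the variables and their defaults combine can be sketched as follows. This is an illustration only, not DocETL's own configuration code: `DEFAULTS` and `resolve_settings` are hypothetical names, the default values are copied from the lists above, and `OPENAI_API_KEY` is treated as mandatory because the list marks it as required.

```python
from typing import Mapping

# Defaults copied from the variable lists above.
DEFAULTS = {
    "BACKEND_ALLOW_ORIGINS": "http://localhost:3000",
    "BACKEND_HOST": "0.0.0.0",
    "BACKEND_PORT": "8000",
    "BACKEND_RELOAD": "False",
    "FRONTEND_HOST": "0.0.0.0",
    "FRONTEND_PORT": "3000",
    "FRONTEND_DOCKER_COMPOSE_PORT": "3031",
    "BACKEND_DOCKER_COMPOSE_PORT": "8081",
    "TEXT_FILE_ENCODINGS": "utf-8,latin1,cp1252,iso-8859-1",
    "AWS_PROFILE": "default",
    "AWS_REGION": "us-west-2",
}


def resolve_settings(env: Mapping[str, str]) -> dict[str, str]:
    """Overlay `env` on the documented defaults; fail if the API key is missing."""
    if not env.get("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY is not set")
    settings = dict(DEFAULTS)
    # Only documented variables are picked up; unrelated entries in env are ignored.
    settings.update({k: v for k, v in env.items() if k in DEFAULTS})
    settings["OPENAI_API_KEY"] = env["OPENAI_API_KEY"]
    return settings
```

Passing `os.environ` to `resolve_settings` gives the effective values for the current shell, which can help when comparing them against the contents of `.env`.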
Quick Install
# Clone the repository
git clone https://github.com/ucbepic/docetl.git
cd docetl
# Copy and configure environment
cp .env.example .env
# Edit .env to add your OPENAI_API_KEY
# Launch with Docker Compose
docker compose up -d
# With AWS Bedrock support
docker compose --profile aws up -d
Code Evidence
Multi-stage Dockerfile from `Dockerfile:1-8`:
FROM python:3.11-slim AS python-builder
ENV DOCETL_HOME_DIR=/docetl-data
RUN apt-get update && apt-get install -y --no-install-recommends curl
Docker health check from `docker-compose.yml:42-47`:
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
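The same probe can be run from the host while waiting for the stack to come up. The sketch below is a simplified approximation of the health check, not Docker's implementation: it waits out the start period and then retries, whereas Docker also probes during the start period without counting failures. `probe` and `wait_until_healthy` are hypothetical names, and `urlopen` follows redirects, so it only roughly matches `curl -f`.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe(url: str, timeout: float = 10.0) -> bool:
    """Return True if `url` answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # HTTP 4xx/5xx, connection refused, and timeouts all count as unhealthy.
        return False


def wait_until_healthy(
    check: Callable[[], bool],
    retries: int = 3,
    interval: float = 30.0,
    start_period: float = 40.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Wait `start_period` seconds, then call `check` up to `retries` times."""
    sleep(start_period)
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:
            sleep(interval)
    return False
```

From the host, the backend is reached through the host-mapped port, so with the default `BACKEND_DOCKER_COMPOSE_PORT` a call would look like `wait_until_healthy(lambda: probe("http://localhost:8081/health"))`.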
Persistent volume from `docker-compose.yml:50-53`:
volumes:
  docetl-data:
    name: docetl-data
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `health check failed` | Backend did not answer on `/health` after the 40s start period and 3 retries | Increase `start_period`, or inspect the logs with `docker compose logs docetl` |
| `port already in use` | Host ports 3031/8081 occupied | Change `FRONTEND_DOCKER_COMPOSE_PORT` or `BACKEND_DOCKER_COMPOSE_PORT` |
| `OPENAI_API_KEY not set` | Missing API key in environment | Add `OPENAI_API_KEY` to `.env` file |
| `disk space exhausted` | Docker volume full | Increase disk or clean cache in `/docetl-data` |
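For the `port already in use` row, a quick way to see whether the default host ports are taken is to try binding them. This is a minimal sketch with hypothetical helper names (`port_is_free`, `first_free_port`); a successful bind on the host approximates, but does not guarantee, that Docker can publish the port.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start: int, limit: int = 20) -> int:
    """Return the first free port in [start, start + limit), else raise."""
    for port in range(start, start + limit):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port in {start}..{start + limit - 1}")
```

If `port_is_free(3031)` or `port_is_free(8081)` returns `False`, the value from `first_free_port(3031)` or `first_free_port(8081)` can be written to `FRONTEND_DOCKER_COMPOSE_PORT` or `BACKEND_DOCKER_COMPOSE_PORT` in `.env`.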
Compatibility Notes
- Volume Persistence: The `docetl-data` named volume persists LLM response cache and uploaded files across container restarts.
- AWS Profile: The `docetl-aws` service mounts `~/.aws:/root/.aws:ro` for Bedrock credentials. It starts only when the Docker Compose profile `aws` is enabled.
- CORS: By default, only `http://localhost:3000` is allowed. For custom domains, set `BACKEND_ALLOW_ORIGINS`.
- Build Time: The multi-stage build installs all Python extras (`uv sync --all-extras`), which includes parsing, server, and retrieval dependencies.
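For the CORS note, a value for `BACKEND_ALLOW_ORIGINS` can be assembled and sanity-checked before it goes into `.env`. The sketch below assumes the variable accepts a comma-separated list of origins, which this page does not confirm; `normalize_origin` and `build_allow_origins` are hypothetical helpers.

```python
from urllib.parse import urlsplit


def normalize_origin(origin: str) -> str:
    """Validate one origin (scheme://host[:port]) and drop any trailing slash."""
    parts = urlsplit(origin.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) origin: {origin!r}")
    if parts.path not in ("", "/") or parts.query or parts.fragment:
        raise ValueError(f"origin must not contain a path or query: {origin!r}")
    return f"{parts.scheme}://{parts.netloc}"


def build_allow_origins(origins: list[str]) -> str:
    """Join validated origins with commas (the separator is an assumption)."""
    return ",".join(normalize_origin(o) for o in origins)
```

An origin includes the port the browser uses, so a frontend opened through a host-mapped port such as `3031` may need `http://localhost:3031` in the list rather than the default `http://localhost:3000`.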