Environment:Ucbepic Docetl Docker Deployment

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Deployment
Last Updated: 2026-02-08 01:00 GMT

Overview

Multi-stage Docker deployment with Python 3.11, Node.js 20, FastAPI backend on port 8000, and Next.js frontend on port 3000.

Description

The DocETL Docker image is a multi-stage build combining:

  • Stage 1 (Python Builder): `python:3.11-slim` with `uv` package manager, installs all Python extras
  • Stage 2 (Node.js Builder): `node:20-alpine`, builds the Next.js frontend
  • Stage 3 (Runtime): `python:3.11-slim` with Node.js 20, combines both builds

The `docker-compose.yml` defines two services:

  • docetl: Main service with health checks, persistent volume, and configurable ports
  • docetl-aws: Optional AWS Bedrock profile with mounted AWS credentials

Usage

Use this environment for self-hosted Docker deployment of the full DocETL stack (backend API + DocWrangler frontend). It is the prerequisite for the Docker Compose launch implementation.

System Requirements

| Category | Requirement | Notes |
| --- | --- | --- |
| OS | Any Docker-supported OS | Linux recommended for production |
| Hardware | 2+ CPU cores, 4GB+ RAM | For concurrent LLM API calls |
| Docker | Docker Engine 20+ | With Docker Compose v2 |
| Disk | 5GB+ | For Docker image and `/docetl-data` volume |
| Network | Internet access | For LLM API calls and npm package resolution |

Dependencies

Container Base Images

  • `python:3.11-slim` (builder and runtime)
  • `node:20-alpine` (frontend builder)

System Packages (Runtime)

  • `curl` (health checks)
  • `libgl1` (for OpenCV/image processing in parsing)
  • `libglib2.0-0` (GLib runtime)
  • Node.js 20.x (via NodeSource)

Exposed Ports

  • 3000 (Frontend, Next.js)
  • 8000 (Backend, FastAPI/uvicorn)

Credentials

The container reads the following environment variables. Only `OPENAI_API_KEY` has no default and must be supplied; the others fall back to the defaults shown:

  • `OPENAI_API_KEY`: Required for LLM operations
  • `BACKEND_ALLOW_ORIGINS`: CORS origins (default: `http://localhost:3000`)
  • `BACKEND_HOST`: Backend bind address (default: `0.0.0.0` in Docker)
  • `BACKEND_PORT`: Backend port (default: `8000`)
  • `BACKEND_RELOAD`: Hot reload (default: `False`)
  • `FRONTEND_HOST`: Frontend bind address (default: `0.0.0.0`)
  • `FRONTEND_PORT`: Frontend port (default: `3000`)
  • `FRONTEND_DOCKER_COMPOSE_PORT`: Host-mapped frontend port (default: `3031`)
  • `BACKEND_DOCKER_COMPOSE_PORT`: Host-mapped backend port (default: `8081`)
  • `TEXT_FILE_ENCODINGS`: Text file encoding list (default: `utf-8,latin1,cp1252,iso-8859-1`)
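As a rough illustration of how the two host-mapped port variables determine the URLs opened from the host, the sketch below derives them with the defaults listed above. The function names `frontend_url` and `backend_url` are hypothetical helpers, not part of DocETL, and the sketch assumes the stack is reached through `localhost`.

```shell
# Hypothetical helpers (not part of DocETL): print the host-side URLs,
# assuming the services are reached through localhost.
frontend_url() {
  # Falls back to the documented default host port 3031.
  echo "http://localhost:${FRONTEND_DOCKER_COMPOSE_PORT:-3031}"
}

backend_url() {
  # Falls back to the documented default host port 8081.
  echo "http://localhost:${BACKEND_DOCKER_COMPOSE_PORT:-8081}"
}
```

With nothing exported, `frontend_url` prints `http://localhost:3031`; setting `FRONTEND_DOCKER_COMPOSE_PORT=4000` first changes it to `http://localhost:4000`.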

Optional for AWS Bedrock:

  • `AWS_PROFILE`: AWS profile name (default: `default`)
  • `AWS_REGION`: AWS region (default: `us-west-2`)

Quick Install

# Clone the repository
git clone https://github.com/ucbepic/docetl.git
cd docetl

# Copy and configure environment
cp .env.example .env
# Edit .env to add your OPENAI_API_KEY

# Launch with Docker Compose
docker compose up -d

# With AWS Bedrock support
docker compose --profile aws up -d
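Before launching, it can help to confirm that `.env` actually carries a value for `OPENAI_API_KEY`. The helper below is a minimal sketch: `env_file_has` is a hypothetical name, and it assumes `.env` uses plain `KEY=value` lines.

```shell
# Hypothetical helper: succeed if KEY has something after the equals sign
# in an env file. Assumes plain KEY=value lines; commented-out lines and
# missing files count as "not set".
env_file_has() {
  key="$1"
  file="${2:-.env}"
  grep -Eq "^[[:space:]]*${key}=.+" "$file" 2>/dev/null
}

# Example: only start the stack when the key is present.
# env_file_has OPENAI_API_KEY .env && docker compose up -d
```

The check only looks for a value; it does not verify that the key is valid.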

Code Evidence

Multi-stage Dockerfile from `Dockerfile:1-8`:

FROM python:3.11-slim AS python-builder
ENV DOCETL_HOME_DIR=/docetl-data
RUN apt-get update && apt-get install -y --no-install-recommends curl

Docker health check from `docker-compose.yml:42-47`:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
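The health check above runs inside the container. A similar wait can be scripted from the host after `docker compose up -d`. The sketch below is one way to do it under stated assumptions: `retry` is a hypothetical helper, and the usage line assumes the backend's `/health` endpoint is reachable on the default host-mapped port `8081`.

```shell
# Hypothetical helper: retry ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (assumes the default host port 8081):
# retry 10 5 curl -fsS http://localhost:8081/health
```

If the wait times out, `docker compose logs docetl` is the natural next step for seeing why the backend did not come up.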

Persistent volume from `docker-compose.yml:50-53`:

volumes:
  docetl-data:
    name: docetl-data
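To see how much space the named volume is using, one option is to mount it read-only into a throwaway container. This is a sketch, not something the DocETL repository ships: `volume_usage` is a hypothetical helper, and the use of the `alpine` image is an assumption. Setting `DRY_RUN=1` prints the command instead of running it.

```shell
# Hypothetical helper: report the size of a named Docker volume.
# Set DRY_RUN=1 to print the docker command instead of running it.
volume_usage() {
  volume="${1:-docetl-data}"
  # Mount the volume read-only at /data and summarize its size with du.
  set -- docker run --rm -v "${volume}:/data:ro" alpine du -sh /data
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Example:
# volume_usage docetl-data
```

Mounting read-only keeps the check from touching the cached responses or uploaded files.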

Common Errors

| Error Message | Cause | Solution |
| --- | --- | --- |
| `health check failed` | Backend not ready within 40s start period | Increase `start_period` or check logs |
| `port already in use` | Host ports 3031/8081 occupied | Change `FRONTEND_DOCKER_COMPOSE_PORT` or `BACKEND_DOCKER_COMPOSE_PORT` |
| `OPENAI_API_KEY not set` | Missing API key in environment | Add `OPENAI_API_KEY` to `.env` file |
| `disk space exhausted` | Docker volume full | Increase disk or clean cache in `/docetl-data` |

Compatibility Notes

  • Volume Persistence: The `docetl-data` named volume persists LLM response cache and uploaded files across container restarts.
  • AWS Profile: The `docetl-aws` service mounts `~/.aws:/root/.aws:ro` for Bedrock credentials. It runs only when the Docker Compose profile `aws` is enabled.
  • CORS: By default, only `http://localhost:3000` is allowed. For custom domains, set `BACKEND_ALLOW_ORIGINS`.
  • Build Time: The multi-stage build installs all Python extras (`uv sync --all-extras`), including the parsing, server, and retrieval dependencies, so builds take longer than a minimal install would.
