Environment: NeuML txtai Docker Deployment Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Deployment, Containers |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Docker container environment based on `python:3.10-slim` with configurable GPU support, system audio libraries, and NLTK data for production txtai deployments.
Description
This environment defines the official Docker container for deploying txtai as a containerized service. It uses a slim Python base image with essential system packages (`libgomp1` for OpenMP, `libportaudio2` and `libsndfile1` for audio pipelines). The container supports configurable GPU acceleration via a build argument and installs txtai with selectable component groups. NLTK data (punkt tokenizer and POS tagger) is pre-downloaded. An AWS Lambda variant enables serverless deployment.
Usage
Use this environment for production deployments of txtai as a REST API service, workflow scheduler, or serverless function. It is the recommended deployment method for the `Workflow_Schedule_And_Application` implementation and the API module. The Docker image supports both CPU and GPU modes.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Container Runtime | Docker 20.10+ | Docker Desktop or Docker Engine |
| OS (Host) | Linux (recommended) | macOS/Windows via Docker Desktop |
| GPU (Optional) | NVIDIA Container Toolkit | Required for GPU passthrough |
| Disk | 5-10GB | For base image plus models |
| Architecture | amd64 or arm64 | CPU-only PyTorch for arm64 |
Dependencies
System Packages (APT)
- `libgomp1` — OpenMP runtime (required by Faiss)
- `libportaudio2` — PortAudio library (audio pipelines)
- `libsndfile1` — Sound file I/O library (audio pipelines)
- `gcc`, `g++` — Build tools (removed after install)
- `git` — For installing from Git repositories (removed after install)
NLTK Data (Auto-Downloaded)
- `punkt` — Sentence tokenizer
- `punkt_tab` — Updated sentence tokenizer tables
- `averaged_perceptron_tagger_eng` — English POS tagger
Environment Variables
- `LC_ALL=C.UTF-8` — Locale setting for UTF-8 support
- `LANG=C.UTF-8` — Language locale
Credentials
No credentials are baked into the Docker image. At runtime, mount or inject:
- `HF_TOKEN`: For gated Hugging Face model access
- `TOKEN`: API authorization token (SHA-256 hashed) for txtai API security
- `ACCESS_KEY` / `ACCESS_SECRET`: For cloud storage integration (Apache libcloud)
- `SCORING_URL`: PostgreSQL connection URL for PGText scoring backend
- `CLIENT_URL`: Database connection URL for external RDBMS storage
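Since the API token is compared as a SHA-256 hash, the value configured for `TOKEN` is the hex digest of the plaintext secret rather than the secret itself. A minimal sketch of producing that digest (the plaintext value and function name here are illustrative):

```python
import hashlib

def hash_token(plaintext: str) -> str:
    """Return the SHA-256 hex digest stored as the API token."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()

# Example with an illustrative token value; the 64-char hex digest
# is what goes into the TOKEN environment variable
print(hash_token("example-token"))
```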
Quick Install
```sh
# Build CPU image (all components)
docker build -t txtai -f docker/base/Dockerfile .

# Build GPU image
docker build -t txtai-gpu --build-arg GPU=1 -f docker/base/Dockerfile .

# Build with specific components
docker build -t txtai-api --build-arg COMPONENTS="[api,pipeline]" -f docker/base/Dockerfile .

# Run the container
docker run -p 8000:8000 -v /path/to/config:/app txtai

# AWS Lambda deployment
docker build -t txtai-lambda -f docker/aws/Dockerfile .
```
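The directory mounted at `/app` is expected to hold the application configuration. A minimal illustrative `config.yml` (the index path and model name below are placeholder assumptions, not defaults):

```yaml
# Persist the index inside the mounted volume
path: /app/index
writable: true

# Embeddings model used to vectorize content
embeddings:
  path: sentence-transformers/all-MiniLM-L6-v2
  content: true
```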
Code Evidence
Dockerfile build arguments from `docker/base/Dockerfile:1-15`:
```dockerfile
ARG BASE_IMAGE=python:3.10-slim
FROM $BASE_IMAGE

# Install GPU-enabled version of PyTorch if set
ARG GPU

# Target CPU architecture
ARG TARGETARCH

# List of txtai components to install
ARG COMPONENTS=[all]
```
CPU-only PyTorch installation logic from `docker/base/Dockerfile:30`:
```sh
if [ -z ${GPU} ] && { [ -z ${TARGETARCH} ] || [ ${TARGETARCH} = "amd64" ] ;};
    then pip install --no-cache-dir torch==2.10.0+cpu torchvision==0.25.0+cpu \
         -f https://download.pytorch.org/whl/torch;
fi
```
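The shell conditional above can be restated in Python for clarity. This sketch mirrors the decision only (the function name and `env` parameter are illustrative, not part of the build):

```python
import os

def installs_cpu_wheels(env=None) -> bool:
    """Mirror the Dockerfile conditional: the explicit +cpu wheel install
    runs only when GPU is unset and TARGETARCH is empty or "amd64"."""
    env = os.environ if env is None else env
    gpu = env.get("GPU", "")
    arch = env.get("TARGETARCH", "")
    return gpu == "" and arch in ("", "amd64")

# CPU wheels on a default amd64 build...
print(installs_cpu_wheels({"GPU": "", "TARGETARCH": "amd64"}))   # True
# ...but not when GPU=1 is passed, or when targeting arm64
print(installs_cpu_wheels({"GPU": "1", "TARGETARCH": "amd64"}))  # False
print(installs_cpu_wheels({"TARGETARCH": "arm64"}))              # False
```

Note that on arm64 the conditional is false, yet the image is still CPU-only: there are no CUDA wheels for that architecture, so the default PyTorch install already lacks GPU support.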
AWS Lambda handler from `docker/aws/api.py:7-17`:
```python
from mangum import Mangum

from txtai.api import app

handler = Mangum(app, lifespan="off")
```
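Mangum adapts the ASGI `app` to the Lambda invocation model by converting each API Gateway event into a single ASGI request/response cycle. The following is a deliberately simplified illustration of that idea, using a stand-in app; it is not Mangum's actual implementation or the txtai API:

```python
import asyncio
import json

async def asgi_app(scope, receive, send):
    """Minimal ASGI app standing in for txtai.api.app."""
    assert scope["type"] == "http"
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"application/json")]})
    await send({"type": "http.response.body",
                "body": json.dumps({"path": scope["path"]}).encode()})

def lambda_handler(event, context=None):
    """Translate one API Gateway event into one ASGI request/response."""
    scope = {
        "type": "http",
        "method": event.get("httpMethod", "GET"),
        "path": event.get("path", "/"),
        "headers": [],
        "query_string": b"",
    }
    body_parts = []
    status = {}

    async def receive():
        # No request body in this simplified example
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        # Collect the response pieces the app emits
        if message["type"] == "http.response.start":
            status["code"] = message["status"]
        elif message["type"] == "http.response.body":
            body_parts.append(message.get("body", b""))

    asyncio.run(asgi_app(scope, receive, send))
    return {"statusCode": status["code"],
            "body": b"".join(body_parts).decode()}

print(lambda_handler({"httpMethod": "GET", "path": "/search"}))
```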
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `libgomp.so.1: cannot open shared object file` | Missing OpenMP library | Ensure `libgomp1` is installed in image |
| NLTK `punkt` not found | NLTK data not downloaded | Run `nltk.download('punkt')` or rebuild image |
| GPU not detected inside container | Missing NVIDIA Container Toolkit | Install `nvidia-container-toolkit` and use `--gpus all` |
| `PortAudio library not found` | Missing audio library | Ensure `libportaudio2` is in the image |
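Several of these errors come down to missing shared libraries. A small preflight check along these lines (library names follow `ctypes.util.find_library` conventions; the function name is illustrative) can verify them from inside the container before starting the service:

```python
from ctypes.util import find_library

def check_native_libraries() -> dict:
    """Report whether the loader can locate each required shared library."""
    required = {
        "gomp": "libgomp1 (OpenMP, required by Faiss)",
        "portaudio": "libportaudio2 (audio pipelines)",
        "sndfile": "libsndfile1 (audio pipelines)",
    }
    return {name: find_library(name) is not None for name in required}

for name, present in check_native_libraries().items():
    print(f"{name}: {'found' if present else 'MISSING'}")
```

Note that `find_library` can report a library as missing on slim images without build tools, so treat a negative result as a prompt to check with `ldconfig -p` rather than as definitive.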
Compatibility Notes
- ARM64 (Apple Silicon, Graviton): CPU-only PyTorch is installed automatically; GPU passthrough not available
- AWS Lambda: Uses Mangum adapter; API runs in serverless mode with lifespan disabled
- Build cleanup: gcc, g++, and git are purged after installation to reduce image size
- NLTK data: Only downloaded if nltk is installed (conditional check in Dockerfile)