Environment: NeuML txtai Docker Deployment Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Deployment, Containers |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Docker container environment based on `python:3.10-slim` with configurable GPU support, system audio libraries, and NLTK data for production txtai deployments.
Description
This environment defines the official Docker container for deploying txtai as a containerized service. It uses a slim Python base image with essential system packages (`libgomp1` for OpenMP, `libportaudio2` and `libsndfile1` for audio pipelines). The container supports configurable GPU acceleration via a build argument and installs txtai with selectable component groups. NLTK data (punkt tokenizer and POS tagger) is pre-downloaded. An AWS Lambda variant enables serverless deployment.
Usage
Use this environment for production deployments of txtai as a REST API service, workflow scheduler, or serverless function. It is the recommended deployment method for the `Workflow_Schedule_And_Application` implementation and the API module. The Docker image supports both CPU and GPU modes.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Container Runtime | Docker 20.10+ | Docker Desktop or Docker Engine |
| OS (Host) | Linux (recommended) | macOS/Windows via Docker Desktop |
| GPU (Optional) | NVIDIA Container Toolkit | Required for GPU passthrough |
| Disk | 5-10GB | For base image plus models |
| Architecture | amd64 or arm64 | CPU-only PyTorch for arm64 |
Dependencies
System Packages (APT)
- `libgomp1` — OpenMP runtime (required by Faiss)
- `libportaudio2` — PortAudio library (audio pipelines)
- `libsndfile1` — Sound file I/O library (audio pipelines)
- `gcc`, `g++` — Build tools (removed after install)
- `git` — For installing from Git repositories (removed after install)
NLTK Data (Auto-Downloaded)
- `punkt` — Sentence tokenizer
- `punkt_tab` — Updated sentence tokenizer tables
- `averaged_perceptron_tagger_eng` — English POS tagger
Environment Variables
- `LC_ALL=C.UTF-8` — Locale setting for UTF-8 support
- `LANG=C.UTF-8` — Language locale
Credentials
No credentials are baked into the Docker image. At runtime, mount or inject:
- `HF_TOKEN`: For gated Hugging Face model access
- `TOKEN`: API authorization token (SHA-256 hashed) for txtai API security
- `ACCESS_KEY` / `ACCESS_SECRET`: For cloud storage integration (Apache libcloud)
- `SCORING_URL`: PostgreSQL connection URL for PGText scoring backend
- `CLIENT_URL`: Database connection URL for external RDBMS storage
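Since the API token is compared as a SHA-256 hash, the value configured for `TOKEN` is the hex digest of the plaintext secret rather than the secret itself. A minimal sketch of producing that digest (the plaintext value and function name here are illustrative):

```python
import hashlib

def hash_token(plaintext: str) -> str:
    """Return the SHA-256 hex digest stored as the API token."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()

# Example with an illustrative token value; the 64-char hex digest
# is what goes into the TOKEN environment variable
print(hash_token("example-token"))
```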
Quick Install
```sh
# Build CPU image (all components)
docker build -t txtai -f docker/base/Dockerfile .

# Build GPU image
docker build -t txtai-gpu --build-arg GPU=1 -f docker/base/Dockerfile .

# Build with specific components
docker build -t txtai-api --build-arg COMPONENTS="[api,pipeline]" -f docker/base/Dockerfile .

# Run the container
docker run -p 8000:8000 -v /path/to/config:/app txtai

# AWS Lambda deployment
docker build -t txtai-lambda -f docker/aws/Dockerfile .
```
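The directory mounted at `/app` is expected to hold the application configuration. A minimal illustrative `config.yml` (the index path and model name below are placeholder assumptions, not defaults):

```yaml
# Persist the index inside the mounted volume
path: /app/index
writable: true

# Embeddings model used to vectorize content
embeddings:
  path: sentence-transformers/all-MiniLM-L6-v2
  content: true
```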
Code Evidence
Dockerfile build arguments from `docker/base/Dockerfile:1-15`:
```dockerfile
ARG BASE_IMAGE=python:3.10-slim
FROM $BASE_IMAGE

# Install GPU-enabled version of PyTorch if set
ARG GPU

# Target CPU architecture
ARG TARGETARCH

# List of txtai components to install
ARG COMPONENTS=[all]
```
CPU-only PyTorch installation logic from `docker/base/Dockerfile:30`:
```sh
if [ -z ${GPU} ] && { [ -z ${TARGETARCH} ] || [ ${TARGETARCH} = "amd64" ] ;};
    then pip install --no-cache-dir torch==2.10.0+cpu torchvision==0.25.0+cpu \
         -f https://download.pytorch.org/whl/torch;
fi
```
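The shell conditional above can be restated in Python for clarity. This sketch mirrors the decision only (the function name and `env` parameter are illustrative, not part of the build):

```python
import os

def installs_cpu_wheels(env=None) -> bool:
    """Mirror the Dockerfile conditional: the explicit +cpu wheel install
    runs only when GPU is unset and TARGETARCH is empty or "amd64"."""
    env = os.environ if env is None else env
    gpu = env.get("GPU", "")
    arch = env.get("TARGETARCH", "")
    return gpu == "" and arch in ("", "amd64")

# CPU wheels on a default amd64 build...
print(installs_cpu_wheels({"GPU": "", "TARGETARCH": "amd64"}))   # True
# ...but not when GPU=1 is passed, or when targeting arm64
print(installs_cpu_wheels({"GPU": "1", "TARGETARCH": "amd64"}))  # False
print(installs_cpu_wheels({"TARGETARCH": "arm64"}))              # False
```

Note that on arm64 the conditional is false, yet the image is still CPU-only: there are no CUDA wheels for that architecture, so the default PyTorch install already lacks GPU support.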
AWS Lambda handler from `docker/aws/api.py:7-17`:
```python
from mangum import Mangum

from txtai.api import app

handler = Mangum(app, lifespan="off")
```
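Mangum adapts the ASGI `app` to the Lambda invocation model by converting each API Gateway event into a single ASGI request/response cycle. The following is a deliberately simplified illustration of that idea, using a stand-in app; it is not Mangum's actual implementation or the txtai API:

```python
import asyncio
import json

async def asgi_app(scope, receive, send):
    """Minimal ASGI app standing in for txtai.api.app."""
    assert scope["type"] == "http"
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"application/json")]})
    await send({"type": "http.response.body",
                "body": json.dumps({"path": scope["path"]}).encode()})

def lambda_handler(event, context=None):
    """Translate one API Gateway event into one ASGI request/response."""
    scope = {
        "type": "http",
        "method": event.get("httpMethod", "GET"),
        "path": event.get("path", "/"),
        "headers": [],
        "query_string": b"",
    }
    body_parts = []
    status = {}

    async def receive():
        # No request body in this simplified example
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        # Collect the response pieces the app emits
        if message["type"] == "http.response.start":
            status["code"] = message["status"]
        elif message["type"] == "http.response.body":
            body_parts.append(message.get("body", b""))

    asyncio.run(asgi_app(scope, receive, send))
    return {"statusCode": status["code"],
            "body": b"".join(body_parts).decode()}

print(lambda_handler({"httpMethod": "GET", "path": "/search"}))
```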
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `libgomp.so.1: cannot open shared object file` | Missing OpenMP library | Ensure `libgomp1` is installed in image |
| NLTK `punkt` not found | NLTK data not downloaded | Run `nltk.download('punkt')` or rebuild image |
| GPU not detected inside container | Missing NVIDIA Container Toolkit | Install `nvidia-container-toolkit` and use `--gpus all` |
| `PortAudio library not found` | Missing audio library | Ensure `libportaudio2` is in the image |
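Several of these errors come down to missing shared libraries. A small preflight check along these lines (library names follow `ctypes.util.find_library` conventions; the function name is illustrative) can verify them from inside the container before starting the service:

```python
from ctypes.util import find_library

def check_native_libraries() -> dict:
    """Report whether the loader can locate each required shared library."""
    required = {
        "gomp": "libgomp1 (OpenMP, required by Faiss)",
        "portaudio": "libportaudio2 (audio pipelines)",
        "sndfile": "libsndfile1 (audio pipelines)",
    }
    return {name: find_library(name) is not None for name in required}

for name, present in check_native_libraries().items():
    print(f"{name}: {'found' if present else 'MISSING'}")
```

Note that `find_library` can report a library as missing on slim images without build tools, so treat a negative result as a prompt to check with `ldconfig -p` rather than as definitive.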
Compatibility Notes
- ARM64 (Apple Silicon, Graviton): CPU-only PyTorch is installed automatically; GPU passthrough not available
- AWS Lambda: Uses Mangum adapter; API runs in serverless mode with lifespan disabled
- Build cleanup: gcc, g++, and git are purged after installation to reduce image size
- NLTK data: Only downloaded if nltk is installed (conditional check in Dockerfile)