
Environment:Neuml Txtai Docker Deployment Environment

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Deployment, Containers
Last Updated: 2026-02-09 17:00 GMT

Overview

Docker container environment based on `python:3.10-slim` with configurable GPU support, system audio libraries, and NLTK data for production txtai deployments.

Description

This environment defines the official Docker container for deploying txtai as a containerized service. It uses a slim Python base image with essential system packages (libgomp1 for OpenMP, libportaudio2 and libsndfile1 for audio pipelines). The container supports configurable GPU acceleration via a build argument and installs txtai with selectable component groups. NLTK data (punkt tokenizer and POS tagger) is pre-downloaded. An AWS Lambda variant enables serverless deployment.

Usage

Use this environment for production deployments of txtai as a REST API service, workflow scheduler, or serverless function. It is the recommended deployment method for the `Workflow_Schedule_And_Application` implementation and the API module. The Docker image supports both CPU and GPU modes.

System Requirements

  • Container Runtime: Docker 20.10+ (Docker Desktop or Docker Engine)
  • Host OS: Linux recommended; macOS/Windows supported via Docker Desktop
  • GPU (optional): NVIDIA Container Toolkit, required for GPU passthrough
  • Disk: 5-10 GB for the base image plus models
  • Architecture: amd64 or arm64 (CPU-only PyTorch on arm64)

Dependencies

System Packages (APT)

  • `libgomp1` — OpenMP runtime (required by Faiss)
  • `libportaudio2` — PortAudio library (audio pipelines)
  • `libsndfile1` — Sound file I/O library (audio pipelines)
  • `gcc`, `g++` — Build tools (removed after install)
  • `git` — For installing from Git repositories (removed after install)

NLTK Data (Auto-Downloaded)

  • `punkt` — Sentence tokenizer
  • `punkt_tab` — Updated sentence tokenizer tables
  • `averaged_perceptron_tagger_eng` — English POS tagger
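The Compatibility Notes below mention that this data is only fetched when nltk is installed. A minimal Python sketch of that conditional bootstrap (the function name is illustrative, not part of txtai):

```python
import importlib.util

def download_nltk_data(packages=("punkt", "punkt_tab",
                                 "averaged_perceptron_tagger_eng")):
    """Fetch NLTK data only if nltk is installed, mirroring the
    conditional check in the Dockerfile. Returns True if downloads ran."""
    # Skip entirely when nltk is absent, as the image build does
    if importlib.util.find_spec("nltk") is None:
        return False
    import nltk
    for pkg in packages:
        nltk.download(pkg, quiet=True)
    return True
```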

Environment Variables

  • `LC_ALL=C.UTF-8` — Locale setting for UTF-8 support
  • `LANG=C.UTF-8` — Language locale

Credentials

No credentials are baked into the Docker image. At runtime, mount or inject:

  • `HF_TOKEN`: For gated Hugging Face model access
  • `TOKEN`: API authorization token (SHA-256 hashed) for txtai API security
  • `ACCESS_KEY` / `ACCESS_SECRET`: For cloud storage integration (Apache libcloud)
  • `SCORING_URL`: PostgreSQL connection URL for PGText scoring backend
  • `CLIENT_URL`: Database connection URL for external RDBMS storage
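Since `TOKEN` holds a SHA-256 digest rather than the plaintext secret, generating the value can be sketched as follows (the helper name is illustrative):

```python
import hashlib

def hash_token(token: str) -> str:
    """Return the SHA-256 hex digest of an API token, suitable for
    the TOKEN environment variable described above."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()
```

The resulting hex string can then be injected at container start, e.g. via `docker run -e TOKEN=<digest>`.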

Quick Install

# Build CPU image (all components)
docker build -t txtai -f docker/base/Dockerfile .

# Build GPU image
docker build -t txtai-gpu --build-arg GPU=1 -f docker/base/Dockerfile .

# Build with specific components
docker build -t txtai-api --build-arg COMPONENTS="[api,pipeline]" -f docker/base/Dockerfile .

# Run the container
docker run -p 8000:8000 -v /path/to/config:/app txtai

# AWS Lambda deployment
docker build -t txtai-lambda -f docker/aws/Dockerfile .

Code Evidence

Dockerfile build arguments from `docker/base/Dockerfile:1-15`:

ARG BASE_IMAGE=python:3.10-slim
FROM $BASE_IMAGE

# Install GPU-enabled version of PyTorch if set
ARG GPU

# Target CPU architecture
ARG TARGETARCH

# List of txtai components to install
ARG COMPONENTS=[all]

CPU-only PyTorch installation logic from `docker/base/Dockerfile:30`:

if [ -z ${GPU} ] && { [ -z ${TARGETARCH} ] || [ ${TARGETARCH} = "amd64" ] ;};
then pip install --no-cache-dir torch==2.10.0+cpu torchvision==0.25.0+cpu \
  -f https://download.pytorch.org/whl/torch;
fi
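The shell test above reduces to a simple predicate. A Python restatement, for illustration only, of when the CPU-only wheel is installed:

```python
from typing import Optional

def installs_cpu_torch(gpu: Optional[str], targetarch: Optional[str]) -> bool:
    """True when the Dockerfile installs the +cpu PyTorch wheel:
    the GPU build arg is unset/empty AND the target architecture
    is unset or amd64."""
    return not gpu and (not targetarch or targetarch == "amd64")
```

Note that arm64 builds fail this test and fall through to the default PyTorch wheel, which is already CPU-only on that architecture.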

AWS Lambda handler from `docker/aws/api.py:7-17`:

from mangum import Mangum
from txtai.api import app
handler = Mangum(app, lifespan="off")

Common Errors

  • `libgomp.so.1: cannot open shared object file` — missing OpenMP runtime; ensure `libgomp1` is installed in the image
  • NLTK `punkt` not found — NLTK data not downloaded; run `nltk.download('punkt')` or rebuild the image
  • GPU not detected inside the container — NVIDIA Container Toolkit missing on the host; install `nvidia-container-toolkit` and run with `--gpus all`
  • `PortAudio library not found` — missing audio library; ensure `libportaudio2` is in the image
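A quick in-container diagnostic for the missing-library errors above can be sketched with the standard library alone (the helper name is illustrative):

```python
import ctypes.util

def missing_native_libs(libs=("gomp", "portaudio", "sndfile")):
    """Return the subset of native libraries the dynamic loader cannot
    locate; an empty list means libgomp1, libportaudio2, and
    libsndfile1 are all resolvable."""
    return [name for name in libs if ctypes.util.find_library(name) is None]
```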

Compatibility Notes

  • ARM64 (Apple Silicon, Graviton): CPU-only PyTorch is installed automatically; GPU passthrough not available
  • AWS Lambda: Uses Mangum adapter; API runs in serverless mode with lifespan disabled
  • Build cleanup: gcc, g++, and git are purged after installation to reduce image size
  • NLTK data: Only downloaded if nltk is installed (conditional check in Dockerfile)
