Implementation:Neuml Txtai Production Deployment Tools
Overview
This page documents the tools and components used to deploy txtai APIs to production environments. It covers the Cluster class for distributed search sharding, the Mangum adapter for AWS Lambda deployment, and the Docker configurations for containerized deployments. These tools transform a YAML-configured txtai application into a production-ready service.
Cluster
API Signature
from txtai.api.cluster import Cluster
cluster = Cluster(config)
| Parameter | Type | Default | Description |
|---|---|---|---|
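| config | dict | None | Cluster configuration containing a shards key with a list of shard URLs. Although the default is None, the constructor evaluates "shards" in config, so a dict must be passed. |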
Source Reference
File: src/python/txtai/api/cluster.py (Lines 16-39)
Constructor Implementation
class Cluster:
    """
    Aggregates multiple embeddings shards into a single logical embeddings instance.
    """
    def __init__(self, config=None):
        # Configuration
        self.config = config
        # Embeddings shard urls
        self.shards = None
        if "shards" in self.config:
            self.shards = self.config["shards"]
        # Query aggregator
        self.aggregate = Aggregate()
Instance attributes:
| Attribute | Type | Description |
|---|---|---|
| self.config | dict | Raw cluster configuration |
| self.shards | list[str] or None | List of shard base URLs (e.g., ["http://shard-0:8000", "http://shard-1:8000"]) |
| self.aggregate | Aggregate | SQL aggregate function handler for combining results across shards |
Key Methods
search(query, limit, ...)
Fans out a search query to all shards via async HTTP GET, then aggregates and sorts results.
def search(self, query, limit=None, weights=None, index=None, parameters=None, graph=False):
    action = f"search?query={urllib.parse.quote_plus(query)}"
    # ... append optional parameters to URL ...
    results = []
    for result in self.execute("get", action):
        results.extend(result)
    results = self.aggregate(query, results)
    return results[:(limit if limit else 10)]
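The merge step for a plain similarity query can be sketched as follows. This is a simplified stand-in, not txtai's Aggregate class: merge_results is a hypothetical helper, and it assumes each shard returns a list of dicts carrying id and score keys.

```python
from typing import Dict, List, Optional


def merge_results(shard_results: List[List[Dict]], limit: Optional[int] = None) -> List[Dict]:
    """Flatten per-shard result lists, sort by score, keep the top entries."""
    merged = []
    for results in shard_results:
        merged.extend(results)

    # Highest score first, mirroring a similarity ranking
    merged.sort(key=lambda row: row["score"], reverse=True)

    # Same default as the search method above: 10 results when no limit is given
    return merged[: (limit if limit else 10)]


# Hypothetical responses from two shards
shard_0 = [{"id": "a", "score": 0.91}, {"id": "b", "score": 0.40}]
shard_1 = [{"id": "c", "score": 0.75}]
top = merge_results([shard_0, shard_1], limit=2)
```

Each shard only ranks its own documents, so the coordinator has to re-sort the combined list before truncating it to the requested limit.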
add(documents)
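Distributes documents across shards by hashing each document ID and taking the result modulo the shard count. This is plain hash partitioning, not consistent hashing.
def add(self, documents):
    self.execute("post", "add", self.shard(documents))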
shard(documents)
Splits documents into shard-specific buckets using Adler-32 hashing on document IDs.
def shard(self, documents):
    shards = [[] for _ in range(len(self.shards))]
    for document in documents:
        uid = document.get("id") if isinstance(document, dict) else document
        if uid and isinstance(uid, str):
            uid = zlib.adler32(uid.encode("utf-8"))
        elif uid is None:
            uid = random.randint(0, len(shards) - 1)
        shards[uid % len(self.shards)].append(document)
    return shards
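The bucket assignment can be tried in isolation with a small sketch. shard_index is a hypothetical helper that follows the logic shown above; the random fallback for missing IDs is left out so that the result stays deterministic.

```python
import zlib
from typing import Union


def shard_index(uid: Union[str, int], shard_count: int) -> int:
    """Map a document id to a shard number in [0, shard_count)."""
    if isinstance(uid, str):
        # Adler-32 turns the string id into a non-negative integer
        uid = zlib.adler32(uid.encode("utf-8"))
    return uid % shard_count


# Illustrative ids only; string ids are hashed, integer ids are used directly
ids = ["doc-1", "doc-2", "doc-3", 7]
assignment = {uid: shard_index(uid, 3) for uid in ids}
```

Because the bucket is the hash modulo the shard count, changing the number of shards generally remaps most IDs, so resizing a cluster typically means re-indexing its documents.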
execute(method, action, data)
Runs async HTTP requests against all shards using aiohttp and asyncio.
def execute(self, method, action, data=None):
    urls = [f"{shard}/{action}" for shard in self.shards]
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(self.run(urls, method, data))
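The fan-out pattern behind execute can be sketched without a network. In this example fetch is a stub coroutine standing in for the aiohttp request, and fan_out and the count action are illustrative names rather than txtai's actual helpers.

```python
import asyncio
from typing import List


async def fetch(url: str) -> dict:
    """Stub for an HTTP request; a real client would issue the call here."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"url": url, "status": "ok"}


async def fan_out(shards: List[str], action: str) -> List[dict]:
    """Send the same action to every shard concurrently."""
    urls = [f"{shard}/{action}" for shard in shards]
    # gather returns results in the same order as the input awaitables
    return await asyncio.gather(*(fetch(url) for url in urls))


responses = asyncio.run(fan_out(["http://shard-0:8000", "http://shard-1:8000"], "count"))
```

The sketch uses asyncio.run, which creates a fresh event loop, whereas the method above calls run_until_complete on an existing loop. Either way, the blocking call cannot be made from code that is already running inside that loop.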
Cluster Configuration Example
# coordinator.yml
cluster:
  shards:
    - http://shard-0:8000
    - http://shard-1:8000
    - http://shard-2:8000
# Start the coordinator
CONFIG=coordinator.yml uvicorn "txtai.api:app" --host 0.0.0.0 --port 8000
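A client addresses the coordinator like a single txtai API instance. The sketch below only builds the request URL: search_url is a hypothetical helper, the host and port assume the command above was run locally, and the query and limit parameter names follow the search method shown earlier.

```python
from urllib.parse import urlencode


def search_url(base: str, query: str, limit: int = 10) -> str:
    """Build a GET URL for the coordinator's search endpoint."""
    # urlencode applies quote_plus to each value by default
    return f"{base}/search?{urlencode({'query': query, 'limit': limit})}"


url = search_url("http://localhost:8000", "feel good story", limit=3)
```

Once the coordinator and its shards are running, the resulting URL can be fetched with urllib.request.urlopen or any other HTTP client.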
AWS Lambda Handler (Mangum)
Source Reference
File: docker/aws/api.py (Lines 1-17)
Complete Implementation
"""
Lambda handler for a txtai API instance
"""
from mangum import Mangum
from txtai.api import app, start
# Create FastAPI application instance wrapped by Mangum
handler = None
if not handler:
    # Start application
    start()
    # Create handler
    handler = Mangum(app, lifespan="off")
Key details:
| Component | Purpose |
|---|---|
| start() | Manually triggers the FastAPI lifespan startup (model loading, route registration) |
| Mangum(app, lifespan="off") | Wraps the FastAPI app for Lambda; lifespan="off" because startup was handled manually |
| handler | The Lambda entry point; AWS invokes handler(event, context) |
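The lifespan="off" parameter is critical: since start() already performed the application startup, Mangum must not attempt to run the lifespan events again. Initialization happens once per Lambda container because this module-level code runs only when the runtime first imports the module; warm invocations reuse the existing handler. The if not handler guard itself always passes at import time, since handler is set to None on the preceding line.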
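The once-per-container behavior can be illustrated without Lambda or Mangum. In this sketch load_model and handler are hypothetical stand-ins, and the counter exists only to make the initialization observable.

```python
INIT_COUNT = 0


def load_model() -> dict:
    """Stand-in for expensive startup work such as loading models."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"ready": True}


# Module-level code runs once, when the runtime first imports the module
MODEL = load_model()


def handler(event: dict, context: object) -> dict:
    """Stand-in entry point; reuses MODEL instead of reloading it."""
    return {"statusCode": 200, "ready": MODEL["ready"], "path": event.get("path")}


# Three invocations in the same process, as on a warm container
responses = [handler({"path": "/search"}, None) for _ in range(3)]
```

After the three calls INIT_COUNT is still 1, which is the property the real handler relies on: the cost of start() is paid on a cold start, not on every request.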
Lambda Dockerfile
File: docker/aws/Dockerfile
ARG BASE_IMAGE=neuml/txtai-cpu
FROM $BASE_IMAGE
ARG APP=api.py
# Install Lambda Runtime Interface Client and Mangum ASGI bindings
RUN pip install awslambdaric mangum
# Copy configuration
COPY config.yml .
# Run local API instance to cache models in container
RUN python -c "from txtai.api import API; API('config.yml', False)"
# Copy application
COPY $APP ./app.py
# Start runtime client using default application handler
ENV CONFIG "config.yml"
ENTRYPOINT ["python", "-m", "awslambdaric"]
CMD ["app.handler"]
Build steps:
- Extends the txtai base image
- Installs awslambdaric (AWS Lambda Runtime Interface Client) and mangum
- Copies the YAML configuration
- Pre-caches models by instantiating API with loaddata=False
- Copies the Lambda handler script
- Sets the entrypoint to the AWS Lambda runtime client
Building and Deploying to Lambda
# Build the Lambda container
docker build -t txtai-lambda -f docker/aws/Dockerfile .
# Tag and push to ECR
docker tag txtai-lambda:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/txtai-lambda:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/txtai-lambda:latest
# Create Lambda function from container image
aws lambda create-function \
  --function-name txtai-api \
  --package-type Image \
  --code ImageUri=123456789012.dkr.ecr.us-east-1.amazonaws.com/txtai-lambda:latest \
  --role arn:aws:iam::123456789012:role/lambda-role \
  --memory-size 4096 \
  --timeout 300
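Once the function exists, it can be exercised with an HTTP-style test event. The sketch below builds one using field names from the API Gateway HTTP API payload format 2.0. http_api_event is a hypothetical helper, and whether this minimal set of fields is enough for Mangum to route the request is an assumption that should be checked against a real invocation.

```python
import json
from urllib.parse import urlencode


def http_api_event(path: str, params: dict) -> dict:
    """Build a minimal API Gateway HTTP API (payload format 2.0) style event."""
    return {
        "version": "2.0",
        "routeKey": "$default",
        "rawPath": path,
        "rawQueryString": urlencode(params),
        "headers": {"accept": "application/json"},
        "requestContext": {
            # Assumed minimal request context; real events carry more fields
            "http": {"method": "GET", "path": path, "protocol": "HTTP/1.1", "sourceIp": "127.0.0.1"}
        },
        "isBase64Encoded": False,
    }


event = http_api_event("/search", {"query": "feel good story", "limit": 1})
payload = json.dumps(event)
```

The serialized payload can be pasted into the Lambda console as a test event.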
Docker API Deployment
Source Reference
File: docker/api/Dockerfile
Complete Dockerfile
ARG BASE_IMAGE=neuml/txtai-cpu
FROM $BASE_IMAGE
# Copy configuration
COPY config.yml .
# Run local API instance to cache models in container
RUN python -c "from txtai.api import API; API('config.yml', False)"
# Start server and listen on all interfaces
ENV CONFIG "config.yml"
ENTRYPOINT ["uvicorn", "--host", "0.0.0.0", "txtai.api:app"]
Build steps:
- Extends the txtai base image (CPU or GPU variant)
- Copies the YAML configuration
- Pre-caches all models during image build
- Sets uvicorn as the entrypoint, binding to all network interfaces
Building and Running
# Build the Docker image
docker build -t txtai-api -f docker/api/Dockerfile .
# Run the container
docker run -p 8000:8000 txtai-api
# Run with GPU support
docker run --gpus all -p 8000:8000 txtai-api
# Override configuration at runtime
docker run -p 8000:8000 -e CONFIG=/data/custom.yml -v /host/data:/data txtai-api
Docker Base Image
Source Reference
File: docker/base/Dockerfile
The base image installs all system dependencies and the txtai library:
ARG BASE_IMAGE=python:3.10-slim
FROM $BASE_IMAGE
ARG GPU
ARG TARGETARCH
ARG PYTHON_VERSION=3
ARG COMPONENTS=[all]
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
RUN \
    apt-get update && \
    apt-get -y --no-install-recommends install libgomp1 libportaudio2 libsndfile1 git gcc g++ \
        python${PYTHON_VERSION} python${PYTHON_VERSION}-dev python3-pip && \
    rm -rf /var/lib/apt/lists && \
    ln -s /usr/bin/python${PYTHON_VERSION} /usr/bin/python && \
    python -m pip install --no-cache-dir -U pip wheel setuptools && \
    if [ -z ${GPU} ] && { [ -z ${TARGETARCH} ] || [ ${TARGETARCH} = "amd64" ] ;}; then \
        pip install --no-cache-dir torch==2.10.0+cpu torchvision==0.25.0+cpu \
        -f https://download.pytorch.org/whl/torch -f https://download.pytorch.org/whl/torchvision; \
    fi && \
    python -m pip install --no-cache-dir txtai${COMPONENTS} && \
    apt-get -y purge git gcc g++ python${PYTHON_VERSION}-dev && apt-get -y autoremove
WORKDIR /app
Build arguments:
| Argument | Default | Description |
|---|---|---|
| BASE_IMAGE | python:3.10-slim | Base Python image |
| GPU | empty | Set to enable the GPU PyTorch build (skips the CPU-only wheel install) |
| TARGETARCH | auto-detected | Target CPU architecture; the CPU-only wheels are installed only when it is unset or amd64 |
| PYTHON_VERSION | 3 | Python version to install |
| COMPONENTS | [all] | txtai components to install (e.g., [api,pipeline]) |
Deployment Comparison
| Aspect | Docker (uvicorn) | AWS Lambda (Mangum) | Cluster |
|---|---|---|---|
| Entry point | uvicorn txtai.api:app | awslambdaric app.handler | uvicorn on coordinator + shards |
| Model caching | Docker build layer | Docker build layer | Per-shard Docker build |
| Scaling | Container orchestrator | AWS auto-scaling | Manual shard provisioning |
| State persistence | Local filesystem | External storage (S3) | Per-shard filesystem |
| Startup time | Seconds (models cached) | Seconds to minutes (cold start) | Seconds per shard |
See Also
- Neuml_Txtai_Production_Deployment - Principle behind production deployment strategies
- Neuml_Txtai_API_Create - The create() and start() functions used by all deployment modes
- Neuml_Txtai_API_Server_Bootstrap - The ASGI bootstrap process shared across deployments