Implementation:Neuml Txtai Production Deployment Tools

Overview

This page documents the tools and components used to deploy txtai APIs to production environments. It covers the Cluster class for distributed search sharding, the Mangum adapter for AWS Lambda deployment, and the Docker configurations for containerized deployments. These tools transform a YAML-configured txtai application into a production-ready service.

Cluster

API Signature

from txtai.api.cluster import Cluster

cluster = Cluster(config)

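| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `config` | `dict` | `None` | Cluster configuration; its `shards` key holds the list of shard URLs. Although the default is `None`, the constructor evaluates `"shards" in self.config`, so a dict must be passed in practice |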

Source Reference

File: src/python/txtai/api/cluster.py (Lines 16-39)

Constructor Implementation

class Cluster:
    """
    Aggregates multiple embeddings shards into a single logical embeddings instance.
    """

    def __init__(self, config=None):
        # Configuration
        self.config = config

        # Embeddings shard urls
        self.shards = None
        if "shards" in self.config:
            self.shards = self.config["shards"]

        # Query aggregator
        self.aggregate = Aggregate()

Instance attributes:

| Attribute | Type | Description |
| --- | --- | --- |
| `self.config` | `dict` | Raw cluster configuration |
| `self.shards` | `list[str]` or `None` | List of shard base URLs (e.g., `["http://shard-0:8000", "http://shard-1:8000"]`) |
| `self.aggregate` | `Aggregate` | SQL aggregate function handler for combining results across shards |

Key Methods

search(query, limit, ...)

Fans out a search query to all shards via async HTTP GET, then aggregates and sorts results.

def search(self, query, limit=None, weights=None, index=None, parameters=None, graph=False):
    action = f"search?query={urllib.parse.quote_plus(query)}"
    # ... append optional parameters to URL ...

    results = []
    for result in self.execute("get", action):
        results.extend(result)

    results = self.aggregate(query, results)
    return results[:(limit if limit else 10)]
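
The merge step can be illustrated without a network. The following is a minimal sketch, assuming each shard returns a list of dicts with `id` and `score` keys for a plain similarity query. `merge_results` is a hypothetical stand-in for the combine, sort, and truncate part of `search`; it does not reproduce the SQL aggregate handling that the `Aggregate` class provides.

```python
def merge_results(shard_results, limit=None):
    """Combine per-shard result lists, best score first, truncated to limit."""
    merged = []
    for result in shard_results:
        merged.extend(result)

    # Highest score first (assumed ordering for plain similarity queries)
    merged.sort(key=lambda row: row["score"], reverse=True)

    # Same default as the excerpt above: 10 when no limit is given
    return merged[: (limit if limit else 10)]


# Hypothetical responses from two shards
shard_0 = [{"id": "a", "score": 0.91}, {"id": "b", "score": 0.40}]
shard_1 = [{"id": "c", "score": 0.75}]

top = merge_results([shard_0, shard_1], limit=2)
print([row["id"] for row in top])
```

Because every shard only returns its own top results, the coordinator has to re-sort the combined list before truncating; taking the first `limit` rows of the concatenation would favor whichever shard answered first.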

add(documents)

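Distributes documents across shards by hashing each document ID and taking the result modulo the shard count (see `shard` below). This is plain modulo hashing, not consistent hashing, so the assignment depends on the number of shards.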

def add(self, documents):
    self.execute("post", "add", self.shard(documents))

shard(documents)

Splits documents into shard-specific buckets using Adler-32 hashing on document IDs.

def shard(self, documents):
    shards = [[] for _ in range(len(self.shards))]
    for document in documents:
        uid = document.get("id") if isinstance(document, dict) else document
        if uid and isinstance(uid, str):
            uid = zlib.adler32(uid.encode("utf-8"))
        elif uid is None:
            uid = random.randint(0, len(shards) - 1)
        shards[uid % len(self.shards)].append(document)
    return shards
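
The bucketing rule can be tried in isolation. This is a small sketch, not the library code: `shard_index` is a hypothetical helper that applies the same Adler-32 and modulo steps as the excerpt above to a single ID. It shows that assignment is deterministic for a fixed shard count, and that changing the shard count moves IDs to different indexes.

```python
import zlib


def shard_index(uid, shard_count):
    """Map a string or integer ID to a shard index."""
    if isinstance(uid, str):
        # Quick integer checksum of the string ID
        uid = zlib.adler32(uid.encode("utf-8"))
    return uid % shard_count


ids = [f"doc-{i}" for i in range(1000)]

# The same ID always maps to the same shard for a fixed shard count
before = [shard_index(uid, 3) for uid in ids]
again = [shard_index(uid, 3) for uid in ids]

# With a fourth shard, many IDs map to a different index
after = [shard_index(uid, 4) for uid in ids]
moved = sum(1 for old, new in zip(before, after) if old != new)
print(f"{moved} of {len(ids)} IDs changed shard")
```

One practical consequence, under these assumptions: adding or removing a shard changes where existing IDs are routed, so documents would likely need to be redistributed after resizing a cluster.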

execute(method, action, data)

Runs async HTTP requests against all shards using aiohttp and asyncio.

def execute(self, method, action, data=None):
    urls = [f"{shard}/{action}" for shard in self.shards]
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(self.run(urls, method, data))
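
The fan-out pattern can be sketched with the standard library alone. In this illustration `fake_request` is a hypothetical stand-in for the aiohttp calls, and `asyncio.run` is used in place of `get_event_loop().run_until_complete` from the excerpt. It also assumes that, for POST requests, `data` is a list with one bucket per shard, as produced by `shard(documents)`.

```python
import asyncio


async def fake_request(url, method, data=None):
    """Stand-in for an HTTP call: echoes what would have been sent."""
    await asyncio.sleep(0)
    return {"url": url, "method": method, "data": data}


async def run(urls, method, data=None):
    """Issue one request per shard concurrently and wait for all of them."""
    payloads = data if data is not None else [None] * len(urls)
    tasks = [fake_request(url, method, payload) for url, payload in zip(urls, payloads)]

    # gather returns results in the same order as the tasks were passed in
    return await asyncio.gather(*tasks)


def execute(shards, method, action, data=None):
    """Build one URL per shard and run the requests to completion."""
    urls = [f"{shard}/{action}" for shard in shards]
    return asyncio.run(run(urls, method, data))


shards = ["http://shard-0:8000", "http://shard-1:8000"]
counts = execute(shards, "get", "count")
added = execute(shards, "post", "add", [[{"id": "a"}], [{"id": "b"}]])
print([result["url"] for result in counts])
```

Since `asyncio.gather` preserves task order, the position of each result matches the position of its shard in the configured list.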

Cluster Configuration Example

# coordinator.yml
cluster:
  shards:
    - http://shard-0:8000
    - http://shard-1:8000
    - http://shard-2:8000

# Start the coordinator
CONFIG=coordinator.yml uvicorn "txtai.api:app" --host 0.0.0.0 --port 8000
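
A coordinator with a missing or malformed `shards` list fails only when the first request is fanned out, so it can help to check the parsed configuration up front. The sketch below is an assumption-laden illustration: `validate_cluster_config` is a hypothetical helper, not part of txtai, and it takes the already-parsed `cluster` section as a dict.

```python
from urllib.parse import urlparse


def validate_cluster_config(config):
    """Return the shard URL list or raise ValueError with a clear message."""
    if not isinstance(config, dict):
        raise ValueError("cluster config must be a dict")

    shards = config.get("shards")
    if not isinstance(shards, list) or not shards:
        raise ValueError("cluster config needs a non-empty 'shards' list")

    for url in shards:
        parsed = urlparse(url) if isinstance(url, str) else None
        if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"invalid shard URL: {url!r}")

    return shards


# Equivalent of the `cluster` section in the YAML above, already parsed
config = {"shards": ["http://shard-0:8000", "http://shard-1:8000", "http://shard-2:8000"]}
shards = validate_cluster_config(config)
print(f"{len(shards)} shards configured")
```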

AWS Lambda Handler (Mangum)

Source Reference

File: docker/aws/api.py (Lines 1-17)

Complete Implementation

"""
Lambda handler for a txtai API instance
"""

from mangum import Mangum

from txtai.api import app, start

# Create FastAPI application instance wrapped by Mangum
handler = None
if not handler:
    # Start application
    start()

    # Create handler
    handler = Mangum(app, lifespan="off")

Key details:

| Component | Purpose |
| --- | --- |
| `start()` | Manually triggers the FastAPI lifespan startup (model loading, route registration) |
| `Mangum(app, lifespan="off")` | Wraps the FastAPI app for Lambda; `lifespan="off"` because startup was handled manually |
| `handler` | The Lambda entry point; AWS invokes `handler(event, context)` |

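The `lifespan="off"` parameter is critical: since `start()` already ran the startup logic, Mangum must not attempt to run it again. Initialization happens once per Lambda container because this code sits at module level: Lambda imports the module on a cold start and reuses it for warm invocations. The `if not handler` guard is always true on that first import, since `handler` was set to `None` on the preceding line, so it documents intent rather than enforcing the behavior.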
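
The once-per-container behavior can be demonstrated with stand-ins. In this sketch `start` and `build_handler` are hypothetical placeholders for `txtai.api.start` and `Mangum(app, lifespan="off")`; nothing here calls the real libraries. It only illustrates that module-level setup runs once while the handler is invoked many times.

```python
init_calls = 0


def start():
    """Stand-in for the expensive one-time startup (model loading)."""
    global init_calls
    init_calls += 1


def build_handler():
    """Stand-in for wrapping the app: returns a callable entry point."""
    def handler(event, context):
        return {"statusCode": 200, "body": event.get("path", "/")}

    return handler


# Module-level code: runs once, when the module is first imported
start()
handler = build_handler()

# Warm invocations reuse the imported module and only call the handler
responses = [handler({"path": f"/search/{i}"}, None) for i in range(3)]
print(init_calls, len(responses))
```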

Lambda Dockerfile

File: docker/aws/Dockerfile

ARG BASE_IMAGE=neuml/txtai-cpu
FROM $BASE_IMAGE

ARG APP=api.py

# Install Lambda Runtime Interface Client and Mangum ASGI bindings
RUN pip install awslambdaric mangum

# Copy configuration
COPY config.yml .

# Run local API instance to cache models in container
RUN python -c "from txtai.api import API; API('config.yml', False)"

# Copy application
COPY $APP ./app.py

# Start runtime client using default application handler
ENV CONFIG "config.yml"
ENTRYPOINT ["python", "-m", "awslambdaric"]
CMD ["app.handler"]

Build steps:

1. Extends the txtai base image
2. Installs awslambdaric (AWS Lambda Runtime Interface Client) and mangum
3. Copies the YAML configuration
4. Pre-caches models by instantiating API with loaddata=False
5. Copies the Lambda handler script
6. Sets the entrypoint to the AWS Lambda runtime client

Building and Deploying to Lambda

# Build the Lambda container
docker build -t txtai-lambda -f docker/aws/Dockerfile .

# Tag and push to ECR
docker tag txtai-lambda:latest 123456789.dkr.ecr.us-east-1.amazonaws.com/txtai-lambda:latest
docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/txtai-lambda:latest

# Create Lambda function from container image
aws lambda create-function \
    --function-name txtai-api \
    --package-type Image \
    --code ImageUri=123456789.dkr.ecr.us-east-1.amazonaws.com/txtai-lambda:latest \
    --role arn:aws:iam::123456789:role/lambda-role \
    --memory-size 4096 \
    --timeout 300

Docker API Deployment

Source Reference

File: docker/api/Dockerfile

Complete Dockerfile

ARG BASE_IMAGE=neuml/txtai-cpu
FROM $BASE_IMAGE

# Copy configuration
COPY config.yml .

# Run local API instance to cache models in container
RUN python -c "from txtai.api import API; API('config.yml', False)"

# Start server and listen on all interfaces
ENV CONFIG "config.yml"
ENTRYPOINT ["uvicorn", "--host", "0.0.0.0", "txtai.api:app"]

Build steps:

1. Extends the txtai base image (CPU or GPU variant)
2. Copies the YAML configuration
3. Pre-caches all models during image build
4. Sets uvicorn as the entrypoint, binding to all network interfaces

Building and Running

# Build the Docker image
docker build -t txtai-api -f docker/api/Dockerfile .

# Run the container
docker run -p 8000:8000 txtai-api

# Run with GPU support (the image must be built on a GPU-capable base image via BASE_IMAGE)
docker run --gpus all -p 8000:8000 txtai-api

# Override configuration at runtime
docker run -p 8000:8000 -e CONFIG=/data/custom.yml -v /host/data:/data txtai-api
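
Once the container is running, it can be queried from Python with the standard library. This is a hedged sketch: it assumes the configuration enables an embeddings index, that the container is reachable at `http://localhost:8000`, and that the `search` endpoint accepts `query` and `limit` parameters as in the `search?query=` action shown in the Cluster section. `search_url` and `search` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(base_url, query, limit=3):
    """Build a GET URL for the search endpoint."""
    params = urlencode({"query": query, "limit": limit})
    return f"{base_url.rstrip('/')}/search?{params}"


def search(base_url, query, limit=3, timeout=10):
    """Call the running container and decode the JSON response."""
    with urlopen(search_url(base_url, query, limit), timeout=timeout) as response:
        return json.load(response)


url = search_url("http://localhost:8000", "feel good story")
print(url)

# With the container running, this would issue the request:
# results = search("http://localhost:8000", "feel good story")
```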

Docker Base Image

Source Reference

File: docker/base/Dockerfile

The base image installs all system dependencies and the txtai library:

ARG BASE_IMAGE=python:3.10-slim
FROM $BASE_IMAGE

ARG GPU
ARG TARGETARCH
ARG PYTHON_VERSION=3
ARG COMPONENTS=[all]

ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

RUN \
    apt-get update && \
    apt-get -y --no-install-recommends install libgomp1 libportaudio2 libsndfile1 git gcc g++ \
        python${PYTHON_VERSION} python${PYTHON_VERSION}-dev python3-pip && \
    rm -rf /var/lib/apt/lists && \
    ln -s /usr/bin/python${PYTHON_VERSION} /usr/bin/python && \
    python -m pip install --no-cache-dir -U pip wheel setuptools && \
    if [ -z ${GPU} ] && { [ -z ${TARGETARCH} ] || [ ${TARGETARCH} = "amd64" ] ;}; then \
        pip install --no-cache-dir torch==2.10.0+cpu torchvision==0.25.0+cpu \
        -f https://download.pytorch.org/whl/torch -f https://download.pytorch.org/whl/torchvision; \
    fi && \
    python -m pip install --no-cache-dir txtai${COMPONENTS} && \
    apt-get -y purge git gcc g++ python${PYTHON_VERSION}-dev && apt-get -y autoremove

WORKDIR /app

Build arguments:

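| Argument | Default | Description |
| --- | --- | --- |
| `BASE_IMAGE` | `python:3.10-slim` | Base Python image |
| `GPU` | empty | Set to any non-empty value to skip the CPU-only PyTorch wheels, leaving PyTorch to be installed as a regular txtai dependency |
| `TARGETARCH` | auto-detected | Target CPU architecture; the CPU-only wheels are installed only when this is empty or `amd64` |
| `PYTHON_VERSION` | `3` | Python version suffix used in the apt package names |
| `COMPONENTS` | `[all]` | txtai components to install (e.g., `[api,pipeline]`) |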
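
The shell condition that decides whether the CPU-only PyTorch wheels are installed is easy to misread. The sketch below restates it in Python purely for illustration; `installs_cpu_torch` is a hypothetical name, and empty strings stand for unset build arguments.

```python
def installs_cpu_torch(gpu="", targetarch=""):
    """Mirror of: [ -z $GPU ] && { [ -z $TARGETARCH ] || [ $TARGETARCH = "amd64" ]; }"""
    return not gpu and (not targetarch or targetarch == "amd64")


# Walk through a few combinations of the two build arguments
for gpu, arch in [("", ""), ("", "amd64"), ("", "arm64"), ("1", "amd64")]:
    print(f"GPU={gpu!r} TARGETARCH={arch!r} -> CPU wheels: {installs_cpu_torch(gpu, arch)}")
```

In the remaining cases (GPU set, or a non-amd64 architecture) the pinned CPU wheels are skipped and PyTorch is resolved by the later `pip install txtai${COMPONENTS}` step.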

Deployment Comparison

| Aspect | Docker (uvicorn) | AWS Lambda (Mangum) | Cluster |
| --- | --- | --- | --- |
| Entry point | `uvicorn txtai.api:app` | `awslambdaric app.handler` | uvicorn on coordinator + shards |
| Model caching | Docker build layer | Docker build layer | Per-shard Docker build |
| Scaling | Container orchestrator | AWS auto-scaling | Manual shard provisioning |
| State persistence | Local filesystem | External storage (S3) | Per-shard filesystem |
| Startup time | Seconds (models cached) | Seconds to minutes (cold start) | Seconds per shard |