Implementation:Neuml Txtai Production Deployment Tools

From Leeroopedia


Overview

This page documents the tools and components used to deploy txtai APIs to production environments. It covers the Cluster class for distributed search sharding, the Mangum adapter for AWS Lambda deployment, and the Docker configurations for containerized deployments. These tools transform a YAML-configured txtai application into a production-ready service.

Cluster

API Signature

from txtai.api.cluster import Cluster

cluster = Cluster(config)
| Parameter | Type | Default | Description |
|---|---|---|---|
| config | dict | None | Cluster configuration containing a shards key with a list of shard URLs |

Source Reference

File: src/python/txtai/api/cluster.py (Lines 16-39)

Constructor Implementation

class Cluster:
    """
    Aggregates multiple embeddings shards into a single logical embeddings instance.
    """

    def __init__(self, config=None):
        # Configuration
        self.config = config

        # Embeddings shard urls
        self.shards = None
        if "shards" in self.config:
            self.shards = self.config["shards"]

        # Query aggregator
        self.aggregate = Aggregate()

Instance attributes:

| Attribute | Type | Description |
|---|---|---|
| self.config | dict | Raw cluster configuration |
| self.shards | list[str] or None | List of shard base URLs (e.g., ["http://shard-0:8000", "http://shard-1:8000"]) |
| self.aggregate | Aggregate | SQL aggregate function handler for combining results across shards |
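Note that the constructor shown above evaluates "shards" in self.config, so passing the default config=None raises a TypeError. A caller-side guard might look like the following sketch. The helper normalize_cluster_config is hypothetical and not part of txtai; the trailing-slash cleanup is an added assumption about how shard URLs are later joined with an action path.

```python
def normalize_cluster_config(config):
    """Return a config dict that is safe to pass to a Cluster-style constructor.

    Hypothetical helper for illustration only; not part of txtai.
    """
    # `"shards" in None` raises TypeError, so fall back to an empty dict
    config = dict(config) if config else {}

    shards = config.get("shards")
    if shards is not None:
        # Strip trailing slashes so f"{shard}/{action}" never yields "//"
        config["shards"] = [str(url).rstrip("/") for url in shards]

    return config


print(normalize_cluster_config(None))
print(normalize_cluster_config({"shards": ["http://shard-0:8000/"]}))
```

With this guard, a missing configuration results in an empty dict (and therefore self.shards staying None) instead of an exception at construction time.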

Key Methods

search(query, limit, ...)

Fans out a search query to all shards via async HTTP GET, then aggregates and sorts results.

def search(self, query, limit=None, weights=None, index=None, parameters=None, graph=False):
    action = f"search?query={urllib.parse.quote_plus(query)}"
    # ... append optional parameters to URL ...

    results = []
    for result in self.execute("get", action):
        results.extend(result)

    results = self.aggregate(query, results)
    return results[:(limit if limit else 10)]
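The merge step can be illustrated with a simplified stand-in for the aggregator. This sketch assumes each shard returns a list of dicts with "id" and "score" keys and covers only plain similarity queries; the real Aggregate class also handles SQL aggregate functions, which this sketch does not attempt. The function name merge_results is hypothetical.

```python
def merge_results(shard_results, limit=None):
    """Flatten per-shard result lists, sort by score (highest first), and truncate.

    Simplified stand-in for the aggregation step; assumes {"id", "score"} dicts.
    """
    merged = []
    for results in shard_results:
        merged.extend(results)

    # Highest-scoring results first, regardless of which shard produced them
    merged.sort(key=lambda result: result["score"], reverse=True)

    # Same default as the search method above: 10 results when no limit is given
    return merged[: (limit if limit else 10)]


shard_a = [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.2}]
shard_b = [{"id": "c", "score": 0.5}]
print([r["id"] for r in merge_results([shard_a, shard_b], limit=2)])  # ['a', 'c']
```

Because every shard is asked for its own top results, the coordinator only needs to re-rank the combined list and cut it to the requested limit.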

add(documents)

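Distributes documents across shards using hash-based (modulo) partitioning. This is not consistent hashing: changing the number of shards changes the shard assignment of most documents.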

def add(self, documents):
    self.execute("post", "add", self.shard(documents))

shard(documents)

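Splits documents into shard-specific buckets. String IDs are hashed with Adler-32, integer IDs are used as-is, and documents without an ID are assigned to a random shard. The shard index is the resulting value modulo the number of shards.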

def shard(self, documents):
    shards = [[] for _ in range(len(self.shards))]
    for document in documents:
        uid = document.get("id") if isinstance(document, dict) else document
        if uid and isinstance(uid, str):
            uid = zlib.adler32(uid.encode("utf-8"))
        elif uid is None:
            uid = random.randint(0, len(shards) - 1)
        shards[uid % len(self.shards)].append(document)
    return shards
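The routing rule for string IDs can be tried in isolation. The sketch below uses only the standard library; shard_index is a hypothetical helper name, and the document IDs are made up for the demonstration. It also counts how many IDs would move if a fourth shard were added, which is worth knowing before resizing a cluster.

```python
import zlib


def shard_index(uid, count):
    """Map a string document id to a shard index via Adler-32 modulo shard count."""
    return zlib.adler32(uid.encode("utf-8")) % count


ids = [f"doc-{i}" for i in range(1000)]

# The mapping is deterministic: the same id always routes to the same shard
print(shard_index("doc-42", 3) == shard_index("doc-42", 3))  # True

# Count ids whose shard changes when the cluster grows from 3 to 4 shards
moved = sum(1 for uid in ids if shard_index(uid, 3) != shard_index(uid, 4))
print(f"{moved} of {len(ids)} ids change shard when growing from 3 to 4 shards")
```

Since assignments depend on the shard count, adding or removing a shard generally means re-indexing the affected documents.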

execute(method, action, data)

Runs async HTTP requests against all shards using aiohttp and asyncio.

def execute(self, method, action, data=None):
    urls = [f"{shard}/{action}" for shard in self.shards]
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(self.run(urls, method, data))
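The fan-out pattern can be sketched without a network. Here fake_request is a stand-in for the aiohttp call, and asyncio.run is used in place of get_event_loop().run_until_complete(), which is the usual entry point on recent Python versions. Both substitutions are assumptions made for illustration and do not reflect the library's actual run method.

```python
import asyncio


async def fake_request(url, delay):
    """Stand-in for an HTTP call: waits briefly, then returns a payload tagged with the URL."""
    await asyncio.sleep(delay)
    return {"url": url, "results": []}


async def run(urls):
    # gather() runs all requests concurrently and returns results in input order
    return await asyncio.gather(*(fake_request(url, 0.01) for url in urls))


def execute(shards, action):
    """Build one URL per shard and wait for all responses."""
    urls = [f"{shard}/{action}" for shard in shards]
    # asyncio.run creates a fresh event loop, runs the coroutine, and closes the loop
    return asyncio.run(run(urls))


responses = execute(["http://shard-0:8000", "http://shard-1:8000"], "count")
print([r["url"] for r in responses])
# ['http://shard-0:8000/count', 'http://shard-1:8000/count']
```

Total latency is bounded by the slowest shard rather than the sum of all shards, which is the main reason for issuing the requests concurrently.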

Cluster Configuration Example

# coordinator.yml
cluster:
  shards:
    - http://shard-0:8000
    - http://shard-1:8000
    - http://shard-2:8000

# Start the coordinator (shell)
CONFIG=coordinator.yml uvicorn "txtai.api:app" --host 0.0.0.0 --port 8000

AWS Lambda Handler (Mangum)

Source Reference

File: docker/aws/api.py (Lines 1-17)

Complete Implementation

"""
Lambda handler for a txtai API instance
"""

from mangum import Mangum

from txtai.api import app, start

# Create FastAPI application instance wrapped by Mangum
handler = None
if not handler:
    # Start application
    start()

    # Create handler
    handler = Mangum(app, lifespan="off")

Key details:

| Component | Purpose |
|---|---|
| start() | Manually triggers the FastAPI lifespan startup (model loading, route registration) |
| Mangum(app, lifespan="off") | Wraps the FastAPI app for Lambda; lifespan="off" because startup was handled manually |
| handler | The Lambda entry point; AWS invokes handler(event, context) |

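The lifespan="off" parameter is critical: since start() has already performed the startup work, Mangum must not attempt to run the lifespan handler again. Initialization happens at module import, which occurs once per Lambda container; warm invocations reuse the container and call handler directly. The if not handler guard is always true at that point, because handler is assigned None on the preceding line, so it is the import-once behavior rather than the guard that prevents repeated initialization.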
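The cold-start versus warm-start behavior can be simulated with plain Python. In this sketch, start is a stand-in for expensive startup work and handler mimics the Lambda calling convention; neither uses Mangum or txtai, and the response shape is only an assumption for the demonstration.

```python
INIT_COUNT = 0


def start():
    """Stand-in for expensive startup work such as loading models."""
    global INIT_COUNT
    INIT_COUNT += 1


# Module-level code runs once, when the runtime first imports the module (cold start)
start()


def handler(event, context):
    # Warm invocations reuse the already-initialized module state
    return {"statusCode": 200, "body": f"init_count={INIT_COUNT}"}


# Simulate three invocations against the same container
for _ in range(3):
    response = handler({}, None)

print(response["body"])  # init_count=1
```

Anything placed inside handler itself would run on every request, so heavy setup belongs at module level.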

Lambda Dockerfile

File: docker/aws/Dockerfile

ARG BASE_IMAGE=neuml/txtai-cpu
FROM $BASE_IMAGE

ARG APP=api.py

# Install Lambda Runtime Interface Client and Mangum ASGI bindings
RUN pip install awslambdaric mangum

# Copy configuration
COPY config.yml .

# Run local API instance to cache models in container
RUN python -c "from txtai.api import API; API('config.yml', False)"

# Copy application
COPY $APP ./app.py

# Start runtime client using default application handler
ENV CONFIG "config.yml"
ENTRYPOINT ["python", "-m", "awslambdaric"]
CMD ["app.handler"]

Build steps:

  1. Extends the txtai base image
  2. Installs awslambdaric (AWS Lambda Runtime Interface Client) and mangum
  3. Copies the YAML configuration
  4. Pre-caches models by instantiating API with loaddata=False
  5. Copies the Lambda handler script
  6. Sets the entrypoint to the AWS Lambda runtime client

Building and Deploying to Lambda

# Build the Lambda container
docker build -t txtai-lambda -f docker/aws/Dockerfile .

# Tag and push to ECR
docker tag txtai-lambda:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/txtai-lambda:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/txtai-lambda:latest

# Create Lambda function from container image
aws lambda create-function \
    --function-name txtai-api \
    --package-type Image \
    --code ImageUri=123456789012.dkr.ecr.us-east-1.amazonaws.com/txtai-lambda:latest \
    --role arn:aws:iam::123456789012:role/lambda-role \
    --memory-size 4096 \
    --timeout 300

Docker API Deployment

Source Reference

File: docker/api/Dockerfile

Complete Dockerfile

ARG BASE_IMAGE=neuml/txtai-cpu
FROM $BASE_IMAGE

# Copy configuration
COPY config.yml .

# Run local API instance to cache models in container
RUN python -c "from txtai.api import API; API('config.yml', False)"

# Start server and listen on all interfaces
ENV CONFIG "config.yml"
ENTRYPOINT ["uvicorn", "--host", "0.0.0.0", "txtai.api:app"]

Build steps:

  1. Extends the txtai base image (CPU or GPU variant)
  2. Copies the YAML configuration
  3. Pre-caches all models during image build
  4. Sets uvicorn as the entrypoint, binding to all network interfaces

Building and Running

# Build the Docker image
docker build -t txtai-api -f docker/api/Dockerfile .

# Run the container
docker run -p 8000:8000 txtai-api

# Run with GPU support (requires an image built from a GPU base image and the NVIDIA Container Toolkit on the host)
docker run --gpus all -p 8000:8000 txtai-api

# Override configuration at runtime
docker run -p 8000:8000 -e CONFIG=/data/custom.yml -v /host/data:/data txtai-api
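Once the container is running, a quick smoke test from Python might look like the sketch below. It assumes the configuration enables an embeddings index so that the /search endpoint is available, and that the API listens on localhost:8000 as in the commands above. The helper names search_url and search are hypothetical. Only the URL builder is executed here; search is defined but not called, since it needs a live server.

```python
import json
import urllib.parse
import urllib.request


def search_url(base, query, limit=None):
    """Build a GET /search URL for a running API instance."""
    params = {"query": query}
    if limit:
        params["limit"] = limit
    # urlencode escapes spaces and special characters in the query text
    return f"{base.rstrip('/')}/search?{urllib.parse.urlencode(params)}"


def search(base, query, limit=None, timeout=10):
    """Call the API and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(search_url(base, query, limit), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(search_url("http://localhost:8000", "feel good story", limit=1))
# http://localhost:8000/search?query=feel+good+story&limit=1
```

Calling search("http://localhost:8000", "feel good story", limit=1) against a live container should return the decoded JSON result list.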

Docker Base Image

Source Reference

File: docker/base/Dockerfile

The base image installs all system dependencies and the txtai library:

ARG BASE_IMAGE=python:3.10-slim
FROM $BASE_IMAGE

ARG GPU
ARG TARGETARCH
ARG PYTHON_VERSION=3
ARG COMPONENTS=[all]

ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

RUN \
    apt-get update && \
    apt-get -y --no-install-recommends install libgomp1 libportaudio2 libsndfile1 git gcc g++ \
        python${PYTHON_VERSION} python${PYTHON_VERSION}-dev python3-pip && \
    rm -rf /var/lib/apt/lists && \
    ln -s /usr/bin/python${PYTHON_VERSION} /usr/bin/python && \
    python -m pip install --no-cache-dir -U pip wheel setuptools && \
    if [ -z ${GPU} ] && { [ -z ${TARGETARCH} ] || [ ${TARGETARCH} = "amd64" ] ;}; then \
        pip install --no-cache-dir torch==2.10.0+cpu torchvision==0.25.0+cpu \
        -f https://download.pytorch.org/whl/torch -f https://download.pytorch.org/whl/torchvision; \
    fi && \
    python -m pip install --no-cache-dir txtai${COMPONENTS} && \
    apt-get -y purge git gcc g++ python${PYTHON_VERSION}-dev && apt-get -y autoremove

WORKDIR /app

Build arguments:

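| Argument | Default | Description |
|---|---|---|
| BASE_IMAGE | python:3.10-slim | Base Python image |
| GPU | empty | When set, the CPU-only PyTorch wheel install is skipped |
| TARGETARCH | auto-detected | Target CPU architecture (populated automatically by BuildKit) |
| PYTHON_VERSION | 3 | Python version suffix used in the apt package names (python${PYTHON_VERSION}) |
| COMPONENTS | [all] | txtai components to install (e.g., [api,pipeline]) |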
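The shell conditional that decides whether the CPU-only PyTorch wheels are installed is easy to misread, so here it is restated as a small Python predicate. The function name installs_cpu_torch is hypothetical; the logic mirrors the test in the Dockerfile above, where an empty string stands for an unset build argument.

```python
def installs_cpu_torch(gpu="", targetarch=""):
    """Mirror the shell test: GPU unset AND (TARGETARCH unset OR TARGETARCH == "amd64")."""
    return not gpu and (not targetarch or targetarch == "amd64")


cases = [("", ""), ("", "amd64"), ("", "arm64"), ("1", "amd64")]
for gpu, arch in cases:
    print(f"GPU={gpu!r} TARGETARCH={arch!r} -> {installs_cpu_torch(gpu, arch)}")
# GPU='' TARGETARCH='' -> True
# GPU='' TARGETARCH='amd64' -> True
# GPU='' TARGETARCH='arm64' -> False
# GPU='1' TARGETARCH='amd64' -> False
```

In the cases that evaluate to False, the explicit CPU wheel step is skipped and PyTorch is resolved as an ordinary dependency of the txtai install.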

Deployment Comparison

| Aspect | Docker (uvicorn) | AWS Lambda (Mangum) | Cluster |
|---|---|---|---|
| Entry point | uvicorn txtai.api:app | awslambdaric app.handler | uvicorn on coordinator + shards |
| Model caching | Docker build layer | Docker build layer | Per-shard Docker build |
| Scaling | Container orchestrator | AWS auto-scaling | Manual shard provisioning |
| State persistence | Local filesystem | External storage (S3) | Per-shard filesystem |
| Startup time | Seconds (models cached) | Seconds to minutes (cold start) | Seconds per shard |
