Workflow:Neuml Txtai API Deployment

Knowledge Sources	txtai, txtai API Guide, txtai API Configuration, txtai Cloud Deployment
Domains	API_Development, Deployment, Microservices
Last Updated	2026-02-10 00:00 GMT

Overview

End-to-end process for deploying txtai capabilities as a REST API service using YAML configuration and FastAPI.

Description

This workflow covers the deployment of txtai as a production-ready HTTP API service. The entire application (embeddings indexes, pipelines, workflows, agents) is configured declaratively via a YAML file and served through FastAPI. The API layer automatically exposes REST endpoints for all configured components. It also supports token-based authorization, OpenAI-compatible chat endpoints, Model Context Protocol (MCP) integration, distributed clustering, and custom extensions. The service can be deployed as a standalone process, a Docker container, or a serverless function.

Usage

Execute this workflow when you need to expose txtai functionality over HTTP for integration with other applications, web frontends, or microservice architectures. This is the standard approach for production deployments where multiple clients need to access shared embeddings indexes, pipelines, or agent capabilities.

Execution Steps

Step 1: Write the YAML Configuration

Create a YAML configuration file that declares all components to deploy. The configuration defines embeddings indexes, pipelines, workflows, and agents. Each top-level key corresponds to a component type, and the API automatically generates REST endpoints for each configured component.

Configuration sections:

embeddings: vector index configuration (model path, content storage, ANN backend)
pipeline keys (summary, translation, llm, etc.): pipeline instances, each declared under its own top-level key named after the pipeline
workflow: named workflow definitions with task chains
agent: agent configurations with tools and LLM settings
writable: enables index modification endpoints (index, upsert, delete)
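
As an illustration of the sections listed above, the following Python sketch writes a small configuration file. The model path, the workflow name summarize, and the file name config.yml are assumptions chosen for the example, not values taken from this workflow. The pipeline is declared under its own top-level key (summary), which is how txtai configurations are commonly written; check the configuration reference for your txtai version before relying on any key.

```python
from pathlib import Path

# Illustrative txtai API configuration. The model path and workflow name
# are assumptions chosen for this example; substitute your own values.
CONFIG_YAML = """\
# Enable index modification endpoints (add, index, upsert, delete)
writable: true

# Embeddings index: vector model plus content storage
embeddings:
  path: sentence-transformers/all-MiniLM-L6-v2
  content: true

# Summary pipeline with default settings
summary:

# Named workflow that chains pipeline actions
workflow:
  summarize:
    tasks:
      - action: summary
"""


def write_config(path="config.yml"):
    """Write the example configuration to disk and return its path."""
    target = Path(path)
    target.write_text(CONFIG_YAML, encoding="utf-8")
    return target


if __name__ == "__main__":
    print(f"Wrote {write_config()}")
```

Keeping the configuration in one file means the same artifact can be reviewed, versioned, and mounted into a container without code changes.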

Step 2: Configure Security and Extensions

Set up API authentication, custom dependencies, and extensions. Token-based authorization is enabled via the TOKEN environment variable. Custom FastAPI dependencies and extensions can be loaded dynamically from Python class paths.

Security options:

TOKEN environment variable for bearer token authentication
Custom dependency injection via DEPENDENCIES environment variable
Custom API extensions via EXTENSIONS environment variable
CORS and middleware configuration through FastAPI standard mechanisms
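
A minimal sketch of preparing a bearer token follows. It assumes, based on txtai's security documentation at the time of writing, that the TOKEN environment variable holds the SHA-256 hex digest of the token rather than the token itself, and that clients send the raw token in an Authorization header. The helper names generate_token and authorization_header are hypothetical; verify the hashing behavior against your txtai version.

```python
import hashlib
import secrets


def generate_token(nbytes=32):
    """Return (token, digest): a random token and its SHA-256 hex digest."""
    token = secrets.token_urlsafe(nbytes)
    # Assumption: the server compares TOKEN against sha256(client token).
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return token, digest


def authorization_header(token):
    """Build the header a client sends with each request."""
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    token, digest = generate_token()
    print("Distribute this token to clients:", token)
    print("Start the server with TOKEN=" + digest)
```

Storing only the digest on the server side means a leaked environment dump does not directly reveal the client credential.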

Step 3: Start the API Server

Launch the API using an ASGI server such as uvicorn (FastAPI applications are ASGI, not WSGI). The CONFIG environment variable points to the YAML configuration file. On startup, the FastAPI lifespan handler reads the config, instantiates the Application, and conditionally registers API routers based on which components are configured.

Startup process:

YAML configuration is parsed via Application.read()
An API instance is created, initializing all configured components
Routers are conditionally included based on configuration keys
OpenAI-compatible endpoints are added if LLM/RAG is configured
MCP service is mounted if mcp flag is set
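
The launch step can be sketched from Python as follows. This assumes uvicorn and txtai are installed in the current interpreter and that the application object is importable as txtai.api:app; the helper names build_launch and launch, the host, and the port are illustrative choices, not part of txtai.

```python
import os
import subprocess
import sys


def build_launch(config_path, host="0.0.0.0", port=8000, token_digest=None):
    """Return (command, environment) for starting the API with uvicorn."""
    env = dict(os.environ, CONFIG=config_path)
    if token_digest:
        # Optional: enable token-based authorization.
        env["TOKEN"] = token_digest
    cmd = [
        sys.executable, "-m", "uvicorn", "txtai.api:app",
        "--host", host, "--port", str(port),
    ]
    return cmd, env


def launch(config_path, **kwargs):
    """Start the server as a child process (requires uvicorn and txtai)."""
    cmd, env = build_launch(config_path, **kwargs)
    return subprocess.Popen(cmd, env=env)


if __name__ == "__main__":
    process = launch("config.yml")
    process.wait()
```

Passing CONFIG through the child's environment rather than mutating os.environ keeps the parent process unaffected, which helps when several instances with different configurations are started from one script.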

Step 4: Access API Endpoints

Use the generated REST API endpoints to interact with txtai services. Endpoints follow RESTful conventions and support both JSON and MessagePack response formats. The embeddings endpoints support search, index, upsert, delete, and count operations. Pipeline endpoints accept input data and return processed results.

Endpoint categories:

/search, /batchsearch: semantic search queries
/count: number of entries in the index
/add, /index, /upsert, /delete: index modification (when writable)
/<pipeline-name>: pipeline execution endpoints, one per configured pipeline (for example, /summary)
/workflow: workflow execution
/agent: agent task execution
/v1/chat/completions: OpenAI-compatible chat endpoint
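
A minimal client sketch using only the Python standard library is shown below. It assumes a server at a base URL such as http://localhost:8000, that /search accepts query and limit as query-string parameters, and that /add accepts a JSON list of documents; the function names are hypothetical helpers, not part of txtai.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(base_url, query, limit=3, token=None):
    """Build a GET request for the /search endpoint."""
    params = urllib.parse.urlencode({"query": query, "limit": limit})
    request = urllib.request.Request(f"{base_url}/search?{params}", method="GET")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


def build_add_request(base_url, documents, token=None):
    """Build a POST request for /add (requires writable: true)."""
    body = json.dumps(documents).encode("utf-8")
    request = urllib.request.Request(f"{base_url}/add", data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    base = "http://localhost:8000"  # assumed local server address
    print(send(build_search_request(base, "feel good story", limit=1)))
```

Documents sent to /add are staged; a follow-up call to /index builds the index so that they become searchable.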

Step 5: Deploy to Production

Package the API for production deployment. Options include Docker containers, cloud services (AWS Lambda, Google Cloud Run, Azure), and Kubernetes. txtai provides Docker configurations for common deployment targets and supports distributed clustering for horizontal scaling.

Deployment options:

Docker container with uvicorn
AWS Lambda with Mangum adapter
Distributed clustering for sharded indexes across multiple nodes
Hugging Face Spaces for demo deployments
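
For the clustering option, the sketch below generates a configuration for the aggregating node. It assumes the cluster.shards layout described in txtai's cluster documentation, where each shard URL points to a separate txtai API instance running its own embeddings configuration. The shard addresses and the helper name cluster_config are illustrative.

```python
def cluster_config(shard_urls):
    """Return YAML text for a cluster node that fans out to the given shards."""
    if not shard_urls:
        raise ValueError("at least one shard URL is required")
    lines = ["cluster:", "  shards:"]
    lines += [f"    - {url}" for url in shard_urls]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed addresses of two already-running shard instances.
    print(cluster_config(["http://127.0.0.1:8002", "http://127.0.0.1:8003"]))
```

Generating the shard list from code makes it straightforward to keep the cluster configuration in sync with whatever service discovery or deployment tooling assigns the shard addresses.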
