Workflow:Neuml Txtai API Deployment
| Knowledge Sources | |
|---|---|
| Domains | API_Development, Deployment, Microservices |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
End-to-end process for deploying txtai capabilities as a REST API service using YAML configuration and FastAPI.
Description
This workflow covers the deployment of txtai as a production-ready HTTP API service. The entire application (embeddings indexes, pipelines, workflows, agents) is configured declaratively via a YAML file and served through FastAPI. The API layer automatically exposes REST endpoints for all configured components. It also supports token-based authorization, OpenAI-compatible chat endpoints, Model Context Protocol (MCP) integration, distributed clustering, and custom extensions. The service can be deployed as a standalone process, a Docker container, or a serverless function.
Usage
Execute this workflow when you need to expose txtai functionality over HTTP for integration with other applications, web frontends, or microservice architectures. This is the standard approach for production deployments where multiple clients need to access shared embeddings indexes, pipelines, or agent capabilities.
Execution Steps
Step 1: Write the YAML Configuration
Create a YAML configuration file that declares all components to deploy. The configuration defines embeddings indexes, pipelines, workflows, and agents. Each top-level key corresponds to a component type, and the API automatically generates REST endpoints for each configured component.
Configuration sections:
- embeddings: vector index configuration (model path, content storage, ANN backend)
- pipelines: named pipeline instances (summary, translation, LLM, etc.), each declared under its own top-level key
- workflow: named workflow definitions with task chains
- agent: agent configurations with tools and LLM settings
- writable: enables index modification endpoints (index, upsert, delete)
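The sections above can be sketched as a small configuration file. The snippet below is a minimal sketch, written in Python so that it stays self-contained. The file name config.yml and the model path are assumptions chosen for illustration, and only the embeddings and writable sections are shown.

```python
from pathlib import Path

# Illustrative configuration only: the model path and the file name are
# assumptions, not values prescribed by this workflow.
CONFIG_YAML = """\
# Enable index modification endpoints
writable: true

# Vector index configuration
embeddings:
  path: sentence-transformers/all-MiniLM-L6-v2
  content: true
"""


def write_config(path="config.yml"):
    """Write the sample configuration to disk and return its path."""
    target = Path(path)
    target.write_text(CONFIG_YAML, encoding="utf-8")
    return target
```

Pipelines, workflows, and agents would be added to the same file as further top-level sections.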
Step 2: Configure Security and Extensions
Set up API authentication, custom dependencies, and extensions. Token-based authorization is enabled via the TOKEN environment variable. Custom FastAPI dependencies and extensions can be loaded dynamically from Python class paths.
Security and extension options:
- TOKEN environment variable for bearer token authentication
- Custom dependency injection via DEPENDENCIES environment variable
- Custom API extensions via EXTENSIONS environment variable
- CORS and middleware configuration through FastAPI standard mechanisms
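As a rough illustration of the token option, the sketch below prepares a value for the TOKEN variable and the matching client header. It assumes that the server compares a SHA-256 hex digest stored in TOKEN against the bearer token a client sends in plain text; confirm this against the txtai security documentation for the version in use. The token string is a placeholder.

```python
import hashlib


def token_hash(token: str) -> str:
    """Return the SHA-256 hex digest of a plain-text token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def auth_header(token: str) -> dict:
    """Build the bearer Authorization header a client would send."""
    return {"Authorization": f"Bearer {token}"}


# "my-secret-token" is a placeholder; use a strong random value in practice.
# Assumption: TOKEN holds the digest, while clients send the plain token.
server_value = token_hash("my-secret-token")
client_headers = auth_header("my-secret-token")
```

Keeping only the digest in the server environment means the plain token never has to be stored alongside the deployment configuration.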
Step 3: Start the API Server
Launch the API using an ASGI server such as uvicorn. The CONFIG environment variable points to the YAML configuration file. On startup, the FastAPI lifespan handler reads the config, instantiates the Application, and conditionally registers API routers based on which components are configured.
Startup process:
- YAML configuration is parsed via Application.read()
- An API instance is created, initializing all configured components
- Routers are conditionally included based on configuration keys
- OpenAI-compatible endpoints are added if LLM/RAG is configured
- MCP service is mounted if mcp flag is set
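The launch itself can be sketched as a small helper that sets CONFIG (and optionally TOKEN) and starts uvicorn against the txtai.api:app application. The helper names, host, and port below are hypothetical choices for this example; actually starting the process requires txtai and uvicorn to be installed.

```python
import os
import subprocess


def build_launch(config_path, token_hash=None, host="127.0.0.1", port=8000):
    """Return the uvicorn command and environment for the API process."""
    env = dict(os.environ)
    env["CONFIG"] = config_path      # path to the YAML configuration file
    if token_hash:
        env["TOKEN"] = token_hash    # enables token-based authorization
    cmd = ["uvicorn", "txtai.api:app", "--host", host, "--port", str(port)]
    return cmd, env


def launch(config_path, **kwargs):
    """Start the server as a child process (needs txtai and uvicorn)."""
    cmd, env = build_launch(config_path, **kwargs)
    return subprocess.Popen(cmd, env=env)
```

Separating build_launch from launch makes it possible to inspect the command and environment without starting a server.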
Step 4: Access API Endpoints
Use the generated REST API endpoints to interact with txtai services. Endpoints follow RESTful conventions and support both JSON and MessagePack response formats. The embeddings endpoints support search, index, upsert, delete, and count operations. Pipeline endpoints accept input data and return processed results.
Endpoint categories:
- /search, /batchsearch: semantic search queries
- /add, /index, /upsert, /delete: index modification (when writable)
- /pipeline-name: pipeline execution endpoints, one per configured pipeline (pipeline-name is a placeholder)
- /workflow: workflow execution
- /agent: agent task execution
- /v1/chat/completions: OpenAI-compatible chat endpoint
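A client call to the search endpoint might look like the following sketch, which uses only the Python standard library. The base URL is an assumption (a local server on port 8000), and the query and limit parameter names reflect the usual form of the /search endpoint; check the running service's generated API documentation for the exact parameters.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # assumed host and port


def search_url(query, limit=3, base=BASE_URL):
    """Build the GET URL for a semantic search request."""
    return f"{base}/search?{urlencode({'query': query, 'limit': limit})}"


def search(query, limit=3, token=None, base=BASE_URL):
    """Call /search and decode the JSON response (needs a running server)."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    request = Request(search_url(query, limit, base), headers=headers)
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, search("feel good story", limit=1) would return the decoded result list; the token argument is only needed when authorization is enabled.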
Step 5: Deploy to Production
Package the API for production deployment. Options include Docker containers, cloud services (AWS Lambda, Google Cloud Run, Azure), and Kubernetes. txtai provides Docker configurations for common deployment targets and supports distributed clustering for horizontal scaling.
Deployment options:
- Docker container with uvicorn
- AWS Lambda with Mangum adapter
- Distributed clustering for sharded indexes across multiple nodes
- Hugging Face Spaces for demo deployments
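For the Docker option, the sketch below renders a minimal Dockerfile from Python. The base image name, the /app/config.yml destination, and the port are assumptions for illustration rather than a prescribed layout; adjust them to the image and paths actually used.

```python
def render_dockerfile(base_image="neuml/txtai-cpu", config="config.yml", port=8000):
    """Render a minimal Dockerfile that serves the API with uvicorn."""
    lines = [
        f"FROM {base_image}",                # assumed base image name
        f"COPY {config} /app/config.yml",    # ship the YAML configuration
        "ENV CONFIG=/app/config.yml",        # tell the API where to find it
        f"EXPOSE {port}",
        f'ENTRYPOINT ["uvicorn", "--host", "0.0.0.0", "--port", "{port}", "txtai.api:app"]',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the rendered Dockerfile so it can be redirected to a file.
    print(render_dockerfile(), end="")
```

Binding to 0.0.0.0 inside the container lets the published port reach the server; a secret such as TOKEN is better supplied at run time than baked into the image.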