Principle:Neuml Txtai REST API Design

Overview

txtai exposes its AI capabilities through a RESTful HTTP API built on FastAPI. The API follows REST conventions for resource-oriented endpoints while also providing an OpenAI-compatible interface that allows txtai to serve as a drop-in replacement for OpenAI API calls. The endpoint design supports content negotiation, batch operations, and streaming responses.

Theoretical Foundation

REST API Design Principles

txtai's API adheres to standard REST conventions:

HTTP methods indicate intent: GET for read operations, POST for write operations and complex queries
URL paths identify resources: /search, /count, /transform map to embeddings operations
Query parameters for simple filters: single-value search uses GET /search?query=text&limit=10
Request bodies for complex data: batch operations use POST with JSON bodies
HTTP status codes communicate outcomes: 200 for success, 401 for unauthorized, 403 for read-only violations, 422 for validation errors

Resource-Oriented Endpoint Design

The txtai API organizes endpoints around the core abstractions:

Category	Endpoints	Purpose
Search	`GET /search`, `POST /batchsearch`	Query the embeddings index
Indexing	`POST /add`, `GET /index`, `GET /upsert`	Add documents and build indexes
Management	`POST /delete`, `POST /reindex`, `GET /count`	Manage index contents
Vectors	`GET /transform`, `POST /batchtransform`	Generate embeddings vectors
Analysis	`POST /explain`, `POST /batchexplain`	Token importance analysis
OpenAI	`POST /v1/chat/completions`, `POST /v1/embeddings`	OpenAI-compatible interface

Batch Operation Pattern

For operations that benefit from batching, txtai provides paired endpoints:

Single: GET /search?query=text -- simple query with URL parameters
Batch: POST /batchsearch -- multiple queries in a single request body

The batch pattern reduces HTTP overhead when processing multiple items and allows the backend to optimize batch processing (e.g., batched model inference).

Content Negotiation

txtai implements content negotiation through the HTTP Accept header. The EncodingAPIRoute class inspects each request's Accept header and selects an appropriate response encoder:

JSON (default): standard JSON serialization
MessagePack: binary serialization for higher throughput
Custom encodings: extensible through ResponseFactory

This is implemented via a custom APIRoute class that overrides FastAPI's default route handler:

class EncodingAPIRoute(APIRoute):
    def get_route_handler(self):
        async def handler(request):
            route = get_request_handler(
                ...,
                response_class=ResponseFactory.create(request),
                ...
            )
            return await route(request)
        return handler

The response class is determined per request, allowing different clients to receive the same data in their preferred format.

OpenAI API Compatibility

Design Philosophy

txtai provides endpoints that mirror the OpenAI API specification, allowing clients built for OpenAI's API to work with txtai without modification. This compatibility layer supports:

POST /v1/chat/completions -- maps to agents, pipelines, workflows, or embeddings search
POST /v1/embeddings -- generates embeddings vectors
POST /v1/audio/speech -- text-to-speech synthesis
POST /v1/audio/transcriptions -- speech-to-text transcription
POST /v1/audio/translations -- audio translation to English

Model Parameter as Router

The OpenAI-compatible /v1/chat/completions endpoint uses the model parameter to determine which txtai component handles the request:

model Value	Routes To	Description
Agent name	`app.agent(model, ...)`	Executes an LLM-driven agent
`"embeddings"`	`app.search(...)`	Runs an embeddings search, returns top result text
Pipeline name	`app.pipeline(model, ...)`	Executes a named pipeline
Workflow name	`app.workflow(model, ...)`	Executes a named workflow
anything else	`app.pipeline("llm", ...)`	Falls back to the default LLM pipeline

This design allows a single endpoint to expose the full range of txtai's capabilities through a familiar interface.

Streaming Responses

When stream: true is set in a chat completion request, txtai returns a Server-Sent Events (SSE) stream. Each chunk follows the OpenAI streaming format:

data: {"id": "uuid", "object": "chat.completion.chunk", "model": "agent-name", "choices": [{"delta": {"content": "chunk text"}}]}

The stream terminates with:

data: [DONE]

This enables real-time token-by-token output for LLM-based responses.

Error Handling Patterns

The API uses standard HTTP status codes with descriptive error messages:

Status	Meaning	Example Trigger
200	Success	Search returns results
401	Unauthorized	Missing or invalid authorization token
403	Forbidden	Write operation on read-only index (`writable != True`)
422	Validation Error	Mismatched array lengths in `/addobject`

Write operations (add, index, delete, reindex) catch ReadOnlyError and translate it to HTTP 403:

try:
    application.get().add(documents)
except ReadOnlyError as e:
    raise HTTPException(status_code=403, detail=e.args[0]) from e

Design Rationale

GET vs POST Selection

GET is used for idempotent operations with simple parameters: /search, /count, /index, /transform
POST is used for operations with complex request bodies or side effects: /add, /batchsearch, /delete

The choice of GET /index (rather than POST) for the index-building operation is notable -- it triggers index construction from previously batched documents. While this has side effects, it is designed as a command endpoint rather than a resource creation endpoint.

Why OpenAI Compatibility

Providing an OpenAI-compatible interface serves several purposes:

Ecosystem integration: tools built for OpenAI (LangChain, LlamaIndex, etc.) can work with txtai
Migration path: teams can switch from OpenAI to local models without changing client code
Standardization: the OpenAI API has become a de facto standard for LLM interaction

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment