Implementation:Marker Inc Korea AutoRAG ApiRunner Run Api Server
| Knowledge Sources | |
|---|---|
| Domains | RAG Pipeline Deployment, REST API Design |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
A concrete tool, provided by the AutoRAG framework, for serving an optimized RAG pipeline as an asynchronous HTTP API with structured endpoints for full queries, retrieval only, streaming, and version info.
Description
The ApiRunner class extends BaseRunner with HTTP serving capabilities. During initialization, it creates a Quart application, loads data/corpus.parquet from the project directory so that passage content can be looked up, and registers four API routes: /v1/run, /v1/retrieve, /v1/stream, and /version.
The /v1/run endpoint validates the request body against QueryRequest, runs the full module chain (the same logic as Runner.run), and then calls extract_retrieve_passage to build RetrievedPassage objects carrying content, document IDs, scores, file paths, page numbers, and start/end indices. It returns a RunResponse whose result is taken from the column named by result_column (generated_texts by default), together with the retrieved passages.
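The passage-building step can be pictured as a join between the retrieval output and the corpus. The following is a minimal stdlib sketch, not AutoRAG's actual extract_retrieve_passage: `build_passages` is a hypothetical helper, the corpus is modelled as a plain dict keyed by doc_id, and the corpus field names `contents`, `path`, `page`, and `start_end_idx` are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RetrievedPassage:
    # Mirrors the fields of the RetrievedPassage model documented on this page.
    content: str
    doc_id: str
    score: float
    filepath: Optional[str] = None
    file_page: Optional[int] = None
    start_idx: Optional[int] = None
    end_idx: Optional[int] = None


def build_passages(
    doc_ids: List[str],
    scores: List[float],
    corpus: Dict[str, dict],
) -> List[RetrievedPassage]:
    """Hypothetical helper: join retrieved IDs and scores with corpus rows.

    Assumes each corpus row has a "contents" key and, optionally,
    "path", "page", and "start_end_idx" keys (assumed names).
    """
    passages = []
    for doc_id, score in zip(doc_ids, scores):
        row = corpus[doc_id]  # raises KeyError if the ID is not in the corpus
        span: Optional[Tuple[int, int]] = row.get("start_end_idx")
        passages.append(
            RetrievedPassage(
                content=row["contents"],
                doc_id=doc_id,
                score=float(score),
                filepath=row.get("path"),
                file_page=row.get("page"),
                start_idx=span[0] if span else None,
                end_idx=span[1] if span else None,
            )
        )
    return passages


# Toy corpus for illustration only.
corpus = {
    "d1": {"contents": "AutoRAG tunes RAG pipelines.", "path": "intro.pdf",
           "page": 1, "start_end_idx": (0, 28)},
    "d2": {"contents": "A passage without file metadata."},
}
passages = build_passages(["d2", "d1"], [0.9, 0.4], corpus)
for p in passages:
    print(p.doc_id, p.score, p.filepath, p.file_page)
```

Missing metadata simply stays None, which matches the Optional fields of the model.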
The /v1/retrieve endpoint runs only retrieval and reranking modules (skipping BasePromptMaker and BaseGenerator instances) and returns a RetrievalResponse with just the passages.
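The retrieval-only path amounts to filtering the module chain by type while keeping each module paired with its parameters. In this sketch the base classes are local stand-ins named after AutoRAG's BasePromptMaker and BaseGenerator, and the concrete module classes and the `top_k` parameter are hypothetical.

```python
from typing import Any, Dict, List, Tuple


# Local stand-ins named after AutoRAG's base classes (not the real ones).
class BasePromptMaker:
    pass


class BaseGenerator:
    pass


# Hypothetical concrete modules for the example chain.
class Retriever:
    pass


class Reranker:
    pass


class PromptMaker(BasePromptMaker):
    pass


class Generator(BaseGenerator):
    pass


def retrieval_only(
    instances: List[Any], params: List[Dict[str, Any]]
) -> List[Tuple[Any, Dict[str, Any]]]:
    """Keep (module, params) pairs in order, dropping prompt makers and generators."""
    return [
        (module, param)
        for module, param in zip(instances, params)
        if not isinstance(module, (BasePromptMaker, BaseGenerator))
    ]


chain = [Retriever(), Reranker(), PromptMaker(), Generator()]
chain_params = [{"top_k": 5}, {"top_k": 3}, {}, {}]
kept = retrieval_only(chain, chain_params)
print([type(m).__name__ for m, _ in kept])  # ['Retriever', 'Reranker']
```

Filtering with isinstance against the base classes means any subclass of a prompt maker or generator is skipped, not just specific module types.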
The /v1/stream endpoint uses Quart's stream_with_context to yield StreamResponse objects incrementally. Retrieved passages are yielded first, one message per passage, and then the generated text is streamed through the generator module's astream method.
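The ordering of the stream (every passage first, then generated text) can be illustrated with a plain async generator. This is a sketch under assumptions: `fake_astream` stands in for a generator module's astream, messages are serialized with json rather than the Pydantic model, and whether the real astream yields deltas or accumulated text is not specified here.

```python
import asyncio
import json
from typing import AsyncIterator, List


async def fake_astream(prompt: str) -> AsyncIterator[str]:
    # Hypothetical stand-in for a generator's astream(); yields text pieces.
    for piece in ["Auto", "RAG", " is", " a tool."]:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield piece


async def stream_messages(passages: List[dict], prompt: str) -> AsyncIterator[str]:
    """Yield StreamResponse-shaped JSON strings: all passages, then text."""
    for i, passage in enumerate(passages):
        yield json.dumps({
            "type": "retrieved_passage",
            "generated_text": None,
            "retrieved_passage": passage,
            "passage_index": i,
        })
    async for piece in fake_astream(prompt):
        yield json.dumps({
            "type": "generated_text",
            "generated_text": piece,
            "retrieved_passage": None,
            "passage_index": None,
        })


async def collect() -> List[dict]:
    passages = [{"doc_id": "d1", "content": "...", "score": 0.9}]
    return [json.loads(m) async for m in stream_messages(passages, "What is AutoRAG?")]


messages = asyncio.run(collect())
print([m["type"] for m in messages])
```

Sending passages before any text lets a client render sources while the answer is still being generated.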
The /version endpoint reads the AutoRAG VERSION file and returns a VersionResponse.
The run_api_server method starts the Quart server as a blocking call. When remote=True (the default), it also creates an ngrok HTTP tunnel for public access.
Usage
Import ApiRunner when you need to serve the pipeline over HTTP for integration with other services, web frontends, or external clients. Use remote=True for quick public access during development, and remote=False behind a reverse proxy in production. Because remote defaults to True, pass remote=False explicitly when deploying.
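One way to apply the development/production split is to derive the run_api_server keyword arguments from the environment. `server_options`, `APP_ENV`, and `API_PORT` are hypothetical names chosen for this sketch; since remote defaults to True, the production branch sets remote=False explicitly.

```python
import os
from typing import Dict, Mapping, Optional


def server_options(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Pick run_api_server() keyword arguments for development vs. production.

    APP_ENV and API_PORT are hypothetical variable names for this sketch.
    """
    env = os.environ if env is None else env
    port = int(env.get("API_PORT", "8000"))
    if env.get("APP_ENV", "development") == "production":
        # Bind to loopback only; a reverse proxy forwards public traffic.
        return {"host": "127.0.0.1", "port": port, "remote": False}
    # Development: listen on all interfaces and open an ngrok tunnel.
    return {"host": "0.0.0.0", "port": port, "remote": True}


print(server_options({"APP_ENV": "production", "API_PORT": "5000"}))
```

With an ApiRunner instance named `runner`, the call would be `runner.run_api_server(**server_options())`.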
Code Reference
Source Location
- Repository: AutoRAG
- File: autorag/deploy/api.py (lines 65-248)
Signature
```python
class ApiRunner(BaseRunner):
    def __init__(self, config: Dict, project_dir: Optional[str] = None):
        ...

    def run_api_server(self, host: str = "0.0.0.0", port: int = 8000,
                       remote: bool = True, **kwargs):
        ...
```
Request/Response Models:
```python
from typing import List, Literal, Optional, Union

from pydantic import BaseModel


class QueryRequest(BaseModel):
    query: str
    result_column: Optional[str] = "generated_texts"


class RetrievedPassage(BaseModel):
    content: str
    doc_id: str
    score: float
    filepath: Optional[str] = None
    file_page: Optional[int] = None
    start_idx: Optional[int] = None
    end_idx: Optional[int] = None


class RunResponse(BaseModel):
    result: Union[str, List[str]]
    retrieved_passage: List[RetrievedPassage]


class RetrievalResponse(BaseModel):
    passages: List[RetrievedPassage]


class StreamResponse(BaseModel):
    type: Literal["generated_text", "retrieved_passage"]
    generated_text: Optional[str]
    retrieved_passage: Optional[RetrievedPassage]
    passage_index: Optional[int]


class VersionResponse(BaseModel):
    version: str
```
Import
```python
from autorag.deploy.api import ApiRunner
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | Dict | yes | Pipeline config dictionary with one module per node |
| project_dir | Optional[str] | no | Path to the project directory containing data/corpus.parquet. Defaults to current working directory. |
| host | str | no | Hostname to bind the server to. Default is "0.0.0.0". |
| port | int | no | Port number for the server. Default is 8000. |
| remote | bool | no | If True, creates an ngrok tunnel for public access. Default is True. |
Outputs
| Name | Type | Description |
|---|---|---|
| (server) | Running Quart HTTP server | A blocking call that serves the API until interrupted |
API Endpoint Outputs:
| Endpoint | Method | Response Model | Description |
|---|---|---|---|
| /v1/run | POST | RunResponse | Generated answer text plus retrieved passages with metadata |
| /v1/retrieve | POST | RetrievalResponse | Retrieved passages only, no generation |
| /v1/stream | POST | StreamResponse (streamed) | Retrieved passages, one message each, followed by incrementally streamed generated text |
| /version | GET | VersionResponse | AutoRAG library version string |
Usage Examples
Basic Usage
```python
from autorag.deploy.api import ApiRunner

# Initialize from a trial folder
runner = ApiRunner.from_trial_folder(trial_path="./my_project/0")

# Start the API server on port 8000 with an ngrok tunnel
runner.run_api_server(host="0.0.0.0", port=8000, remote=True)
```
From YAML Without Remote Tunnel
```python
from autorag.deploy.api import ApiRunner

runner = ApiRunner.from_yaml(
    yaml_path="./my_project/best.yaml",
    project_dir="./my_project",
)
runner.run_api_server(host="127.0.0.1", port=5000, remote=False)
```
Client Example (curl)
```bash
# Full pipeline query
curl -X POST http://localhost:8000/v1/run \
  -H "Content-Type: application/json" \
  -d '{"query": "What is AutoRAG?"}'

# Retrieval only
curl -X POST http://localhost:8000/v1/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "What is AutoRAG?"}'

# Version check
curl http://localhost:8000/version
```
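For Python callers that prefer not to shell out to curl, the non-streaming endpoints can be called with the standard library alone. `AutoRagClient` is a hypothetical name, the default base URL assumes a local server on port 8000, and /v1/stream is left out because the framing of streamed messages is not specified here.

```python
import json
import urllib.request
from typing import Any, Dict


class AutoRagClient:
    """Minimal stdlib client for /v1/run, /v1/retrieve and /version (a sketch)."""

    def __init__(self, base_url: str = "http://localhost:8000", timeout: float = 30.0):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def _post(self, path: str, body: Dict[str, Any]) -> Dict[str, Any]:
        # Send a JSON body; a non-2xx status raises urllib.error.HTTPError.
        req = urllib.request.Request(
            self.base_url + path,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=self.timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))

    def run(self, query: str, result_column: str = "generated_texts") -> Dict[str, Any]:
        # Expected shape (RunResponse): {"result": ..., "retrieved_passage": [...]}
        return self._post("/v1/run", {"query": query, "result_column": result_column})

    def retrieve(self, query: str) -> Dict[str, Any]:
        # Expected shape (RetrievalResponse): {"passages": [...]}
        return self._post("/v1/retrieve", {"query": query})

    def version(self) -> str:
        # Expected shape (VersionResponse): {"version": "..."}
        with urllib.request.urlopen(self.base_url + "/version", timeout=self.timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))["version"]
```

For example, `AutoRagClient().run("What is AutoRAG?")["result"]` would return the generated answer from a locally running server.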