Implementation:Marker Inc Korea AutoRAG ApiRunner Run Api Server

Knowledge Sources	AutoRAG
Domains	RAG Pipeline Deployment, REST API Design
Last Updated	2026-02-12 00:00 GMT

Overview

ApiRunner is a concrete tool in the AutoRAG framework that serves an optimized RAG pipeline as an asynchronous HTTP API, with structured endpoints for full queries, retrieval only, streaming, and version information.

Description

The ApiRunner class extends BaseRunner to add HTTP serving capabilities. During initialization, it creates a Quart application, loads the corpus.parquet file for passage content retrieval, and registers four API routes.

The /v1/run endpoint validates the request body via QueryRequest, runs the full module chain (same logic as Runner.run), then calls extract_retrieve_passage to build RetrievedPassage objects with content, document IDs, scores, file paths, page numbers, and start/end indices. It returns a RunResponse containing both the generated text and the retrieved passages.
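The passage-assembly step can be pictured roughly as follows. This is a minimal sketch and not AutoRAG's actual extract_retrieve_passage: the in-memory CORPUS dict, the build_passages helper, and the row layout (contents, path, page, start_end_idx) are assumptions made for illustration. Only the output field names follow RetrievedPassage.

```python
from typing import Any, Dict, List

# Hypothetical in-memory stand-in for rows loaded from corpus.parquet,
# keyed by doc_id. The real corpus layout may differ.
CORPUS: Dict[str, Dict[str, Any]] = {
    "doc-1": {
        "contents": "AutoRAG optimizes RAG pipelines.",
        "path": "docs/intro.pdf",
        "page": 1,
        "start_end_idx": (0, 32),
    },
    "doc-2": {"contents": "ApiRunner serves a pipeline over HTTP."},
}


def build_passages(doc_ids: List[str], scores: List[float]) -> List[Dict[str, Any]]:
    """Join retrieved ids and scores with corpus rows (illustrative helper)."""
    passages = []
    for doc_id, score in zip(doc_ids, scores):
        row = CORPUS[doc_id]
        start_end = row.get("start_end_idx")
        passages.append({
            "content": row["contents"],
            "doc_id": doc_id,
            "score": float(score),
            # Optional metadata falls back to None, as in RetrievedPassage.
            "filepath": row.get("path"),
            "file_page": row.get("page"),
            "start_idx": start_end[0] if start_end else None,
            "end_idx": start_end[1] if start_end else None,
        })
    return passages


passages = build_passages(["doc-1", "doc-2"], [0.91, 0.47])
```

Passages with no file metadata in the corpus still produce a valid entry, since every metadata field other than content, doc_id, and score is optional.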

The /v1/retrieve endpoint runs only retrieval and reranking modules (skipping BasePromptMaker and BaseGenerator instances) and returns a RetrievalResponse with just the passages.
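The skipping behavior described above might look something like the following. It is a sketch under stated assumptions: the classes here are empty stand-ins defined locally (only the names BasePromptMaker and BaseGenerator come from the description), and retrieval_only is a hypothetical helper, not a function in AutoRAG.

```python
from typing import List


# Stand-in classes for illustration only; AutoRAG's real base classes
# carry actual retrieval, reranking, and generation logic.
class Module:
    def __init__(self, name: str) -> None:
        self.name = name


class Retrieval(Module):
    pass


class Reranker(Module):
    pass


class BasePromptMaker(Module):
    pass


class BaseGenerator(Module):
    pass


def retrieval_only(chain: List[Module]) -> List[Module]:
    """Keep every module except prompt makers and generators."""
    return [
        module
        for module in chain
        if not isinstance(module, (BasePromptMaker, BaseGenerator))
    ]


chain = [
    Retrieval("bm25"),
    Reranker("reranker"),
    BasePromptMaker("fstring"),
    BaseGenerator("llm"),
]
kept = [module.name for module in retrieval_only(chain)]
```

Filtering by type rather than by position keeps the behavior correct even when a pipeline contains extra nodes before or after retrieval.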

The /v1/stream endpoint uses Quart's stream_with_context to yield StreamResponse objects incrementally. Retrieved passages are yielded first, then the generator module's astream method produces text tokens one at a time.
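The ordering of streamed events can be illustrated with a plain async generator. This sketch uses only the standard library and does not involve Quart: fake_astream is a hypothetical stand-in for a generator module's astream method, and events are plain dicts shaped like StreamResponse rather than the real model. The wire encoding of each chunk is not shown here.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List


async def fake_astream(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a generator module's astream method."""
    for token in ["Auto", "RAG", " rocks"]:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield token


async def stream_events(
    passages: List[Dict[str, Any]], prompt: str
) -> AsyncIterator[Dict[str, Any]]:
    """Emit one event per passage first, then one event per text token."""
    for index, passage in enumerate(passages):
        yield {
            "type": "retrieved_passage",
            "generated_text": None,
            "retrieved_passage": passage,
            "passage_index": index,
        }
    async for token in fake_astream(prompt):
        yield {
            "type": "generated_text",
            "generated_text": token,
            "retrieved_passage": None,
            "passage_index": None,
        }


async def collect() -> List[Dict[str, Any]]:
    passages = [{"doc_id": "doc-1", "content": "...", "score": 0.9}]
    return [event async for event in stream_events(passages, "prompt")]


events = asyncio.run(collect())
```

A client can branch on the type field of each event to decide whether to render a source passage or append text to the answer.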

The /version endpoint reads the AutoRAG VERSION file and returns a VersionResponse.

The run_api_server method starts the Quart server, optionally creating an ngrok HTTP tunnel for public access when remote=True.

Usage

Import ApiRunner when you need to serve the pipeline over HTTP for integration with other services, web frontends, or external clients. Use remote=True for quick public access during development. Because remote defaults to True, pass remote=False explicitly when running behind a reverse proxy in production.

Code Reference

Source Location

Repository: AutoRAG
File: autorag/deploy/api.py (lines 65-248)

Signature

class ApiRunner(BaseRunner):
    def __init__(self, config: Dict, project_dir: Optional[str] = None):
        ...

    def run_api_server(self, host: str = "0.0.0.0", port: int = 8000,
                       remote: bool = True, **kwargs):
        ...

Request/Response Models:

from typing import List, Literal, Optional, Union

from pydantic import BaseModel

class QueryRequest(BaseModel):
    query: str
    result_column: Optional[str] = "generated_texts"

class RetrievedPassage(BaseModel):
    content: str
    doc_id: str
    score: float
    filepath: Optional[str] = None
    file_page: Optional[int] = None
    start_idx: Optional[int] = None
    end_idx: Optional[int] = None

class RunResponse(BaseModel):
    result: Union[str, List[str]]
    retrieved_passage: List[RetrievedPassage]

class RetrievalResponse(BaseModel):
    passages: List[RetrievedPassage]

class StreamResponse(BaseModel):
    type: Literal["generated_text", "retrieved_passage"]
    generated_text: Optional[str]
    retrieved_passage: Optional[RetrievedPassage]
    passage_index: Optional[int]

class VersionResponse(BaseModel):
    version: str

Import

from autorag.deploy.api import ApiRunner

I/O Contract

Inputs

Name	Type	Required	Description
config	Dict	yes	Pipeline config dictionary with one module per node
project_dir	Optional[str]	no	Path to the project directory containing data/corpus.parquet. Defaults to current working directory.
host	str	no	Hostname to bind the server to. Default is "0.0.0.0".
port	int	no	Port number for the server. Default is 8000.
remote	bool	no	If True, creates an ngrok tunnel for public access. Default is True.

Outputs

Name	Type	Description
(server)	Running Quart HTTP server	A blocking call that serves the API until interrupted

API Endpoint Outputs:

Endpoint	Method	Response Model	Description
/v1/run	POST	RunResponse	Generated answer text plus retrieved passages with metadata
/v1/retrieve	POST	RetrievalResponse	Retrieved passages only, no generation
/v1/stream	POST	StreamResponse (streamed)	Retrieved passages streamed first, followed by generated text tokens
/version	GET	VersionResponse	AutoRAG library version string

Usage Examples

Basic Usage

from autorag.deploy.api import ApiRunner

# Initialize from a trial folder
runner = ApiRunner.from_trial_folder(trial_path="./my_project/0")

# Start the API server on port 8000 with ngrok tunnel
runner.run_api_server(host="0.0.0.0", port=8000, remote=True)

From YAML Without Remote Tunnel

from autorag.deploy.api import ApiRunner

runner = ApiRunner.from_yaml(
    yaml_path="./my_project/best.yaml",
    project_dir="./my_project"
)
runner.run_api_server(host="127.0.0.1", port=5000, remote=False)

Client Example (curl)

# Full pipeline query
curl -X POST http://localhost:8000/v1/run \
  -H "Content-Type: application/json" \
  -d '{"query": "What is AutoRAG?"}'

# Retrieval only
curl -X POST http://localhost:8000/v1/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "What is AutoRAG?"}'

# Version check
curl http://localhost:8000/version
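
Client Example (Python)

The curl calls above can also be made from Python with the standard library. This is a minimal sketch, assuming the server from the earlier examples is listening on localhost:8000. The helper names build_query_request, parse_run_response, and run_query are hypothetical and not part of AutoRAG.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

# Assumption: the server from the usage examples is running locally.
BASE_URL = "http://localhost:8000"


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a JSON POST request matching the QueryRequest body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_run_response(payload: Dict[str, Any]) -> Tuple[Any, List[str]]:
    """Return the generated result and the doc_ids of the retrieved passages."""
    doc_ids = [passage["doc_id"] for passage in payload["retrieved_passage"]]
    return payload["result"], doc_ids


def run_query(query: str) -> Tuple[Any, List[str]]:
    """Send the query to /v1/run; this call needs a running server."""
    request = build_query_request("/v1/run", query)
    with urllib.request.urlopen(request) as response:
        return parse_run_response(json.load(response))


# Building the request does not touch the network.
request = build_query_request("/v1/run", "What is AutoRAG?")
```

Calling run_query("What is AutoRAG?") against a live server should return the generated answer together with the document IDs of the passages that supported it.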

Related Pages

Implements Principle

Principle:Marker_Inc_Korea_AutoRAG_REST_API_Deployment

Requires Environment
