Implementation:Marker Inc Korea AutoRAG ApiRunner Run Api Server

From Leeroopedia
Domains: RAG Pipeline Deployment, REST API Design
Last Updated: 2026-02-12 00:00 GMT

Overview

Concrete tool for serving an optimized RAG pipeline as an asynchronous HTTP API with structured endpoints for queries, retrieval, streaming, and version info, provided by the AutoRAG framework.

Description

The ApiRunner class extends BaseRunner with HTTP serving capabilities. During initialization it creates a Quart application, loads data/corpus.parquet from the project directory so that retrieved document IDs can be resolved to passage contents and metadata, and registers four API routes.
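Quart registers routes through decorators on the application object. The sketch below imitates that pattern with a tiny standard-library stand-in so it runs without Quart installed. MiniApp, its dispatch method, the handler names, and the handler bodies are all hypothetical; the point is only to show how four (path, method) pairs end up in one routing table.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

# Hypothetical handler type: an async function returning a JSON-like dict.
Handler = Callable[[], Awaitable[dict]]


class MiniApp:
    """Hypothetical stand-in for a Quart app: maps (path, method) pairs to handlers."""

    def __init__(self) -> None:
        self.routes: Dict[Tuple[str, str], Handler] = {}

    def route(self, path: str, methods: List[str]) -> Callable[[Handler], Handler]:
        # Decorator that records the handler under every (path, method) pair.
        def decorator(func: Handler) -> Handler:
            for method in methods:
                self.routes[(path, method)] = func
            return func

        return decorator

    async def dispatch(self, path: str, method: str) -> dict:
        # Look up the handler; unknown routes get a 404-style payload.
        handler = self.routes.get((path, method))
        if handler is None:
            return {"error": "not found", "status": 404}
        return await handler()


app = MiniApp()


@app.route("/v1/run", methods=["POST"])
async def run_handler() -> dict:
    return {"result": "", "retrieved_passage": []}


@app.route("/v1/retrieve", methods=["POST"])
async def retrieve_handler() -> dict:
    return {"passages": []}


@app.route("/v1/stream", methods=["POST"])
async def stream_handler() -> dict:
    return {"type": "generated_text", "generated_text": ""}


@app.route("/version", methods=["GET"])
async def version_handler() -> dict:
    # Placeholder value; the real route reads AutoRAG's VERSION file.
    return {"version": "0.0.0"}


version_response = asyncio.run(app.dispatch("/version", "GET"))
```

A real Quart handler reads the incoming body from its request context rather than taking arguments, which is why the stand-in handlers take none.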

The /v1/run endpoint validates the request body against QueryRequest and runs the full module chain (the same logic as Runner.run). It then calls extract_retrieve_passage to build RetrievedPassage objects carrying each passage's content, document ID, score, file path, page number, and start/end indices. The response is a RunResponse whose result field holds the value of the requested result_column (generated_texts by default) and whose retrieved_passage field lists the retrieved passages.
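To make the passage-building step concrete, here is a minimal sketch that joins retrieval output (document IDs and scores) with corpus rows looked up by doc_id. The build_passages helper, the dict-based corpus, and its "contents", "path", "page", and "start_end_idx" keys are assumptions for illustration; the real extract_retrieve_passage works on the corpus.parquet data loaded at startup and may differ in detail.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class RetrievedPassage:
    # Mirrors the RetrievedPassage fields listed on this page.
    content: str
    doc_id: str
    score: float
    filepath: Optional[str] = None
    file_page: Optional[int] = None
    start_idx: Optional[int] = None
    end_idx: Optional[int] = None


def build_passages(
    retrieved_ids: List[str],
    scores: List[float],
    corpus: Dict[str, dict],
) -> List[RetrievedPassage]:
    """Hypothetical helper: join retrieval output with corpus rows by doc_id.

    Assumes each corpus row may carry "contents", "path", "page" and
    "start_end_idx" keys; missing optional keys become None.
    """
    passages = []
    for doc_id, score in zip(retrieved_ids, scores):
        row = corpus[doc_id]
        span = row.get("start_end_idx")  # assumed (start, end) pair or None
        passages.append(
            RetrievedPassage(
                content=row["contents"],
                doc_id=doc_id,
                score=float(score),
                filepath=row.get("path"),
                file_page=row.get("page"),
                start_idx=span[0] if span else None,
                end_idx=span[1] if span else None,
            )
        )
    return passages


# Toy corpus: d1 has full metadata, d2 has only its text.
corpus = {
    "d1": {"contents": "AutoRAG automates RAG tuning.", "path": "a.pdf", "page": 3, "start_end_idx": (0, 29)},
    "d2": {"contents": "Quart is an async web framework."},
}
response = {
    "result": "An example generated answer.",
    "retrieved_passage": [asdict(p) for p in build_passages(["d1", "d2"], [0.91, 0.42], corpus)],
}
```

The resulting dict has the same two top-level keys as RunResponse, with optional metadata left as None when the corpus row does not provide it.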

The /v1/retrieve endpoint runs the module chain but skips every BasePromptMaker and BaseGenerator instance, so only the stages before generation (such as retrieval and reranking) execute. It returns a RetrievalResponse containing just the passages.
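The retrieve-only path amounts to walking the module chain and skipping generation-stage modules by type. In this sketch, BasePromptMaker and BaseGenerator are empty placeholder classes standing in for AutoRAG's base classes, the Fake* modules are hypothetical, and a plain dict stands in for the intermediate results the real runner passes between modules.

```python
from typing import List


class BaseModule:
    """Hypothetical stand-in for a pipeline module; pure() transforms a state dict."""

    def pure(self, state: dict) -> dict:
        raise NotImplementedError


class BasePromptMaker(BaseModule):  # placeholder for AutoRAG's prompt-maker base class
    pass


class BaseGenerator(BaseModule):  # placeholder for AutoRAG's generator base class
    pass


class FakeRetrieval(BaseModule):
    def pure(self, state: dict) -> dict:
        return {**state, "retrieved_ids": ["d1", "d2", "d3"], "retrieve_scores": [0.9, 0.5, 0.1]}


class FakeReranker(BaseModule):
    def pure(self, state: dict) -> dict:
        # Keep the top-2 passages (already sorted by score in this toy example).
        return {
            **state,
            "retrieved_ids": state["retrieved_ids"][:2],
            "retrieve_scores": state["retrieve_scores"][:2],
        }


class FakePromptMaker(BasePromptMaker):
    def pure(self, state: dict) -> dict:
        return {**state, "prompts": "..."}


class FakeGenerator(BaseGenerator):
    def pure(self, state: dict) -> dict:
        return {**state, "generated_texts": "..."}


def run_retrieval_only(modules: List[BaseModule], query: str) -> dict:
    state = {"query": query}
    for module in modules:
        # Skip generation-stage modules, as the retrieve-only route does.
        if isinstance(module, (BasePromptMaker, BaseGenerator)):
            continue
        state = module.pure(state)
    return state


chain = [FakeRetrieval(), FakeReranker(), FakePromptMaker(), FakeGenerator()]
result = run_retrieval_only(chain, "What is AutoRAG?")
```

Filtering by type rather than by position means the same chain definition serves both /v1/run and /v1/retrieve.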

The /v1/stream endpoint uses Quart's stream_with_context to yield StreamResponse objects incrementally. Each retrieved passage is sent first as a retrieved_passage event with its passage_index; the generator module's astream method then produces the answer as a sequence of generated_text chunks.
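The event ordering can be sketched with plain asyncio: passages first, then text chunks from an async iterator. Here fake_astream is a hypothetical stand-in for a generator's astream method, events are plain dicts shaped like StreamResponse, and each event is encoded as one JSON line. The newline framing is an assumption of this sketch, not something this page specifies for the real endpoint.

```python
import asyncio
import json
from typing import AsyncIterator, List


async def fake_astream(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a generator's astream(): yields text chunks."""
    for chunk in ["Auto", "RAG ", "tunes ", "RAG."]:
        await asyncio.sleep(0)  # hand control back to the event loop, like a real stream
        yield chunk


async def stream_events(passages: List[dict], prompt: str) -> AsyncIterator[bytes]:
    # 1) Emit every retrieved passage first, tagged with its index.
    for i, passage in enumerate(passages):
        event = {"type": "retrieved_passage", "generated_text": None,
                 "retrieved_passage": passage, "passage_index": i}
        yield (json.dumps(event) + "\n").encode("utf-8")
    # 2) Then forward generated text chunks as they arrive.
    async for delta in fake_astream(prompt):
        event = {"type": "generated_text", "generated_text": delta,
                 "retrieved_passage": None, "passage_index": None}
        yield (json.dumps(event) + "\n").encode("utf-8")


async def collect(passages: List[dict]) -> List[dict]:
    # Drain the stream and decode each JSON line back into a dict.
    return [json.loads(line) async for line in stream_events(passages, "prompt")]


events = asyncio.run(collect([{"doc_id": "d1", "content": "..."}]))
```

Sending passages before any text lets a client render sources immediately while the answer is still being generated.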

The /version endpoint reads the AutoRAG VERSION file and returns a VersionResponse.

The run_api_server method starts the Quart server, optionally creating an ngrok HTTP tunnel for public access when remote=True.

Usage

Use ApiRunner when you need to serve the pipeline over HTTP for integration with other services, web frontends, or external clients. Use remote=True for quick public access during development, and remote=False behind a reverse proxy in production. Because remote defaults to True, pass remote=False explicitly whenever a public ngrok tunnel is not wanted.

Code Reference

Source Location

  • Repository: AutoRAG
  • File: autorag/deploy/api.py (lines 65-248)

Signature

class ApiRunner(BaseRunner):
    def __init__(self, config: Dict, project_dir: Optional[str] = None):
        ...

    def run_api_server(self, host: str = "0.0.0.0", port: int = 8000,
                       remote: bool = True, **kwargs):
        ...

Request/Response Models:

from typing import List, Literal, Optional, Union

from pydantic import BaseModel

class QueryRequest(BaseModel):
    query: str
    result_column: Optional[str] = "generated_texts"

class RetrievedPassage(BaseModel):
    content: str
    doc_id: str
    score: float
    filepath: Optional[str] = None
    file_page: Optional[int] = None
    start_idx: Optional[int] = None
    end_idx: Optional[int] = None

class RunResponse(BaseModel):
    result: Union[str, List[str]]
    retrieved_passage: List[RetrievedPassage]

class RetrievalResponse(BaseModel):
    passages: List[RetrievedPassage]

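class StreamResponse(BaseModel):
    # Each event fills only the fields for its type: generated_text for
    # "generated_text" events; retrieved_passage and passage_index for
    # "retrieved_passage" events. The other fields are None.
    type: Literal["generated_text", "retrieved_passage"]
    generated_text: Optional[str]
    retrieved_passage: Optional[RetrievedPassage]
    passage_index: Optional[int]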

class VersionResponse(BaseModel):
    version: str
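QueryRequest's contract, a required string query plus an optional result_column that defaults to "generated_texts", can be approximated with the standard library. The parse_query_request helper below is hypothetical and is not how the route is written (the route validates with the Pydantic model directly); returning an error dict so the caller can answer with HTTP 400 is an assumption of this sketch.

```python
import json
from typing import Optional, Tuple


def parse_query_request(body: bytes) -> Tuple[Optional[dict], Optional[dict]]:
    """Hypothetical stdlib approximation of QueryRequest validation.

    Returns (request, None) on success or (None, error) on failure, so a
    route could map the error case to an HTTP 400 response.
    """
    try:
        data = json.loads(body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return None, {"error": "Invalid request format"}
    if not isinstance(data, dict) or not isinstance(data.get("query"), str):
        return None, {"error": "'query' must be a string"}
    # Same default as QueryRequest; an explicit null is allowed (Optional[str]).
    result_column = data.get("result_column", "generated_texts")
    if result_column is not None and not isinstance(result_column, str):
        return None, {"error": "'result_column' must be a string"}
    return {"query": data["query"], "result_column": result_column}, None
```

For example, parse_query_request(b'{"query": "What is AutoRAG?"}') yields the query with result_column filled in as "generated_texts".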

Import

from autorag.deploy.api import ApiRunner

I/O Contract

Inputs

Name | Type | Required | Description
config | Dict | yes | Pipeline config dictionary with one module per node.
project_dir | Optional[str] | no | Path to the project directory containing data/corpus.parquet. Defaults to the current working directory.
host | str | no | Hostname to bind the server to. Default is "0.0.0.0".
port | int | no | Port number for the server. Default is 8000.
remote | bool | no | If True, creates an ngrok tunnel for public access. Default is True.

config and project_dir are constructor arguments; host, port, and remote are arguments to run_api_server.
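For orientation, the sketch below shows one plausible shape for the config dictionary and a check for the "one module per node" requirement. The node_lines/nodes/modules nesting and the module names and parameters are assumptions for illustration, not a verified AutoRAG pipeline; in practice the dictionary usually comes from a trial folder or a YAML file, as in the usage examples on this page.

```python
from typing import Dict, List

# Hypothetical config shape (node_lines -> nodes -> modules); module names and
# parameters are illustrative assumptions, not a tested pipeline.
config: Dict = {
    "node_lines": [
        {
            "node_line_name": "retrieve_node_line",
            "nodes": [
                {"node_type": "retrieval", "modules": [{"module_type": "bm25", "top_k": 3}]},
            ],
        },
        {
            "node_line_name": "post_retrieve_node_line",
            "nodes": [
                {"node_type": "prompt_maker", "modules": [{"module_type": "fstring", "prompt": "..."}]},
                {"node_type": "generator", "modules": [{"module_type": "llama_index_llm", "llm": "openai"}]},
            ],
        },
    ]
}


def modules_per_node_violations(cfg: Dict) -> List[str]:
    """Return the node types that do not have exactly one module configured."""
    bad = []
    for line in cfg.get("node_lines", []):
        for node in line.get("nodes", []):
            if len(node.get("modules", [])) != 1:
                bad.append(node.get("node_type", "<unknown>"))
    return bad
```

An empty result means every node resolves to a single module, which is the form the runner expects after optimization has picked a winner per node.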

Outputs

Name | Type | Description
(server) | Running Quart HTTP server | A blocking call that serves the API until interrupted

API Endpoint Outputs:

Endpoint | Method | Response Model | Description
/v1/run | POST | RunResponse | Generated answer plus retrieved passages with metadata
/v1/retrieve | POST | RetrievalResponse | Retrieved passages only, no generation
/v1/stream | POST | StreamResponse (streamed) | Retrieved passages first, then the generated text in incremental chunks
/version | GET | VersionResponse | AutoRAG library version string

Usage Examples

Basic Usage

from autorag.deploy.api import ApiRunner

# Initialize from a trial folder
runner = ApiRunner.from_trial_folder(trial_path="./my_project/0")

# Start the API server on port 8000 with ngrok tunnel
runner.run_api_server(host="0.0.0.0", port=8000, remote=True)

From YAML Without Remote Tunnel

from autorag.deploy.api import ApiRunner

runner = ApiRunner.from_yaml(
    yaml_path="./my_project/best.yaml",
    project_dir="./my_project"
)
runner.run_api_server(host="127.0.0.1", port=5000, remote=False)

Client Example (curl)

# Full pipeline query
curl -X POST http://localhost:8000/v1/run \
  -H "Content-Type: application/json" \
  -d '{"query": "What is AutoRAG?"}'

# Retrieval only
curl -X POST http://localhost:8000/v1/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "What is AutoRAG?"}'

# Version check
curl http://localhost:8000/version
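The same calls can be made from Python with only the standard library. This is a minimal client sketch: BASE_URL, build_request, and call are hypothetical names, the address assumes the server from the basic example on port 8000, and the response keys in the commented usage lines follow the RunResponse, RetrievalResponse, and VersionResponse models on this page.

```python
import json
import urllib.request
from typing import Optional

# Assumed address; match the host/port passed to run_api_server.
BASE_URL = "http://localhost:8000"


def build_request(endpoint: str, query: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request (no query) or a JSON POST request (with a query)."""
    url = BASE_URL + endpoint
    if query is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )


def call(endpoint: str, query: Optional[str] = None, timeout: float = 60.0) -> dict:
    # Send the request and decode the JSON body (requires a running server).
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return json.loads(resp.read())


# Example usage against a running server:
# answer = call("/v1/run", "What is AutoRAG?")["result"]
# passages = call("/v1/retrieve", "What is AutoRAG?")["passages"]
# version = call("/version")["version"]
```

Keeping request construction separate from sending makes the payloads easy to inspect without a live server.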
