Principle:Marker Inc Korea AutoRAG Serve And Monitor

Knowledge Sources	AutoRAG Docs
Domains	Deployment, API_Design
Last Updated	2026-02-08 06:00 GMT

Overview

A serving pattern that exposes an optimized RAG pipeline as a production REST API or interactive web interface with streaming support.

Description

Serve and Monitor covers the HTTP-based serving interfaces for deployed pipelines. The ApiRunner provides a Quart-based REST API with four endpoints: /v1/run (full pipeline execution, returning the answer and the retrieved passages), /v1/retrieve (retrieval only), /v1/stream (Server-Sent Events (SSE) streaming for progressive answer generation), and /version. The API supports optional ngrok tunneling for remote access. The GradioRunner provides a chat-style web interface built on Gradio, and a Streamlit interface is also available. All servers use Pydantic models for request and response validation.
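
As a minimal client-side sketch of the endpoint layout described above, the snippet below builds (but does not send) requests for the four endpoints using only the Python standard library. The host, port, the use of GET for a request without a body, and the `query` payload field are assumptions made for illustration; this page does not specify the request schema, so check the request models of the deployed server.

```python
import json
import urllib.request
from typing import Optional

# Endpoint paths as listed in the description above.
ENDPOINTS = {
    "run": "/v1/run",
    "retrieve": "/v1/retrieve",
    "stream": "/v1/stream",
    "version": "/version",
}


def build_request(
    base_url: str, endpoint: str, payload: Optional[dict] = None
) -> urllib.request.Request:
    """Build a request object for one serving endpoint without sending it."""
    url = base_url.rstrip("/") + ENDPOINTS[endpoint]
    if payload is None:
        # No body: build a plain GET (assumed here for /version).
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host/port and payload field; adjust to the actual deployment.
req = build_request("http://localhost:8000", "run", {"query": "What is AutoRAG?"})
```

Passing `req` to `urllib.request.urlopen` would send it to a running server; keeping construction separate from sending makes the request easy to inspect first.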

Usage

Use ApiRunner for production REST API serving, GradioRunner for quick demos, or Streamlit for internal dashboards. ApiRunner also supports streaming, so clients can display an answer while it is still being generated.

Theoretical Basis

The serving architecture follows a single-process pipeline-as-a-service pattern:

1. Initialize the runner (loads all module instances into memory).
2. Start an HTTP server (Quart for the API, Gradio or Streamlit for the web interface).
3. For each request: create a pseudo QA DataFrame → run the module chain → format the response.
4. In streaming mode: yield the retrieved passages first, then progressively yield the generated text tokens.
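
The per-request flow and the streaming order described above can be sketched in plain Python. This is an illustration under stated assumptions, not AutoRAG's implementation: a one-row list of dicts stands in for the pseudo QA DataFrame, and `retrieve`, `generate_tokens`, the event `type` names, and the payload shape are all hypothetical. Only the `data: ...` framing followed by a blank line comes from the Server-Sent Events format.

```python
import json
from typing import Callable, Iterator, List


def sse_event(payload: dict) -> str:
    """Frame one JSON payload as a Server-Sent Events message."""
    return f"data: {json.dumps(payload)}\n\n"


def stream_answer(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate_tokens: Callable[[str, List[str]], Iterator[str]],
) -> Iterator[str]:
    """Yield retrieved passages first, then generated text token by token."""
    # Wrap the query in a one-row table (stand-in for the pseudo QA DataFrame).
    rows = [{"qid": "pseudo", "query": query}]
    # Run the retrieval part of the module chain and emit passages up front.
    passages = retrieve(rows[0]["query"])
    yield sse_event({"type": "retrieved_passage", "passages": passages})
    # Emit generated tokens as they become available.
    for token in generate_tokens(query, passages):
        yield sse_event({"type": "generated_text", "token": token})


# Toy stand-ins for real retrieval and generation modules (hypothetical).
def toy_retrieve(query: str) -> List[str]:
    return ["AutoRAG optimizes RAG pipelines."]


def toy_generate(query: str, passages: List[str]) -> Iterator[str]:
    yield from ["AutoRAG", " is", " a", " tool."]


events = list(stream_answer("What is AutoRAG?", toy_retrieve, toy_generate))
```

Because `stream_answer` is a generator, a server can forward each event as soon as it is produced, which is what lets a client show sources before the answer text starts arriving.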

Related Pages

Implemented By

Implementation:Marker_Inc_Korea_AutoRAG_Api_Runner_Run_Api_Server
