Principle:Marker Inc Korea AutoRAG Serve And Monitor
| Knowledge Sources | |
|---|---|
| Domains | Deployment, API_Design |
| Last Updated | 2026-02-08 06:00 GMT |
Overview
A serving pattern that exposes an optimized RAG pipeline as a production REST API or interactive web interface with streaming support.
Description
Serve and Monitor covers the HTTP serving interfaces for deployed pipelines. ApiRunner provides a Quart-based REST API with four endpoints: /v1/run (runs the full pipeline and returns the answer together with the retrieved passages), /v1/retrieve (retrieval only), /v1/stream (streams the answer progressively over Server-Sent Events), and /version. The API can optionally be exposed through an ngrok tunnel for remote access. GradioRunner provides a chat-style web interface built on Gradio, and a Streamlit interface is also available. All servers use Pydantic models for request/response validation.
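As an illustration of how a client might talk to these endpoints, the following is a minimal sketch using only the Python standard library. The base URL, the `query` payload field, and the helper names `build_request` and `parse_sse` are assumptions made for this example; the actual request schema is defined by the server's Pydantic models and should be checked against the deployed API.

```python
import json
from typing import Iterable, Iterator
from urllib.request import Request

# Assumed address of a locally running server; adjust to the real deployment.
BASE_URL = "http://localhost:8000"

# Endpoints named in the description above that accept a query.
QUERY_ENDPOINTS = {"/v1/run", "/v1/retrieve", "/v1/stream"}


def build_request(endpoint: str, query: str, base_url: str = BASE_URL) -> Request:
    """Build a JSON POST request for one of the query endpoints.

    The {"query": ...} body is an assumed payload shape, not a confirmed schema.
    """
    if endpoint not in QUERY_ENDPOINTS:
        raise ValueError(f"unsupported endpoint: {endpoint}")
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line in an SSE text stream.

    Blank separator lines and `:` comment lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())
```

A request built this way can be sent with `urllib.request.urlopen(build_request("/v1/run", "What is RAG?"))`. For /v1/stream, the response body can be decoded line by line and passed to `parse_sse`.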
Usage
Use ApiRunner for production REST API serving, GradioRunner for quick demos, or Streamlit for internal dashboards. ApiRunner also supports streaming through /v1/stream, so a client can display the answer while it is still being generated.
Theoretical Basis
The serving architecture follows a single-process pipeline-as-a-service pattern:
- Initialize the runner (loads all module instances into memory)
- Start an HTTP server (Quart for API, Gradio/Streamlit for web)
- For each request: wrap the query in a pseudo QA DataFrame → run the module chain → format the response
- Streaming mode: yield retrieved passages first, then progressively yield generated text tokens
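The steps above can be sketched with the standard library alone. This is a toy model of the pattern, not the ApiRunner implementation: `toy_retrieve`, `toy_generate`, and `toy_tokens` are hypothetical stand-ins for loaded pipeline modules, a plain dict replaces the pseudo QA DataFrame, and the event field names (`type`, `passages`, `text`) and response keys are assumptions chosen for illustration.

```python
import asyncio
import json
from typing import Any, AsyncIterator, Callable, Dict, List

Row = Dict[str, Any]


def toy_retrieve(row: Row) -> Row:
    # Hypothetical retrieval module: attaches passages to the row.
    return {**row, "retrieved_contents": [f"passage about {row['query']}"]}


def toy_generate(row: Row) -> Row:
    # Hypothetical generator module: produces an answer from the passages.
    count = len(row["retrieved_contents"])
    return {**row, "generated_texts": f"answer based on {count} passage(s)"}


def run_once(query: str, modules: List[Callable[[Row], Row]]) -> Row:
    """Non-streaming path: one-row input, then module chain, then response."""
    # A dict stands in for the pseudo QA DataFrame used by the real runner.
    row: Row = {"qid": "pseudo-0", "query": query}
    for module in modules:
        row = module(row)
    return {
        "result": row["generated_texts"],
        "retrieved_passage": row["retrieved_contents"],
    }


def sse(payload: dict) -> str:
    """Frame one payload as a Server-Sent Events message."""
    return f"data: {json.dumps(payload)}\n\n"


async def toy_tokens(row: Row) -> AsyncIterator[str]:
    # Hypothetical token stream; a real generator would yield model output.
    for token in ["answer", " based", " on", " passages"]:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield token


async def stream_once(query: str) -> AsyncIterator[str]:
    """Streaming path: passages first, then one event per generated token."""
    row = toy_retrieve({"qid": "pseudo-0", "query": query})
    yield sse({"type": "retrieved_passage", "passages": row["retrieved_contents"]})
    async for token in toy_tokens(row):
        yield sse({"type": "generated_text", "text": token})


def collect_events(query: str) -> List[str]:
    """Drain the stream synchronously, e.g. for a quick local check."""
    async def _drain() -> List[str]:
        return [event async for event in stream_once(query)]
    return asyncio.run(_drain())
```

In a real server the module list would be built once at startup and shared across requests, and the async generator would be handed to the web framework as the body of a `text/event-stream` response. Emitting the passages as the first event lets a client render sources before the first token arrives.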