Principle:Marker Inc Korea AutoRAG Serve And Monitor

From Leeroopedia


Knowledge Sources
Domains: Deployment, API_Design
Last Updated: 2026-02-08 06:00 GMT

Overview

A serving pattern that exposes an optimized RAG pipeline as a production REST API or interactive web interface with streaming support.

Description

Serve and Monitor covers the HTTP-based serving interfaces for deployed pipelines. The ApiRunner provides a Quart-based REST API with four endpoints: /v1/run (full pipeline execution, returning the answer and the retrieved passages), /v1/retrieve (retrieval only), /v1/stream (Server-Sent Events (SSE) streaming for progressive answer generation), and /version. The API optionally supports ngrok tunneling for remote access. The GradioRunner provides a chat-style web interface built on Gradio, and a Streamlit interface is also available. All servers use Pydantic models to validate requests and responses.
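
As a rough illustration of calling the endpoints described above, the sketch below builds a POST request for /v1/run using only the Python standard library. The base URL, the `query` field name, and the helper `build_run_request` are assumptions made for illustration, not a confirmed schema; check the deployed server for the actual request fields.

```python
import json
import urllib.request

# Assumed address of a locally running server; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def build_run_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /v1/run endpoint."""
    # The "query" field name is an assumption, not a confirmed request schema.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("What does the pipeline retrieve?")
```

Passing `request` to `urllib.request.urlopen` would send it to a running server. Building the request separately from sending it keeps the payload easy to inspect and test without a live deployment.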

Usage

Use ApiRunner for production REST API serving, GradioRunner for quick demos, or Streamlit for internal dashboards. The API mode supports streaming through /v1/stream, so a client can display the answer while it is still being generated.
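
On the client side, consuming a streamed answer comes down to reading SSE events as they arrive. The following is a minimal sketch of an SSE `data:` parser in plain Python. The helper name `iter_sse_data` and the JSON payload shape in the sample (`type`, `text`) are hypothetical and may not match what /v1/stream emits.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            # In the SSE format, one optional space after the colon is dropped.
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        # Other fields (event:, id:, retry:) and comment lines are ignored here.
    if buffer:
        # Lenient choice: flush a trailing event that lacks a closing blank line.
        yield "\n".join(buffer)


# Hypothetical stream content, used only to exercise the parser.
sample = [
    'data: {"type": "token", "text": "Hel"}\n',
    "\n",
    'data: {"type": "token", "text": "lo"}\n',
    "\n",
]
answer = "".join(json.loads(payload)["text"] for payload in iter_sse_data(sample))
```

In a real client, `lines` would be the decoded lines of the HTTP response body, and each payload would be rendered as soon as it is yielded rather than joined at the end.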

Theoretical Basis

The serving architecture follows a single-process pipeline-as-a-service pattern:

  1. Initialize the runner (loads all module instances into memory)
  2. Start an HTTP server (Quart for API, Gradio/Streamlit for web)
  3. For each request: create pseudo QA DataFrame → run module chain → format response
  4. Streaming mode: yield retrieved passages first, then progressively yield generated text tokens
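
The request lifecycle in the steps above can be sketched in plain Python. This is a toy model under stated assumptions, not the AutoRAG implementation: `ToyRunner`, `toy_retrieval`, `toy_generation`, and all field names are hypothetical, a single dict stands in for the pseudo QA DataFrame, and the HTTP server from step 2 is omitted.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]
Module = Callable[[Row], Row]


def toy_retrieval(row: Row) -> Row:
    """Stand-in retrieval module: attaches fixed passages to the row."""
    return {**row, "retrieved_contents": ["passage A", "passage B"]}


def toy_generation(row: Row) -> Row:
    """Stand-in generator module: builds an answer from the query and passages."""
    count = len(row["retrieved_contents"])
    return {**row, "generated_texts": f"Answer to '{row['query']}' using {count} passages"}


class ToyRunner:
    def __init__(self, modules: List[Module]) -> None:
        # Step 1: module instances are created once and kept in memory.
        self.modules = modules

    @staticmethod
    def _pseudo_row(query: str) -> Row:
        # Step 3: wrap the incoming query in a pseudo QA record.
        return {"qid": "pseudo-0", "query": query}

    def run(self, query: str) -> Dict[str, Any]:
        # Step 3: pass the record through every module, then format the response.
        row = self._pseudo_row(query)
        for module in self.modules:
            row = module(row)
        return {
            "result": row["generated_texts"],
            "retrieved_passage": row["retrieved_contents"],
        }

    def stream(self, query: str) -> Iterator[Dict[str, Any]]:
        # Step 4: run everything except the final generator, emit the passages,
        # then emit the answer piece by piece.
        row = self._pseudo_row(query)
        for module in self.modules[:-1]:
            row = module(row)
        yield {"type": "retrieved_passage", "content": row["retrieved_contents"]}
        # Toy simplification: the full answer is produced first and then split;
        # a real generator would yield tokens as the model produces them.
        final = self.modules[-1](row)
        for token in final["generated_texts"].split(" "):
            yield {"type": "token", "content": token}


runner = ToyRunner([toy_retrieval, toy_generation])
response = runner.run("What is served?")
events = list(runner.stream("What is served?"))
```

Because the runner holds its modules for the lifetime of the process, each request pays only for the module chain itself, which is the main appeal of the single-process pattern.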

Related Pages

Implemented By
