Principle:Marker Inc Korea AutoRAG Deployment Mode Selection
| Knowledge Sources | |
|---|---|
| Domains | Deployment, API_Design |
| Last Updated | 2026-02-08 06:00 GMT |
Overview
A deployment pattern that provides multiple execution modes for running an optimized RAG pipeline: code, API server, or web interface.
Description
After initialization, an AutoRAG pipeline can be deployed in three modes:
- Code Runner: programmatic access via Runner.run, for integration into Python applications.
- API Server: REST endpoints via ApiRunner, exposing /v1/run, /v1/retrieve, and /v1/stream.
- Web Interface: an interactive, chat-like interface for end users via GradioRunner or Streamlit.
All three modes run the same underlying module chain; only the input/output interface differs.
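The idea of one module chain behind three interfaces can be illustrated with a minimal sketch. This is not AutoRAG's implementation: `answer`, `run_in_code`, `handle_api_request`, `handle_chat_turn`, and the JSON field names `query` and `result` are all hypothetical names chosen for illustration, and the actual request and response schemas of the API server may differ.

```python
import json
from typing import List, Tuple


def answer(query: str) -> str:
    """Hypothetical stand-in for the shared module chain."""
    return f"echo: {query}"


def run_in_code(query: str) -> str:
    """Code-style interface: a plain Python call, string in, string out."""
    return answer(query)


def handle_api_request(body: str) -> str:
    """API-style interface: JSON text in, JSON text out.

    The field names "query" and "result" are assumptions for this sketch.
    """
    payload = json.loads(body)
    return json.dumps({"result": answer(payload["query"])})


def handle_chat_turn(
    message: str, history: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Web-style interface: append a (user, assistant) turn to the chat history."""
    return history + [(message, answer(message))]
```

Each wrapper only translates its own input and output format and delegates to the same `answer` function, so a change to the pipeline is reflected in every mode without touching the wrappers.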
Usage
Choose the deployment mode based on the use case: code mode for batch processing or embedding in applications, API mode for microservice architectures, or web mode for demos and user-facing applications.
Theoretical Basis
All deployment modes share the same execution pattern:
- Create a pseudo QA DataFrame from the user query
- Sequentially run each module instance on the previous result
- Merge module outputs into the growing result DataFrame
- Extract the final output from the specified result column
The modes differ only in their input/output interface, not in how the pipeline executes.
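The four steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: a plain dict of columns stands in for the result DataFrame, and `make_pseudo_qa`, `toy_retrieval`, `toy_generator`, `run_pipeline`, and the column names are hypothetical.

```python
from typing import Any, Callable, Dict, List

# A "frame" is a dict of column name -> list of row values. It stands in for
# the result DataFrame (an assumption made to keep the sketch dependency-free).
Frame = Dict[str, List[Any]]
Module = Callable[[Frame], Frame]


def make_pseudo_qa(query: str) -> Frame:
    """Step 1: wrap the user query in a one-row, QA-shaped frame."""
    return {"qid": ["user-query-0"], "query": [query]}


def toy_retrieval(frame: Frame) -> Frame:
    """Hypothetical module: returns one canned passage per query."""
    return {"retrieved_contents": [["passage about " + q] for q in frame["query"]]}


def toy_generator(frame: Frame) -> Frame:
    """Hypothetical module: builds an answer from the retrieved passages."""
    return {
        "generated_texts": [
            "Answer based on: " + "; ".join(contents)
            for contents in frame["retrieved_contents"]
        ]
    }


def run_pipeline(query: str, modules: List[Module], result_column: str) -> Any:
    result = make_pseudo_qa(query)
    for module in modules:
        output = module(result)          # Step 2: run on the previous result
        result = {**result, **output}    # Step 3: merge new columns into the result
    return result[result_column][0]      # Step 4: extract the final output
```

For example, `run_pipeline("what is RAG", [toy_retrieval, toy_generator], "generated_texts")` runs both toy modules in order and returns the single generated string. Because each module sees every column produced so far, later modules can depend on the outputs of earlier ones.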