Workflow:Marker Inc Korea AutoRAG Pipeline Deployment

Knowledge Sources	AutoRAG, AutoRAG Docs
Domains	RAG, Deployment, API
Last Updated	2026-02-12 12:00 GMT

Overview

End-to-end process for extracting, deploying, and serving an optimized RAG pipeline found by AutoRAG as executable code, a REST API server, or a web chat interface.

Description

This workflow covers everything after RAG pipeline optimization: extracting the best pipeline configuration from a completed trial, deploying it for production use, and serving it through various interfaces. The extraction process reads the trial summary to identify the winning module per node and produces a standalone YAML config. The deployment uses a Runner that instantiates each module in sequence, creating an executable pipeline. Three deployment modes are supported: direct Python code execution via Runner, an async REST API server via ApiRunner (Quart-based with OpenAPI documentation), and an interactive web chat interface via GradioRunner or Streamlit.

Usage

Execute this workflow after completing the RAG Pipeline Optimization workflow and selecting a satisfactory trial. Use the code runner for programmatic integration, the API server for microservice deployment, or the web interface for interactive demos and testing. This workflow can also re-evaluate the extracted pipeline against a held-out test dataset to verify generalization.

Execution Steps

Step 1: Extract Best Pipeline Configuration

Extract the optimal pipeline from a completed trial folder. The extract_best_config function reads the trial's summary.csv (which records the best module and parameters per node) and the original config.yaml (which provides node strategies and ordering). It reconstructs a minimal YAML configuration with exactly one module per node, representing the best end-to-end pipeline. Vector database configuration is also extracted from the project resources.

Key considerations:

The trial must be fully evaluated before extraction
The output YAML has the same structure as the input config but with only the winning modules
Vector database paths and embedding model references are preserved
The extracted config can be saved to a file or used directly as a dictionary
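The reconstruction logic can be sketched with the standard library alone. This is a simplified model, not AutoRAG's extract_best_config. It assumes that summary.csv has node_line_name, node_type, best_module_name, and best_module_params columns, with the last stored as a Python dict literal. It also assumes config.yaml has already been loaded into a dict with a node_lines list. The helper name build_best_config is hypothetical.

```python
import ast
import csv
import io
from typing import Any


def build_best_config(summary_csv: str, config: dict[str, Any]) -> dict[str, Any]:
    """Keep each node's settings and ordering from config, but only its winning module."""
    # Index the winning module per (node line, node type) from the summary rows.
    best: dict[tuple[str, str], dict[str, Any]] = {}
    for row in csv.DictReader(io.StringIO(summary_csv)):
        params = ast.literal_eval(row["best_module_params"])  # assumed dict literal
        best[(row["node_line_name"], row["node_type"])] = {
            "module_type": row["best_module_name"],
            **params,
        }

    node_lines = []
    # Iterate over the original config so node-line and node order are preserved.
    for line in config["node_lines"]:
        nodes = []
        for node in line["nodes"]:
            # Copy node-level settings (e.g. strategy) and replace the candidate
            # module list with the single winner.
            new_node = {k: v for k, v in node.items() if k != "modules"}
            new_node["modules"] = [best[(line["node_line_name"], node["node_type"])]]
            nodes.append(new_node)
        node_lines.append({"node_line_name": line["node_line_name"], "nodes": nodes})
    return {"node_lines": node_lines}
```

The returned dict can be used as-is or written to a file with a YAML library such as PyYAML's yaml.safe_dump.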

Step 2: Initialize Pipeline Runner

Instantiate a Runner (for code use), ApiRunner (for REST API), or GradioRunner (for web UI) from either the extracted YAML file or directly from the trial folder. The BaseRunner parses the config, instantiates each module class using the central support registry, and initializes module-specific resources (model weights, vector database connections, etc.). Each module is created with its winning parameters.

What happens:

Module classes are resolved via the support.py registry mapping string names to Python classes
Each module is instantiated with project_dir context for accessing vector databases and BM25 indices
The modules are stored in execution order matching the node line sequence
from_trial_folder combines extraction and initialization in one step
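Registry-driven construction can be modeled in a few lines. This is a hypothetical sketch, not AutoRAG's BaseRunner or support.py. The classes BM25Retrieval and EchoGenerator, the MODULE_REGISTRY dict, and the constructor signature (project_dir, **params) are illustrative assumptions.

```python
from typing import Any, Callable


class BM25Retrieval:
    def __init__(self, project_dir: str, **params: Any) -> None:
        # A real module would open BM25 indices under project_dir here.
        self.project_dir = project_dir
        self.params = params


class EchoGenerator:
    def __init__(self, project_dir: str, **params: Any) -> None:
        # A real module would load model weights or an API client here.
        self.project_dir = project_dir
        self.params = params


# Hypothetical registry: string names from the config resolve to Python classes.
MODULE_REGISTRY: dict[str, Callable[..., Any]] = {
    "bm25": BM25Retrieval,
    "echo_generator": EchoGenerator,
}


def build_modules(config: dict[str, Any], project_dir: str) -> list[Any]:
    """Instantiate one module per node, in node-line order."""
    instances = []
    for line in config["node_lines"]:
        for node in line["nodes"]:
            module_cfg = dict(node["modules"][0])  # copy; exactly one winner per node
            module_cls = MODULE_REGISTRY[module_cfg.pop("module_type")]
            # Remaining keys become the module's winning parameters.
            instances.append(module_cls(project_dir, **module_cfg))
    return instances
```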

Step 3: Run Pipeline Queries

Execute the deployed pipeline against user queries. The Runner.run() method creates a pseudo QA DataFrame from the query, then passes it sequentially through each module instance. Each module receives the previous module's output DataFrame and adds its result columns. The final output is typically the generated_texts column from the generator module.

Key considerations:

The pipeline starts with query expansion or retrieval (whichever comes first)
Each module's pure() method is called for side-effect-free execution
Duplicate columns from previous stages are dropped to avoid conflicts
The result_column parameter controls which output column is returned
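The sequential execution can be sketched without pandas by representing a DataFrame as a dict of equal-length column lists. The module classes, their pure(previous_result, **params) signature, and the run helper are hypothetical stand-ins for AutoRAG's Runner. They are kept only to show how columns flow from module to module and how overlapping columns are dropped.

```python
from typing import Any

Frame = dict[str, list[Any]]  # column name -> values, one value per row


class KeywordRetrieval:
    def pure(self, previous_result: Frame, top_k: int = 2) -> Frame:
        # Toy retriever: return the first top_k corpus ids for every query.
        corpus = ["doc-a", "doc-b", "doc-c"]
        return {"retrieved_ids": [corpus[:top_k] for _ in previous_result["query"]]}


class TemplateGenerator:
    def pure(self, previous_result: Frame) -> Frame:
        # Toy generator: build an answer string from the retrieved ids.
        answers = [
            f"Answer to {q!r} using {', '.join(ids)}"
            for q, ids in zip(previous_result["query"], previous_result["retrieved_ids"])
        ]
        return {"generated_texts": answers}


def run(modules: list[tuple[Any, dict[str, Any]]], query: str,
        result_column: str = "generated_texts") -> Any:
    # Pseudo QA frame: a single row holding the user query.
    previous: Frame = {"qid": ["query-0"], "query": [query]}
    for module, params in modules:
        new_columns = module.pure(previous, **params)
        # Drop earlier columns that the module re-emits, so the new values win.
        kept = {k: v for k, v in previous.items() if k not in new_columns}
        previous = {**kept, **new_columns}
    return previous[result_column][0]
```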

Step 4: Deploy as REST API Server

Launch the pipeline as an async REST API using the ApiRunner. The Quart-based server exposes three endpoints: /v1/run for full pipeline execution, /v1/retrieve for retrieval-only queries, and /v1/stream for streaming generation responses. The API is described by an OpenAPI 3.0 specification, with Swagger documentation.

Key considerations:

The server supports configurable host and port
nest_asyncio is applied for compatibility with existing event loops
A remote mode is available for hosted deployments
CORS and standard HTTP error handling are included
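A client for the /v1/run endpoint needs only the standard library. The sketch below builds, but does not send, a JSON POST request. The body fields query and result_column and the default port 8000 are assumptions to check against the server's Swagger documentation, and build_run_request is a hypothetical helper.

```python
import json
import urllib.request


def build_run_request(query: str, host: str = "localhost", port: int = 8000,
                      result_column: str = "generated_texts") -> urllib.request.Request:
    """Prepare a POST to /v1/run; pass the result to urllib.request.urlopen to send it."""
    # Assumed request body shape; verify against the server's OpenAPI schema.
    body = json.dumps({"query": query, "result_column": result_column}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With a server running, json.load(urllib.request.urlopen(req)) would return the parsed response, whose exact shape is defined by the server's schema.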

Step 5: Deploy as Web Chat Interface

Launch an interactive chat interface using either GradioRunner (for Gradio-based UI) or the Streamlit web module. The interface provides a conversational chat experience where users can type questions and see RAG-generated answers.

Key considerations:

GradioRunner is the recommended code-based approach
The Streamlit interface is available via CLI (autorag run_web)
Both interfaces support loading from YAML config or trial folder
The web interface is suitable for demos and interactive testing
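At its core, a chat front end is a function that maps the latest message and the conversation history to a reply. The sketch below shows that shape with a pluggable pipeline callable. make_chat_fn is hypothetical, and the (message, history) signature mirrors what gradio.ChatInterface expects from its fn argument rather than GradioRunner's internals.

```python
from typing import Callable


def make_chat_fn(pipeline: Callable[[str], str]) -> Callable[[str, list], str]:
    """Wrap a query -> answer pipeline as a (message, history) -> reply chat handler."""

    def chat_fn(message: str, history: list) -> str:
        # In this sketch each turn is answered independently; history is accepted
        # because chat UIs pass it, but it is not fed back into the pipeline.
        question = message.strip()
        if not question:
            return "Please enter a question."
        return pipeline(question)

    return chat_fn
```

With Gradio installed, gradio.ChatInterface(make_chat_fn(my_pipeline)).launch() would serve the handler, where my_pipeline is any query-to-answer callable, for example one wrapping a Runner's run method.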

Step 6: Evaluate on Test Dataset

Optionally re-run the extracted pipeline against a held-out test QA dataset to verify that the optimization results generalize. This uses the same Evaluator, but with the extracted config (one module per node) and the test dataset, and produces a new trial folder with test performance metrics.

Key considerations:

Use a separate project directory for test evaluation to avoid data conflicts
Compare test metrics against training metrics to check for overfitting
The test dataset should follow the same parquet format as the training data
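Comparing the two runs can be as simple as a per-metric difference. Here is a minimal sketch, assuming the scores have already been read into dicts keyed by metric name and that higher scores are better. The helper name generalization_gaps and the 0.05 tolerance are hypothetical choices, not values from AutoRAG.

```python
def generalization_gaps(train: dict[str, float], test: dict[str, float],
                        tolerance: float = 0.05) -> dict[str, tuple[float, bool]]:
    """Return (train - test, flagged) per shared metric.

    Assumes higher is better; flagged means the test score dropped by more
    than the tolerance, a possible sign of overfitting to the training QA set.
    """
    report = {}
    # Only metrics present in both runs can be compared.
    for metric in sorted(train.keys() & test.keys()):
        gap = train[metric] - test[metric]
        report[metric] = (gap, gap > tolerance)
    return report
```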
