Workflow:Marker Inc Korea AutoRAG Pipeline Deployment
| Knowledge Sources | |
|---|---|
| Domains | RAG, Deployment, API |
| Last Updated | 2026-02-12 12:00 GMT |
Overview
End-to-end process for extracting the optimized RAG pipeline found by AutoRAG, deploying it, and serving it as executable code, a REST API server, or a web chat interface.
Description
This workflow covers everything after RAG pipeline optimization: extracting the best pipeline configuration from a completed trial, deploying it for production use, and serving it through various interfaces. The extraction process reads the trial summary to identify the winning module per node and produces a standalone YAML config. The deployment uses a Runner that instantiates each module in sequence, creating an executable pipeline. Three deployment modes are supported: direct Python code execution via Runner, an async REST API server via ApiRunner (Quart-based with OpenAPI documentation), and an interactive web chat interface via GradioRunner or Streamlit.
Usage
Execute this workflow after completing the RAG Pipeline Optimization workflow and selecting a satisfactory trial. Use the code runner for programmatic integration, the API server for microservice deployment, or the web interface for interactive demos and testing. This workflow can also re-evaluate the extracted pipeline against a held-out test dataset to verify generalization.
Execution Steps
Step 1: Extract Best Pipeline Configuration
Extract the optimal pipeline from a completed trial folder. The extract_best_config function reads the trial's summary.csv (which records the best module and parameters per node) and the original config.yaml (which provides node strategies and ordering). It reconstructs a minimal YAML configuration with exactly one module per node, representing the best end-to-end pipeline. Vector database configuration is also extracted from the project resources.
Key considerations:
- The trial must be fully evaluated before extraction
- The output YAML has the same structure as the input config but with only the winning modules
- Vector database paths and embedding model references are preserved
- The extracted config can be saved to a file or used directly as a dictionary
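The reduction to one winning module per node can be sketched without AutoRAG installed. This is a minimal illustration, not the library's implementation: the summary column names (`node_line_name`, `node_type`, `best_module_name`, `best_module_params`), the shape of the output dictionary, and the helper `build_single_module_config` are assumptions made for the example. The real extraction also carries over node strategies from config.yaml, which this sketch omits.

```python
import ast
import csv
import io

# Stand-in for a trial's summary.csv; the column names are assumed for illustration.
SUMMARY_CSV = """node_line_name,node_type,best_module_name,best_module_params
retrieve_node_line,retrieval,bm25,"{'top_k': 5}"
post_retrieve_node_line,prompt_maker,fstring,"{'prompt': 'Answer using the passages.'}"
post_retrieve_node_line,generator,llama_index_llm,"{'llm': 'openai', 'temperature': 0.2}"
"""


def build_single_module_config(summary_text):
    """Group summary rows by node line, keeping one winning module per node."""
    node_lines = {}  # dicts preserve insertion order, so node order is kept
    for row in csv.DictReader(io.StringIO(summary_text)):
        # Parameters are stored as a stringified dict; literal_eval parses it safely.
        params = ast.literal_eval(row["best_module_params"])
        node = {
            "node_type": row["node_type"],
            "modules": [{"module_type": row["best_module_name"], **params}],
        }
        node_lines.setdefault(row["node_line_name"], []).append(node)
    return {
        "node_lines": [
            {"node_line_name": name, "nodes": nodes}
            for name, nodes in node_lines.items()
        ]
    }


config = build_single_module_config(SUMMARY_CSV)
for line in config["node_lines"]:
    for node in line["nodes"]:
        print(line["node_line_name"], node["node_type"], node["modules"][0]["module_type"])
```

The resulting dictionary can be dumped to YAML or passed along in memory, mirroring the two usage options listed above.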
Step 2: Initialize Pipeline Runner
Instantiate a Runner (for code use), ApiRunner (for REST API), or GradioRunner (for web UI), either from the extracted YAML file or directly from the trial folder. The BaseRunner parses the config, instantiates each module class using the central support registry, and initializes module-specific resources (model weights, vector database connections, etc.). Each module is created with its winning parameters.
What happens:
- Module classes are resolved via the support.py registry, which maps string names to Python classes
- Each module is instantiated with project_dir context for accessing vector databases and BM25 indices
- The modules are stored in execution order matching the node line sequence
- from_trial_folder combines extraction and initialization in one step
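The registry lookup and ordered instantiation described above can be pictured with a small standalone sketch. The classes `Bm25Retrieval` and `TemplateGenerator`, the `MODULE_REGISTRY` dictionary, and the helper `instantiate_modules` are hypothetical stand-ins invented for this example; they are not AutoRAG classes, and the actual BaseRunner may differ in detail.

```python
class Bm25Retrieval:
    """Hypothetical retrieval module; a real one would load an index from project_dir."""

    def __init__(self, project_dir, top_k=3):
        self.project_dir = project_dir
        self.top_k = top_k


class TemplateGenerator:
    """Hypothetical generator module."""

    def __init__(self, project_dir, temperature=0.0):
        self.project_dir = project_dir
        self.temperature = temperature


# String-to-class registry, playing the role the text assigns to support.py.
MODULE_REGISTRY = {"bm25": Bm25Retrieval, "template_generator": TemplateGenerator}


def instantiate_modules(config, project_dir):
    """Create one module instance per node, in node line order."""
    instances = []
    for node_line in config["node_lines"]:
        for node in node_line["nodes"]:
            params = dict(node["modules"][0])  # extracted configs hold one module per node
            module_type = params.pop("module_type")
            if module_type not in MODULE_REGISTRY:
                raise KeyError(f"Unknown module type: {module_type!r}")
            instances.append(MODULE_REGISTRY[module_type](project_dir=project_dir, **params))
    return instances


config = {
    "node_lines": [
        {"node_line_name": "retrieve", "nodes": [
            {"node_type": "retrieval", "modules": [{"module_type": "bm25", "top_k": 5}]},
        ]},
        {"node_line_name": "generate", "nodes": [
            {"node_type": "generator",
             "modules": [{"module_type": "template_generator", "temperature": 0.2}]},
        ]},
    ]
}

modules = instantiate_modules(config, project_dir="./my_project")
print([type(m).__name__ for m in modules])
```

Failing fast on an unknown module name keeps configuration typos from surfacing later as harder-to-trace errors at query time.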
Step 3: Run Pipeline Queries
Execute the deployed pipeline against user queries. The Runner.run() method creates a pseudo QA DataFrame from the query, then passes it sequentially through each module instance. Each module receives the previous module's output DataFrame and adds its result columns. The final output is typically the generated_texts column from the generator module.
Key considerations:
- The pipeline starts with query expansion or retrieval (whichever comes first)
- Each module's pure() method is called for side-effect-free execution
- Duplicate columns from previous stages are dropped to avoid conflicts
- The result_column parameter controls which output column is returned
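The sequential hand-off can be sketched as follows. To stay dependency-free, a plain dictionary of column lists stands in for the DataFrame, and the `Retriever` and `Generator` classes are hypothetical toy modules; only `pure()`, `result_column`, and the `generated_texts` column name come from the description above. Returning only the first row's value is an assumption of this sketch.

```python
class Retriever:
    """Toy retrieval module returning the same passages for every query."""

    def pure(self, previous_result, top_k=2):
        corpus = ["AutoRAG optimizes RAG pipelines.", "Quart is an async web framework."]
        return {"retrieved_contents": [corpus[:top_k] for _ in previous_result["query"]]}


class Generator:
    """Toy generator module that reports how many passages it received."""

    def pure(self, previous_result):
        pairs = zip(previous_result["query"], previous_result["retrieved_contents"])
        return {"generated_texts": [f"Answer to '{q}' using {len(c)} passages" for q, c in pairs]}


def run(query, modules, result_column="generated_texts"):
    """Pass a one-row pseudo QA table through each (module, params) pair in order."""
    previous_result = {"qid": ["q0"], "query": [query]}
    for module, params in modules:
        new_result = module.pure(previous_result=previous_result, **params)
        # Drop columns that the new result redefines, then append the new columns.
        kept = {k: v for k, v in previous_result.items() if k not in new_result}
        previous_result = {**kept, **new_result}
    return previous_result[result_column][0]


pipeline = [(Retriever(), {"top_k": 2}), (Generator(), {})]
print(run("What is AutoRAG?", pipeline))
print(run("What is AutoRAG?", pipeline, result_column="retrieved_contents"))
```

Because each stage only adds or replaces columns, any intermediate column (for example the retrieved passages) remains available through `result_column`.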
Step 4: Deploy as REST API Server
Launch the pipeline as an async REST API using the ApiRunner. The Quart-based server exposes three endpoints: /v1/run for full pipeline execution, /v1/retrieve for retrieval-only queries, and /v1/stream for streaming generation responses. The API follows OpenAPI 3.0 specifications with Swagger documentation.
Key considerations:
- The server supports configurable host and port
- nest_asyncio is applied for compatibility with existing event loops
- A remote mode is available for hosted deployments
- CORS and standard HTTP error handling are included
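A client call against the /v1/run endpoint might look like the sketch below, which uses only the Python standard library. The base URL, the port, and the JSON body fields (`query`, `result_column`) are assumptions for illustration; check the server's Swagger page for the actual request schema. The helper names `build_run_request` and `ask` are hypothetical.

```python
import json
import urllib.request


def build_run_request(query, base_url="http://localhost:8000", result_column="generated_texts"):
    """Build (but do not send) a POST request for the /v1/run endpoint."""
    payload = {"query": query, "result_column": result_column}  # assumed body fields
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query, **kwargs):
    """Send the request and decode the JSON response; requires a running server."""
    with urllib.request.urlopen(build_run_request(query, **kwargs), timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


request = build_run_request("What is AutoRAG?")
print(request.get_method(), request.full_url)
```

Separating request construction from sending makes the payload easy to inspect and unit-test without a live server; `ask("...")` performs the actual call once the API is up.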
Step 5: Deploy as Web Chat Interface
Launch an interactive chat interface using either GradioRunner (for Gradio-based UI) or the Streamlit web module. The interface provides a conversational chat experience where users can type questions and see RAG-generated answers.
Key considerations:
- GradioRunner is the recommended code-based approach
- The Streamlit interface is available via CLI (autorag run_web)
- Both interfaces support loading from YAML config or trial folder
- The web interface is suitable for demos and interactive testing
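The two loading options for the Streamlit CLI can be wrapped in a small launcher. This is a sketch under stated assumptions: the flag names `--trial_path`, `--yaml_path`, and `--project_dir` are assumed here and should be confirmed with `autorag run_web --help`; `run_web_command` is a hypothetical helper, and the paths are placeholders.

```python
import shlex


def run_web_command(trial_path=None, yaml_path=None, project_dir=None):
    """Build the argv for `autorag run_web` from a trial folder or a YAML config."""
    if trial_path and yaml_path:
        raise ValueError("Pass either trial_path or yaml_path, not both")
    if trial_path:
        return ["autorag", "run_web", "--trial_path", trial_path]
    if yaml_path and project_dir:
        # A standalone YAML needs the project directory to locate indices and resources.
        return ["autorag", "run_web", "--yaml_path", yaml_path, "--project_dir", project_dir]
    raise ValueError("Provide trial_path, or both yaml_path and project_dir")


command = run_web_command(trial_path="./my_project/0")
print(shlex.join(command))
# To launch the interface: subprocess.run(command, check=True)
```

Building the command as a list rather than a shell string avoids quoting problems when paths contain spaces.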
Step 6: Evaluate on Test Dataset
Optionally re-run the extracted pipeline against a held-out test QA dataset to verify that the optimization results generalize. This uses the same Evaluator but with the extracted single-module config and the test dataset, producing a new trial folder with test performance metrics.
Key considerations:
- Use a separate project directory for test evaluation to avoid data conflicts
- Compare test metrics against the training (optimization) metrics to check for overfitting
- The test dataset should follow the same Parquet format as the training data
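The overfitting check can be made concrete with a small comparison helper. The metric names, the example values, and the 5% tolerance below are illustrative assumptions, not AutoRAG outputs, and the sketch assumes higher-is-better metrics; `generalization_report` is a hypothetical function.

```python
def generalization_report(optimization_metrics, test_metrics, tolerance=0.05):
    """Flag metrics whose relative drop from optimization to test exceeds tolerance."""
    report = {}
    for name, opt_value in optimization_metrics.items():
        if name not in test_metrics:
            continue  # only compare metrics measured on both datasets
        test_value = test_metrics[name]
        relative_drop = (opt_value - test_value) / opt_value if opt_value else 0.0
        report[name] = {
            "optimization": opt_value,
            "test": test_value,
            "relative_drop": round(relative_drop, 4),
            "suspect": relative_drop > tolerance,
        }
    return report


# Made-up numbers purely to exercise the function.
optimization = {"retrieval_f1": 0.80, "rouge": 0.50}
test = {"retrieval_f1": 0.78, "rouge": 0.40}

report = generalization_report(optimization, test)
for metric, row in report.items():
    print(metric, row)
```

A metric flagged as suspect suggests the winning module or parameters were tuned too closely to the optimization dataset and may warrant a re-run with more QA data.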