Implementation:Marker Inc Korea AutoRAG Extract Best Config
| Knowledge Sources | |
|---|---|
| Domains | RAG Pipeline Optimization, Configuration Management |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
A concrete tool, provided by the AutoRAG framework, for extracting the optimal pipeline configuration (one module per node) from a completed AutoRAG evaluation trial.
Description
The extract_best_config function reads the summary.csv and config.yaml from a completed trial directory. It uses load_summary_file to parse the summary (deserializing the best_module_params column from string to dict), then calls summary_df_to_yaml to build a new YAML dictionary in which each node retains only its winning module. The function also calls extract_vectordb_config to attach the vector database settings from the project's resources/vectordb.yaml. The result is returned as a Python dictionary in every case; if an output_path is provided, it is also written to that path as a YAML file.
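The deserialization step can be illustrated with a minimal standard-library sketch. It is not AutoRAG's load_summary_file: the helper name load_summary_rows, the use of ast.literal_eval, and every column name other than best_module_params are assumptions made for illustration.

```python
import ast
import csv
import io


def load_summary_rows(csv_text, dict_columns=("best_module_params",)):
    """Parse CSV text and turn stringified dict columns back into dicts.

    Hypothetical helper: a stand-in for the idea behind load_summary_file,
    not the library's implementation.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in dict_columns:
            if col in row:
                # literal_eval accepts only Python literals, so it parses
                # "{'top_k': 5}" without executing arbitrary code.
                row[col] = ast.literal_eval(row[col])
        rows.append(row)
    return rows


# Illustrative summary content; the column layout is assumed.
sample = (
    "node_line_name,node_type,best_module_name,best_module_params\n"
    "retrieve_node_line,retrieval,bm25,\"{'top_k': 5}\"\n"
)
rows = load_summary_rows(sample)
```

After parsing, rows[0]["best_module_params"] is a real dict rather than a string, so its keys can be merged into a module entry.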
The helper summary_df_to_yaml preserves node line ordering by converting the node line names to a pandas Categorical based on the original config order, then groups by node line and constructs the nested node_lines/nodes/modules structure expected by the Runner classes.
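The ordering-then-grouping idea can be sketched in plain Python. This is an analogue, not the library code: the real helper works on a pandas DataFrame with a Categorical column, whereas this sketch sorts rows by each node line's position in the original config. The function name rows_to_config, the row keys, and the sample values are assumptions, and the per-node strategy field is omitted for brevity.

```python
def rows_to_config(rows, config):
    """Group best-module rows by node line, in the original config order.

    Hypothetical helper illustrating the idea behind summary_df_to_yaml.
    """
    # Rank each node line by its position in the original config.
    order = {
        node_line["node_line_name"]: i
        for i, node_line in enumerate(config["node_lines"])
    }
    grouped = {}
    # sorted() is stable, so nodes inside one node line keep their row order.
    for row in sorted(rows, key=lambda r: order[r["node_line_name"]]):
        node = {
            "node_type": row["node_type"],
            "modules": [
                {"module_type": row["best_module_name"], **row["best_module_params"]}
            ],
        }
        grouped.setdefault(row["node_line_name"], []).append(node)
    return {
        "node_lines": [
            {"node_line_name": name, "nodes": nodes}
            for name, nodes in grouped.items()
        ]
    }


# Illustrative inputs: rows arrive out of order relative to the config.
config = {
    "node_lines": [
        {"node_line_name": "retrieve_node_line"},
        {"node_line_name": "post_retrieve_node_line"},
    ]
}
rows = [
    {
        "node_line_name": "post_retrieve_node_line",
        "node_type": "generator",
        "best_module_name": "example_llm",  # hypothetical module name
        "best_module_params": {"temperature": 0.5},
    },
    {
        "node_line_name": "retrieve_node_line",
        "node_type": "retrieval",
        "best_module_name": "bm25",
        "best_module_params": {"top_k": 5},
    },
]
best = rows_to_config(rows, config)
```

Sorting by the original position before grouping is what keeps retrieve_node_line ahead of post_retrieve_node_line, even though the rows were supplied in the opposite order.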
Usage
Import this function when you need to programmatically convert evaluation results into a deployable config. This is typically done once after an optimization trial completes, before passing the config to a Runner. It is also invoked internally by BaseRunner.from_trial_folder.
Code Reference
Source Location
- Repository: AutoRAG
- File: autorag/deploy/base.py (lines 95-121)
Signature
def extract_best_config(trial_path: str, output_path: Optional[str] = None) -> Dict:
Import
from autorag.deploy.base import extract_best_config
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| trial_path | str | yes | Path to the evaluated trial directory containing summary.csv and config.yaml |
| output_path | Optional[str] | no | File path where the extracted YAML config will be saved. Must end with .yaml or .yml. If None, no file is written. |
Outputs
| Name | Type | Description |
|---|---|---|
| yaml_dict | Dict | Dictionary representing the optimal pipeline configuration with one module per node, including vectordb settings |
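A caller may want to confirm the one-module-per-node property before deploying. The following is a minimal sketch of such a check against the dictionary shape described on this page. The helper find_multi_module_nodes is hypothetical and is not part of AutoRAG.

```python
def find_multi_module_nodes(config):
    """Return (node_line_name, node_type) pairs whose node does not have exactly one module.

    Hypothetical helper; it only inspects the nested node_lines/nodes/modules layout.
    """
    problems = []
    for node_line in config.get("node_lines", []):
        for node in node_line.get("nodes", []):
            if len(node.get("modules", [])) != 1:
                problems.append(
                    (node_line.get("node_line_name"), node.get("node_type"))
                )
    return problems


# Illustrative configs: one valid, one with two candidate modules left in a node.
good = {
    "node_lines": [
        {
            "node_line_name": "retrieve_node_line",
            "nodes": [
                {"node_type": "retrieval", "modules": [{"module_type": "bm25"}]}
            ],
        }
    ]
}
bad = {
    "node_lines": [
        {
            "node_line_name": "retrieve_node_line",
            "nodes": [
                {
                    "node_type": "retrieval",
                    "modules": [{"module_type": "bm25"}, {"module_type": "vectordb"}],
                }
            ],
        }
    ]
}
good_problems = find_multi_module_nodes(good)
bad_problems = find_multi_module_nodes(bad)
```

An empty result means every node carries exactly one module; any returned pair points to a node that still looks like an unoptimized search-space entry.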
Usage Examples
Basic Usage
from autorag.deploy.base import extract_best_config
# Extract and save the best config from a completed trial
best_config = extract_best_config(
    trial_path="./my_project/0",
    output_path="./my_project/best.yaml",
)
# The returned dict has the structure:
# {
#     "node_lines": [
#         {
#             "node_line_name": "retrieve_node_line",
#             "nodes": [
#                 {
#                     "node_type": "retrieval",
#                     "strategy": {...},
#                     "modules": [{"module_type": "bm25", ...}]
#                 }
#             ]
#         }
#     ],
#     "vectordb": [{"name": "default", "db_type": "chroma", ...}]
# }
In-Memory Usage Without Saving
from autorag.deploy.base import Runner, extract_best_config
# Extract config without saving to disk
config = extract_best_config(trial_path="./my_project/0")
# Pass directly to a Runner
runner = Runner(config, project_dir="./my_project")