Implementation:Marker Inc Korea AutoRAG Extract Best Config

Knowledge Sources	AutoRAG
Domains	RAG Pipeline Optimization, Configuration Management
Last Updated	2026-02-12 00:00 GMT

Overview

A concrete tool, provided by the AutoRAG framework, for extracting the optimal pipeline configuration (one module per node) from a completed AutoRAG evaluation trial.

Description

The extract_best_config function reads summary.csv and config.yaml from a completed trial directory. It uses load_summary_file to parse the summary (deserializing the best_module_params column from string to dict), then calls summary_df_to_yaml to build a new YAML dictionary in which each node retains only its winning module. The function also calls extract_vectordb_config to attach the vector database settings from the project's resources/vectordb.yaml. If an output_path is provided, the result is also written to that path as a YAML file; in either case, the configuration is returned as a Python dictionary.
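The deserialization step described above can be illustrated with a minimal standard-library sketch. This is not the AutoRAG implementation of load_summary_file: the helper name parse_summary, the sample rows, and every column name other than best_module_params are assumptions made for illustration.

```python
import ast
import csv
import io
from typing import Dict, List

# Hypothetical summary.csv contents. Only best_module_params is named in this page;
# the other column names are assumed for the sake of the example.
SAMPLE_SUMMARY = '''node_line_name,node_type,best_module_name,best_module_params
retrieve_node_line,retrieval,bm25,"{'top_k': 3}"
post_retrieve_node_line,generator,llama_index_llm,"{'llm': 'openai', 'temperature': 0.5}"
'''


def parse_summary(text: str) -> List[Dict]:
    """Read summary rows and turn the stringified params column into real dicts."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # literal_eval only accepts Python literals, so it is safer than eval().
        row["best_module_params"] = ast.literal_eval(row["best_module_params"])
        rows.append(row)
    return rows


rows = parse_summary(SAMPLE_SUMMARY)
```

After parsing, each row's best_module_params can be merged directly into a module entry, which is what the next stage of the extraction needs.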

The helper summary_df_to_yaml preserves node line ordering by converting the node line names to a pandas Categorical based on the original config order, then groups by node line and constructs the nested node_lines/nodes/modules structure expected by the Runner classes.
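The ordering and grouping idea can be sketched in plain Python. This sketch swaps the pandas Categorical for a rank lookup plus a stable sort, and it omits each node's strategy block for brevity. The function build_node_lines, the row keys, and the sample rows are hypothetical and are not taken from the AutoRAG source.

```python
from typing import Dict, List


def build_node_lines(rows: List[Dict], node_line_order: List[str]) -> Dict:
    """Group winning modules by node line, keeping the order of the original config."""
    # Position of each node line in the original config; plays the role of the Categorical.
    rank = {name: i for i, name in enumerate(node_line_order)}
    # sorted() is stable, so nodes keep their summary order inside each node line.
    ordered = sorted(rows, key=lambda r: rank[r["node_line_name"]])

    grouped: Dict[str, List[Dict]] = {}
    for row in ordered:
        node = {
            "node_type": row["node_type"],
            # Exactly one module per node: the winner, with its params flattened in.
            "modules": [{"module_type": row["best_module_name"], **row["best_module_params"]}],
        }
        grouped.setdefault(row["node_line_name"], []).append(node)

    return {
        "node_lines": [
            {"node_line_name": name, "nodes": nodes} for name, nodes in grouped.items()
        ]
    }


# Hypothetical summary rows, deliberately listed out of config order.
rows = [
    {"node_line_name": "post_retrieve_node_line", "node_type": "prompt_maker",
     "best_module_name": "fstring", "best_module_params": {"prompt": "{query}"}},
    {"node_line_name": "retrieve_node_line", "node_type": "retrieval",
     "best_module_name": "bm25", "best_module_params": {"top_k": 3}},
    {"node_line_name": "post_retrieve_node_line", "node_type": "generator",
     "best_module_name": "llama_index_llm", "best_module_params": {"llm": "openai"}},
]
config = build_node_lines(rows, ["retrieve_node_line", "post_retrieve_node_line"])
```

Sorting by the original config order matters because node lines run sequentially: a retrieval node line has to come before any node line that consumes its results.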

Usage

Import this function when you need to programmatically convert evaluation results into a deployable config. This is typically done once after an optimization trial completes, before passing the config to a Runner. It is also invoked internally by BaseRunner.from_trial_folder.

Code Reference

Source Location

Repository: AutoRAG
File: autorag/deploy/base.py (lines 95-121)

Signature

def extract_best_config(trial_path: str, output_path: Optional[str] = None) -> Dict:

Import

from autorag.deploy.base import extract_best_config

I/O Contract

Inputs

Name	Type	Required	Description
trial_path	str	yes	Path to the evaluated trial directory containing summary.csv and config.yaml
output_path	Optional[str]	no	File path where the extracted YAML config will be saved. Must end with .yaml or .yml. If None, no file is written.

Outputs

Name	Type	Description
yaml_dict	Dict	Dictionary representing the optimal pipeline configuration with one module per node, including vectordb settings

Usage Examples

Basic Usage

from autorag.deploy.base import extract_best_config

# Extract and save the best config from a completed trial
best_config = extract_best_config(
    trial_path="./my_project/0",
    output_path="./my_project/best.yaml"
)

# The returned dict has the structure:
# {
#     "node_lines": [
#         {
#             "node_line_name": "retrieve_node_line",
#             "nodes": [
#                 {
#                     "node_type": "retrieval",
#                     "strategy": {...},
#                     "modules": [{"module_type": "bm25", ...}]
#                 }
#             ]
#         }
#     ],
#     "vectordb": [{"name": "default", "db_type": "chroma", ...}]
# }
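To check which module won at each node, the returned dictionary can be walked with a few lines of plain Python. The sketch below assumes only the structure shown above. The helper list_winning_modules is hypothetical, and sample_config is a hand-written stand-in for a real return value.

```python
from typing import Dict, List, Tuple


def list_winning_modules(config: Dict) -> List[Tuple[str, str, str]]:
    """Return (node_line_name, node_type, module_type) for every node in the config."""
    winners = []
    for node_line in config["node_lines"]:
        for node in node_line["nodes"]:
            # An extracted best config is expected to hold exactly one module per node;
            # this unpacking raises ValueError if that assumption does not hold.
            (module,) = node["modules"]
            winners.append(
                (node_line["node_line_name"], node["node_type"], module["module_type"])
            )
    return winners


# Hand-written stand-in shaped like the structure above; the values are illustrative.
sample_config = {
    "node_lines": [
        {
            "node_line_name": "retrieve_node_line",
            "nodes": [
                {
                    "node_type": "retrieval",
                    "strategy": {},
                    "modules": [{"module_type": "bm25", "top_k": 3}],
                }
            ],
        }
    ],
    "vectordb": [{"name": "default", "db_type": "chroma"}],
}
winners = list_winning_modules(sample_config)
```

The single-element unpacking doubles as a sanity check: a node carrying more than one module would indicate an unextracted search-space config rather than a best config.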

In-Memory Usage Without Saving

from autorag.deploy.base import Runner, extract_best_config

# Extract config without saving to disk
config = extract_best_config(trial_path="./my_project/0")

# Pass directly to a Runner
runner = Runner(config, project_dir="./my_project")

Related Pages

Implements Principle

Principle:Marker_Inc_Korea_AutoRAG_Best_Config_Extraction
