Implementation:Marker Inc Korea AutoRAG Extract Best Config

From Leeroopedia
Knowledge Sources
Domains RAG Pipeline Optimization, Configuration Management
Last Updated 2026-02-12 00:00 GMT

Overview

Concrete tool for extracting the optimal single-module-per-node pipeline configuration from a completed AutoRAG evaluation trial, provided by the AutoRAG framework.

Description

The extract_best_config function reads the summary.csv and config.yaml from a completed trial directory. It uses load_summary_file to parse the summary (deserializing the best_module_params column from string to dict), then calls summary_df_to_yaml to build a new YAML dictionary where each node retains only its winning module. The function also calls extract_vectordb_config to attach the vector database settings from the project's resources/vectordb.yaml. If an output_path is provided, the result is written as a YAML file; otherwise, it is returned as a Python dictionary.
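The deserialization step can be illustrated with a small standard-library stand-in. This is a sketch only, not AutoRAG's load_summary_file: the helper name load_summary_rows, the sample CSV, and every column name other than best_module_params are assumptions made for illustration, and using ast.literal_eval is one plausible way to turn the stringified dict back into a Python dict.

```python
import ast
import csv
import io

# Hypothetical summary.csv content; only best_module_params is named on this page.
SUMMARY_CSV = """node_line_name,node_type,best_module_name,best_module_params
retrieve_node_line,retrieval,bm25,"{'top_k': 3}"
"""


def load_summary_rows(text):
    """Parse CSV text and turn the best_module_params strings into dicts."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        # The CSV stores the dict as its string form, e.g. "{'top_k': 3}".
        row["best_module_params"] = ast.literal_eval(row["best_module_params"])
    return rows


rows = load_summary_rows(SUMMARY_CSV)
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for values read from a file.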

The helper summary_df_to_yaml preserves node line ordering by converting the node line names to a pandas Categorical based on the original config order, then groups by node line and constructs the nested node_lines/nodes/modules structure expected by the Runner classes.
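The ordering technique described above can be sketched with pandas as follows. This is a simplified illustration under stated assumptions, not the library's code: build_node_lines is a hypothetical name, the column names other than best_module_params are assumed, and the per-node strategy field is omitted for brevity.

```python
import pandas as pd


def build_node_lines(summary_df, node_line_order):
    """Group winning modules by node line, keeping the original config order."""
    df = summary_df.copy()
    # An ordered Categorical makes groupby emit groups in config order
    # instead of alphabetical order.
    df["node_line_name"] = pd.Categorical(
        df["node_line_name"], categories=node_line_order, ordered=True
    )
    node_lines = []
    for name, group in df.groupby("node_line_name", observed=True):
        nodes = [
            {
                "node_type": row["node_type"],
                # Exactly one module per node: the winner plus its parameters.
                "modules": [
                    {"module_type": row["best_module_name"], **row["best_module_params"]}
                ],
            }
            for _, row in group.iterrows()
        ]
        node_lines.append({"node_line_name": str(name), "nodes": nodes})
    return {"node_lines": node_lines}


# Hypothetical summary rows, deliberately out of config order.
summary_df = pd.DataFrame(
    {
        "node_line_name": [
            "post_retrieve_node_line",
            "retrieve_node_line",
            "post_retrieve_node_line",
        ],
        "node_type": ["prompt_maker", "retrieval", "generator"],
        "best_module_name": ["fstring", "bm25", "llama_index_llm"],
        "best_module_params": [{"prompt": "{query}"}, {"top_k": 3}, {"llm": "openai"}],
    }
)
config = build_node_lines(summary_df, ["retrieve_node_line", "post_retrieve_node_line"])
```

Without the Categorical conversion, grouping by the plain string column would place post_retrieve_node_line before retrieve_node_line, which would reverse the pipeline stages in the sample data.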

Usage

Import this function when you need to programmatically convert evaluation results into a deployable config. This is typically done once after an optimization trial completes, before passing the config to a Runner. It is also invoked internally by BaseRunner.from_trial_folder.

Code Reference

Source Location

  • Repository: AutoRAG
  • File: autorag/deploy/base.py (lines 95-121)

Signature

def extract_best_config(trial_path: str, output_path: Optional[str] = None) -> Dict:

Import

from autorag.deploy.base import extract_best_config

I/O Contract

Inputs

Name | Type | Required | Description
trial_path | str | yes | Path to the evaluated trial directory containing summary.csv and config.yaml
output_path | Optional[str] | no | File path where the extracted YAML config will be saved. Must end with .yaml or .yml. If None, no file is written.
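Because output_path is expected to end with .yaml or .yml, a caller may want to check the suffix before invoking the function. The helper below is a hypothetical convenience, not part of AutoRAG, and this page does not say whether extract_best_config enforces the suffix itself; treating the suffix as case-insensitive is also an assumption.

```python
from pathlib import Path


def has_yaml_suffix(output_path):
    """Return True when the path ends with .yaml or .yml (case-insensitive)."""
    return Path(output_path).suffix.lower() in {".yaml", ".yml"}
```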

Outputs

Name | Type | Description
yaml_dict | Dict | Dictionary representing the optimal pipeline configuration with one module per node, including vectordb settings
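The one-module-per-node property of the output can be checked with a few lines of plain Python. This is a minimal sketch: modules_per_node is a hypothetical helper, and sample_config is hand-written to mirror the documented structure rather than taken from a real trial.

```python
def modules_per_node(config):
    """Return {(node_line_name, node_type): module count} for a config dict."""
    counts = {}
    for node_line in config.get("node_lines", []):
        for node in node_line.get("nodes", []):
            key = (node_line["node_line_name"], node["node_type"])
            counts[key] = len(node.get("modules", []))
    return counts


# Hand-written sample in the shape of an extracted config.
sample_config = {
    "node_lines": [
        {
            "node_line_name": "retrieve_node_line",
            "nodes": [
                {
                    "node_type": "retrieval",
                    "strategy": {"metrics": ["retrieval_f1"]},
                    "modules": [{"module_type": "bm25", "top_k": 3}],
                }
            ],
        }
    ],
    "vectordb": [{"name": "default", "db_type": "chroma"}],
}

counts = modules_per_node(sample_config)
# Any node whose module count differs from 1 would not be a valid best config.
offenders = [key for key, n in counts.items() if n != 1]
```

Running the same check on a dictionary returned by extract_best_config should leave offenders empty if the extraction behaved as documented.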

Usage Examples

Basic Usage

from autorag.deploy.base import extract_best_config

# Extract and save the best config from a completed trial
best_config = extract_best_config(
    trial_path="./my_project/0",
    output_path="./my_project/best.yaml"
)

# The returned dict has the structure:
# {
#     "node_lines": [
#         {
#             "node_line_name": "retrieve_node_line",
#             "nodes": [
#                 {
#                     "node_type": "retrieval",
#                     "strategy": {...},
#                     "modules": [{"module_type": "bm25", ...}]
#                 }
#             ]
#         }
#     ],
#     "vectordb": [{"name": "default", "db_type": "chroma", ...}]
# }

In-Memory Usage Without Saving

from autorag.deploy.base import Runner, extract_best_config

# Extract the config without saving it to disk
config = extract_best_config(trial_path="./my_project/0")

# Pass the dictionary directly to a Runner
runner = Runner(config, project_dir="./my_project")

Related Pages

Implements Principle
