Principle:Marker Inc Korea AutoRAG Best Config Extraction

From Leeroopedia
Knowledge Sources
Domains RAG Pipeline Optimization, Configuration Management
Last Updated 2026-02-12 00:00 GMT

Overview

Best config extraction distills multi-candidate trial results into a single deployable pipeline configuration by selecting the top-performing module at each node.

Description

During an AutoRAG optimization trial, the Evaluator tests multiple candidate modules at every node in the pipeline (e.g., several retrieval strategies, several rerankers). The results of this evaluation are stored in a summary.csv file within the trial directory, which records the best module name, its parameters, and performance metrics for each node.

Best config extraction reads this summary together with the original trial config.yaml, then constructs a new YAML configuration dictionary in which each node contains exactly one module: the winner. This is the critical transition from an evaluation configuration (many candidates per node) to a deployment configuration (one module per node). Without this step, the Runner classes cannot instantiate the pipeline because they enforce a strict one-module-per-node constraint.

The extraction process also pulls in the vector database configuration from the project's resources/vectordb.yaml file, ensuring that the deployment config is self-contained and includes all storage backend details needed for retrieval modules. The resulting dictionary can be saved as a YAML file (commonly called best.yaml) or used directly in memory.
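The assembly described above can be sketched in a few lines of Python. This is a minimal illustration, not AutoRAG's implementation: the function name `build_deploy_config`, the row keys (`node_line_name`, `node_type`, `best_module_name`, `best_module_params`), and all sample values are assumptions chosen for the example. The rows stand in for parsed summary.csv entries, and the `vectordb` list stands in for the contents of resources/vectordb.yaml.

```python
from __future__ import annotations

from typing import Any


def build_deploy_config(
    winners: list[dict[str, Any]],
    vectordb: list[dict[str, Any]],
) -> dict[str, Any]:
    """Assemble a one-module-per-node config from per-node winner rows."""
    grouped: dict[str, list[dict[str, Any]]] = {}
    for row in winners:
        node = {
            "node_type": row["node_type"],
            # Exactly one module per node: the winner and its parameters.
            "modules": [
                {"module_type": row["best_module_name"], **row["best_module_params"]}
            ],
        }
        # Dicts preserve insertion order, so node lines keep the row order.
        grouped.setdefault(row["node_line_name"], []).append(node)
    return {
        "vectordb": vectordb,
        "node_lines": [
            {"node_line_name": name, "nodes": nodes}
            for name, nodes in grouped.items()
        ],
    }


# Hypothetical stand-ins for parsed summary rows and vectordb settings.
winners = [
    {"node_line_name": "retrieve_node_line", "node_type": "retrieval",
     "best_module_name": "bm25", "best_module_params": {"top_k": 3}},
    {"node_line_name": "post_retrieve_node_line", "node_type": "generator",
     "best_module_name": "example_llm", "best_module_params": {"temperature": 0.2}},
]
vectordb = [{"name": "default", "db_type": "example_db"}]

deploy_config = build_deploy_config(winners, vectordb)
```

The returned dictionary can then be serialized to a file such as best.yaml or handed to a runner in memory.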

Usage

Use best config extraction immediately after a successful optimization trial completes. It is the bridge between the experimentation phase (Evaluator.start_trial) and the deployment phase (Runner, ApiRunner, or GradioRunner initialization). It is also called internally by the BaseRunner.from_trial_folder class method for convenience.

Theoretical Basis

The extraction follows a straightforward argmax selection pattern:

For each node N in the pipeline:
    best_module[N] = argmax(evaluation_metric, candidates[N])
    deploy_config[N] = {module_type: best_module[N].name, **best_module[N].params}
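The pseudocode above translates fairly directly into Python. The sketch below assumes each candidate carries a single scalar `score`; that is a simplification made for the example, since a real trial may combine several metrics before ranking. The helper names `pick_winner` and `to_deploy_entry` and the sample scores are hypothetical.

```python
from __future__ import annotations

from typing import Any


def pick_winner(candidates: list[dict[str, Any]]) -> dict[str, Any]:
    """Return the candidate with the highest score (first one wins on ties)."""
    return max(candidates, key=lambda c: c["score"])


def to_deploy_entry(winner: dict[str, Any]) -> dict[str, Any]:
    """Flatten a winner into a {module_type, **params} deployment entry."""
    return {"module_type": winner["name"], **winner["params"]}


# Hypothetical evaluation results: several candidates per node.
candidates = {
    "retrieval": [
        {"name": "bm25", "params": {"top_k": 3}, "score": 0.61},
        {"name": "vectordb", "params": {"top_k": 3}, "score": 0.68},
    ],
    "passage_reranker": [
        {"name": "reranker_a", "params": {"top_k": 2}, "score": 0.72},
        {"name": "reranker_b", "params": {"top_k": 2}, "score": 0.70},
    ],
}

# One entry per node: the argmax over that node's candidates.
deploy_config = {
    node: to_deploy_entry(pick_winner(options))
    for node, options in candidates.items()
}
```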

The algorithm preserves the original node line ordering from the trial config by converting node line names to a pandas Categorical with the original order, then sorting. This ensures the deployment pipeline executes modules in the same topological order as the evaluation pipeline.
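The ordering step can be illustrated with a small pandas example. It is a sketch under assumptions: the column names and node line names are made up for the illustration, and the rows are deliberately shuffled. A plain string sort would place "post_retrieve_node_line" before "retrieve_node_line", which is why an ordered Categorical built from the original order is used instead.

```python
import pandas as pd

# Node line order as it appears in the original trial config (hypothetical names).
original_order = ["retrieve_node_line", "post_retrieve_node_line"]

# Summary rows in an arbitrary order, as they might come out of a file read.
summary = pd.DataFrame(
    {
        "node_line_name": [
            "post_retrieve_node_line",
            "post_retrieve_node_line",
            "retrieve_node_line",
        ],
        "node_type": ["prompt_maker", "generator", "retrieval"],
    }
)

# An ordered Categorical sorts by position in `categories`, not alphabetically.
summary["node_line_name"] = pd.Categorical(
    summary["node_line_name"], categories=original_order, ordered=True
)

# mergesort is stable, so nodes inside a node line keep their relative order.
ordered = summary.sort_values("node_line_name", kind="mergesort").reset_index(drop=True)
```

After sorting, the retrieval row comes first, followed by the two post-retrieve nodes in their original relative order.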

Key invariants:

  • The output config has exactly one module per node.
  • Node line order matches the original config order.
  • The vectordb configuration is extracted from the project resources directory rather than from the trial summary, as it represents infrastructure state.
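The first two invariants lend themselves to a quick sanity check before a config is handed to a runner. The function below is a hypothetical helper, not part of AutoRAG, and it assumes the `node_lines` / `nodes` / `modules` layout used for illustration here. The third invariant concerns where the vectordb settings come from, which cannot be verified from the dictionary alone, so it is not checked.

```python
from __future__ import annotations

from typing import Any


def check_deploy_config(
    config: dict[str, Any], original_order: list[str]
) -> list[str]:
    """Return a list of invariant violations; an empty list means none found."""
    problems: list[str] = []
    node_lines = config.get("node_lines", [])

    # Invariant 1: exactly one module per node.
    for node_line in node_lines:
        for node in node_line.get("nodes", []):
            count = len(node.get("modules", []))
            if count != 1:
                problems.append(
                    f"{node_line['node_line_name']}/{node['node_type']}: "
                    f"{count} modules, expected 1"
                )

    # Invariant 2: node lines appear in the same order as the original config.
    names = [nl["node_line_name"] for nl in node_lines]
    expected = [name for name in original_order if name in names]
    if names != expected:
        problems.append(f"node line order {names} differs from {expected}")

    return problems


# Hypothetical configs: one valid, one violating both invariants.
order = ["retrieve_node_line", "post_retrieve_node_line"]
good = {"node_lines": [
    {"node_line_name": "retrieve_node_line",
     "nodes": [{"node_type": "retrieval", "modules": [{"module_type": "bm25"}]}]},
    {"node_line_name": "post_retrieve_node_line",
     "nodes": [{"node_type": "generator", "modules": [{"module_type": "example_llm"}]}]},
]}
bad = {"node_lines": [
    good["node_lines"][1],
    {"node_line_name": "retrieve_node_line",
     "nodes": [{"node_type": "retrieval",
                "modules": [{"module_type": "bm25"}, {"module_type": "vectordb"}]}]},
]}

good_problems = check_deploy_config(good, order)
bad_problems = check_deploy_config(bad, order)
```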
