
Implementation:Marker Inc Korea AutoRAG Evaluator Start Trial

From Leeroopedia
Knowledge Sources

  • Domains: RAG Pipeline Evaluation, Model Validation
  • Last Updated: 2026-02-12 00:00 GMT

Overview

A concrete tool, provided by the AutoRAG framework, for evaluating an optimized RAG pipeline against a test dataset by reusing the core Evaluator trial infrastructure.

Description

The Evaluator.start_trial method orchestrates a complete pipeline evaluation. When used for test dataset evaluation, it is called with the best.yaml configuration (the output of extract_best_config) rather than a multi-candidate search config. Because the best config has exactly one module per node, the Evaluator performs a single forward pass rather than a combinatorial search.

The method proceeds through several phases. First, it creates a resources directory and optionally validates the config YAML via the Validator class. Then it generates a new trial name (an incrementing integer), creates the trial directory, and copies the config YAML into it. Next, it handles corpus ingestion: BM25 indices are built if any node uses BM25 retrieval, and vector database embeddings are ingested if any node uses vectordb retrieval. The ingestion respects the full_ingest parameter -- when True, the entire corpus is checked against the vector store; when False, only documents referenced in retrieval ground truth are considered.
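The "incrementing integer" trial name mentioned above can be pictured as a scan over existing numbered directories. A minimal sketch; `next_trial_name` is a hypothetical helper, not the AutoRAG API (the real logic lives inside Evaluator.start_trial):

```python
import os
import tempfile

def next_trial_name(project_dir: str) -> str:
    """Return the next integer trial name: scan existing numbered
    trial directories and increment the highest one found."""
    existing = [d for d in os.listdir(project_dir)
                if d.isdigit() and os.path.isdir(os.path.join(project_dir, d))]
    return str(max((int(d) for d in existing), default=-1) + 1)

project_dir = tempfile.mkdtemp()
print(next_trial_name(project_dir))            # no trials yet -> "0"
os.makedirs(os.path.join(project_dir, "0"))    # simulate a completed trial
print(next_trial_name(project_dir))            # -> "1"
```

This is why each start_trial call produces a fresh numbered directory (0/, 1/, ...) under the project directory without overwriting earlier results.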

The core execution iterates over each node line in the config, calling run_node_line to process all nodes within that line. The first node line receives the QA data as its initial input. Results flow forward from each node line to the next. After all node lines complete, the per-node-line summaries are aggregated into a trial-level summary.csv containing columns for node line name, node type, best module filename, best module name, best module params, and best execution time.
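The dataflow described above can be sketched as a simple fold over node lines: the QA frame enters the first line, and each line's output becomes the next line's input. This is a simplified illustration; `run_node_line`, the node names, and the column tagging are stand-ins, not the real AutoRAG internals:

```python
import pandas as pd

def run_node_line(nodes, previous_result: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in: each node transforms the frame; here we
    # just tag a column so the flow is visible.
    result = previous_result.copy()
    for node in nodes:
        result[f"ran_{node}"] = True
    return result

node_lines = {
    "retrieve_node_line": ["retrieval"],
    "generate_node_line": ["prompt_maker", "generator"],
}

qa_data = pd.DataFrame({"qid": [1, 2], "query": ["q1", "q2"]})

# The first node line receives the QA data; each result feeds the next line.
result = qa_data
for line_name, nodes in node_lines.items():
    result = run_node_line(nodes, result)

print(sorted(c for c in result.columns if c.startswith("ran_")))
# -> ['ran_generator', 'ran_prompt_maker', 'ran_retrieval']
```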

Usage

Import Evaluator, construct it with the test QA and corpus data paths (not the training data), and call start_trial with the extracted best config YAML to evaluate the optimized pipeline on held-out test data. This is the standard validation step before production deployment.

Code Reference

Source Location

  • Repository: AutoRAG
  • File: autorag/evaluator.py (lines 106-219)

Signature

class Evaluator:
    def __init__(self, qa_data_path: str, corpus_data_path: str,
                 project_dir: Optional[str] = None):
        ...

    def start_trial(self, yaml_path: str, skip_validation: bool = False,
                    full_ingest: bool = True):
        ...

Import

from autorag.evaluator import Evaluator

I/O Contract

Inputs

  • qa_data_path (str, required): Path to the test QA dataset in parquet format (Evaluator constructor).
  • corpus_data_path (str, required): Path to the corpus dataset in parquet format (Evaluator constructor).
  • project_dir (Optional[str], optional): Path to the project directory for storing trial results. Defaults to the current working directory.
  • yaml_path (str, required): Path to the best.yaml config file (output of extract_best_config), passed to start_trial.
  • skip_validation (bool, optional): If True, skips config YAML validation. Default: False.
  • full_ingest (bool, optional): If True, checks the entire corpus against the vector DB for ingestion; if False, only checks documents in retrieval ground truth. Default: True.
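The full_ingest toggle can be pictured as a set filter over the corpus. A minimal sketch under that reading; the helper and document IDs are illustrative, not AutoRAG internals:

```python
def docs_to_ingest(corpus_ids, retrieval_gt_ids, full_ingest: bool) -> set:
    """Which corpus documents get checked against the vector store."""
    if full_ingest:
        return set(corpus_ids)                       # the whole corpus
    return set(corpus_ids) & set(retrieval_gt_ids)   # only GT-referenced docs

corpus = ["d1", "d2", "d3", "d4"]
gt = ["d2", "d4"]

print(sorted(docs_to_ingest(corpus, gt, full_ingest=True)))   # ['d1', 'd2', 'd3', 'd4']
print(sorted(docs_to_ingest(corpus, gt, full_ingest=False)))  # ['d2', 'd4']
```

Setting full_ingest=False trades completeness for speed: retrieval over documents outside the ground truth cannot be scored anyway, so skipping their ingestion does not change the reported metrics.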

Outputs

  • Trial directory (directory on disk): A new numbered trial directory (e.g., project_dir/0/) containing config.yaml and per-node-line subdirectories with evaluation results.
  • summary.csv (CSV file): Trial-level summary with columns node_line_name, node_type, best_module_filename, best_module_name, best_module_params, and best_execution_time.

External Reference

This is a wrapper doc. The Evaluator.start_trial method is the same API used for the optimization phase. In the test evaluation context, it is reused with a fundamentally different config:

  • Config source: optimization uses a multi-candidate YAML with many modules per node; test evaluation uses best.yaml with exactly one module per node.
  • Search behavior: optimization is combinatorial, evaluating all module combinations; test evaluation is single-path, running the one specified module per node.
  • QA dataset: optimization uses training QA pairs; test evaluation uses held-out test QA pairs.
  • Purpose: optimization finds the best module at each node; test evaluation measures generalization performance of the selected modules.
  • Output interpretation: in optimization, best modules are selected from candidates; in test evaluation, metrics indicate real-world expected performance.

The Evaluator class itself does not distinguish between these use cases. The difference is entirely determined by the config file passed to start_trial and the QA data provided to the constructor. This reuse pattern avoids code duplication and ensures that test evaluation uses exactly the same execution infrastructure as training evaluation.

Usage Examples

Basic Usage

from autorag.evaluator import Evaluator
from autorag.deploy.base import extract_best_config

# Step 1: Extract the best config from the training trial
extract_best_config(
    trial_path="./my_project/0",
    output_path="./my_project/best.yaml"
)

# Step 2: Create a new Evaluator with TEST data
test_evaluator = Evaluator(
    qa_data_path="./data/qa_test.parquet",
    corpus_data_path="./data/corpus.parquet",
    project_dir="./my_test_project"
)

# Step 3: Run the best pipeline on the test dataset
test_evaluator.start_trial(yaml_path="./my_project/best.yaml")

# Step 4: Inspect test metrics
import pandas as pd
test_summary = pd.read_csv("./my_test_project/0/summary.csv")
print(test_summary[["node_type", "best_module_name", "best_execution_time"]])

Skip Validation for Speed

from autorag.evaluator import Evaluator

test_evaluator = Evaluator(
    qa_data_path="./data/qa_test.parquet",
    corpus_data_path="./data/corpus.parquet",
    project_dir="./my_test_project"
)

# Skip validation when you know the config is well-formed
test_evaluator.start_trial(
    yaml_path="./my_project/best.yaml",
    skip_validation=True,
    full_ingest=False  # Faster: only ingest documents referenced in retrieval GT
)
