Implementation:Marker Inc Korea AutoRAG Evaluator Init

Knowledge Sources	AutoRAG
Domains	Pipeline Orchestration, RAG Pipeline Optimization
Last Updated	2026-02-12 00:00 GMT

Overview

Evaluator.__init__ is the concrete AutoRAG entry point for initializing the environment in which an optimization trial runs.

Description

The Evaluator.__init__ method bootstraps the entire optimization workspace. It validates that both the QA and corpus dataset paths exist and point to parquet files, loads them into pandas DataFrames using the PyArrow engine, and applies schema casting via cast_qa_dataset and cast_corpus_dataset to ensure column types are consistent. It then creates the project directory (defaulting to the current working directory if none is specified) and a data/ subdirectory within it. The QA and corpus datasets are copied into the project as data/qa.parquet and data/corpus.parquet respectively, using idempotent writes that skip copying if the files already exist. Finally, validate_qa_from_corpus_dataset is called to verify that every document ID in the QA ground truth exists in the corpus, catching data integrity issues before any trial begins.
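The path checks and directory setup described above can be sketched with the standard library alone. This is a minimal illustration, not AutoRAG's source: the helper names check_parquet_path and prepare_project are hypothetical, the choice of ValueError is an assumption, and the sketch copies files byte-for-byte without loading or schema-casting them.

```python
import shutil
from pathlib import Path
from typing import Optional


def check_parquet_path(path: str, label: str) -> Path:
    """Fail fast if the path is missing or does not end in .parquet."""
    p = Path(path)
    if not p.exists():
        # ValueError is an assumption; the real exception type may differ.
        raise ValueError(f"{label} path {path} does not exist.")
    if p.suffix != ".parquet":
        raise ValueError(f"{label} path {path} is not a parquet file.")
    return p


def prepare_project(
    qa_data_path: str,
    corpus_data_path: str,
    project_dir: Optional[str] = None,
) -> Path:
    """Hypothetical helper mirroring the described setup steps (no data loading)."""
    qa_src = check_parquet_path(qa_data_path, "QA data")
    corpus_src = check_parquet_path(corpus_data_path, "Corpus data")

    # Default to the current working directory when no project_dir is given.
    project = Path(project_dir) if project_dir is not None else Path.cwd()
    data_dir = project / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    # Idempotent copy: a target file that already exists is left untouched.
    for src, name in ((qa_src, "qa.parquet"), (corpus_src, "corpus.parquet")):
        target = data_dir / name
        if not target.exists():
            shutil.copy(src, target)
    return project
```

Because the copy step skips existing targets, calling the helper a second time with the same project directory leaves data/qa.parquet and data/corpus.parquet unchanged, which matches the idempotent behavior described for the constructor.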

Usage

Import and instantiate Evaluator whenever you need to run an optimization trial. The constructor must be called before start_trial or restart_trial. The Validator class also instantiates Evaluator internally to create temporary evaluation environments for configuration validation.

Code Reference

Source Location

Repository: AutoRAG
File: autorag/evaluator.py (lines 55-104)

Signature

class Evaluator:
    def __init__(
        self,
        qa_data_path: str,
        corpus_data_path: str,
        project_dir: Optional[str] = None,
    ):
        """
        Initialize an Evaluator object.

        :param qa_data_path: The path to the QA dataset. Must be parquet file.
        :param corpus_data_path: The path to the corpus dataset. Must be parquet file.
        :param project_dir: The path to the project directory. Default is the current directory.
        """

Import

from autorag.evaluator import Evaluator

I/O Contract

Inputs

Name	Type	Required	Description
qa_data_path	str	yes	Path to the QA dataset in parquet format. Must exist and have a .parquet extension. The dataset must contain columns including qid, query, retrieval_gt, and generation_gt.
corpus_data_path	str	yes	Path to the corpus dataset in parquet format. Must exist and have a .parquet extension. The dataset must contain columns including doc_id and contents.
project_dir	Optional[str]	no	Path to the project directory where trial data and results will be stored. Defaults to the current working directory. Created automatically if it does not exist.
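The integrity check between the two inputs (every document ID referenced in retrieval_gt must appear in the corpus doc_id column) can be illustrated in plain Python. This is a sketch under stated assumptions, not the body of validate_qa_from_corpus_dataset: the function name find_missing_doc_ids is hypothetical, and each retrieval_gt cell is assumed to be a list of lists of document IDs.

```python
from typing import Iterable, List, Set


def find_missing_doc_ids(
    retrieval_gt_rows: Iterable[List[List[str]]],
    corpus_doc_ids: Iterable[str],
) -> Set[str]:
    """Return the doc IDs referenced by the QA ground truth but absent from the corpus."""
    known = set(corpus_doc_ids)
    # Flatten the assumed nested structure: rows -> groups -> doc IDs.
    referenced = {
        doc_id
        for row in retrieval_gt_rows
        for group in row
        for doc_id in group
    }
    return referenced - known


# Illustrative data only; "doc-4" is referenced but not in the corpus.
qa_retrieval_gt = [[["doc-1"], ["doc-2", "doc-3"]], [["doc-4"]]]
corpus_ids = ["doc-1", "doc-2", "doc-3"]

missing = find_missing_doc_ids(qa_retrieval_gt, corpus_ids)
if missing:
    print(f"Missing doc IDs: {sorted(missing)}")
```

Running a check of this kind on your own data before constructing Evaluator can surface dangling references early, though the constructor performs its own validation regardless.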

Outputs

Name	Type	Description
evaluator	Evaluator	An initialized Evaluator instance with loaded datasets (self.qa_data, self.corpus_data), path references (self.qa_data_path, self.corpus_data_path, self.project_dir), and a prepared project directory structure containing data/qa.parquet and data/corpus.parquet.

Usage Examples

Basic Usage

from autorag.evaluator import Evaluator

# Initialize the evaluator with dataset paths and a project directory
evaluator = Evaluator(
    qa_data_path="data/qa.parquet",
    corpus_data_path="data/corpus.parquet",
    project_dir="./my_autorag_project",
)

# The project directory now contains:
#   my_autorag_project/
#       data/
#           qa.parquet
#           corpus.parquet

# Start an optimization trial
evaluator.start_trial("config/pipeline.yaml")

With Default Project Directory

from autorag.evaluator import Evaluator

# Uses the current working directory as the project directory
evaluator = Evaluator(
    qa_data_path="/absolute/path/to/qa.parquet",
    corpus_data_path="/absolute/path/to/corpus.parquet",
)

# Access loaded datasets
print(f"QA dataset rows: {len(evaluator.qa_data)}")
print(f"Corpus dataset rows: {len(evaluator.corpus_data)}")
print(f"Project directory: {evaluator.project_dir}")

Related Pages

Implements Principle

Principle:Marker_Inc_Korea_AutoRAG_Evaluator_Initialization
