Implementation:Marker Inc Korea AutoRAG Evaluator Init
| Field | Value |
|---|---|
| Knowledge Sources | |
| Domains | Pipeline Orchestration, RAG Pipeline Optimization |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
A concrete tool, provided by the AutoRAG framework, for initializing the environment of an optimization trial.
Description
The Evaluator.__init__ method bootstraps the entire optimization workspace in the following order:
- It checks that both the QA and corpus dataset paths exist and end with the .parquet extension.
- It loads both files into pandas DataFrames using the PyArrow engine, then applies schema casting via cast_qa_dataset and cast_corpus_dataset so that column types are consistent.
- It creates the project directory if it does not already exist (defaulting to the current working directory when none is specified), plus a data/ subdirectory inside it.
- It copies the QA and corpus datasets into the project as data/qa.parquet and data/corpus.parquet. These writes are idempotent: a target file that already exists is left untouched.
- Finally, it calls validate_qa_from_corpus_dataset to verify that every document ID in the QA ground truth exists in the corpus, so data integrity problems surface before any trial begins.
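The path checks and the idempotent project setup described above can be sketched with the standard library alone. This is a minimal illustration, not AutoRAG's code: prepare_project_dir is a hypothetical helper, and unlike the real constructor it skips the pandas loading and schema-casting steps entirely, copying the raw files with shutil.copyfile instead.

```python
import os
import shutil
from typing import Optional


def prepare_project_dir(
    qa_data_path: str,
    corpus_data_path: str,
    project_dir: Optional[str] = None,
) -> str:
    """Hypothetical helper mirroring the path checks and copy steps of Evaluator.__init__.

    Assumption: raw files are copied byte-for-byte; no DataFrame loading or casting.
    """
    # Both inputs must exist and carry a .parquet extension.
    for label, path in (("QA", qa_data_path), ("Corpus", corpus_data_path)):
        if not os.path.exists(path):
            raise ValueError(f"{label} data path {path} does not exist.")
        if not path.endswith(".parquet"):
            raise ValueError(f"{label} data path {path} is not a parquet file.")

    # Fall back to the current working directory when no project_dir is given.
    project_dir = project_dir if project_dir is not None else os.getcwd()

    # exist_ok=True makes this idempotent; it also creates project_dir itself.
    data_dir = os.path.join(project_dir, "data")
    os.makedirs(data_dir, exist_ok=True)

    # Copy each dataset only if the target file is not already present.
    for src, name in ((qa_data_path, "qa.parquet"), (corpus_data_path, "corpus.parquet")):
        dst = os.path.join(data_dir, name)
        if not os.path.exists(dst):
            shutil.copyfile(src, dst)
    return project_dir
```

Because existing targets are skipped, re-running the setup against the same project directory never overwrites data/qa.parquet or data/corpus.parquet, which matches the idempotent behavior described for the constructor.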
Usage
Import and instantiate Evaluator whenever you need to run an optimization trial. The constructor must be called before start_trial or restart_trial. It is also used internally by the Validator class to create temporary evaluation environments for configuration validation.
Code Reference
Source Location
- Repository: AutoRAG
- File: autorag/evaluator.py (lines 55-104)
Signature
```python
from typing import Optional


class Evaluator:
    def __init__(
        self,
        qa_data_path: str,
        corpus_data_path: str,
        project_dir: Optional[str] = None,
    ):
        """
        Initialize an Evaluator object.
        :param qa_data_path: The path to the QA dataset. Must be parquet file.
        :param corpus_data_path: The path to the corpus dataset. Must be parquet file.
        :param project_dir: The path to the project directory. Default is the current directory.
        """
        # ... (body omitted; see autorag/evaluator.py)
```
Import
```python
from autorag.evaluator import Evaluator
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| qa_data_path | str | yes | Path to the QA dataset in parquet format. Must exist and have a .parquet extension. The dataset must contain columns including qid, query, retrieval_gt, and generation_gt. |
| corpus_data_path | str | yes | Path to the corpus dataset in parquet format. Must exist and have a .parquet extension. The dataset must contain columns including doc_id and contents. |
| project_dir | Optional[str] | no | Path to the project directory where trial data and results will be stored. Defaults to the current working directory. Created automatically if it does not exist. |
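The column requirements in the table above, together with the ground-truth check that validate_qa_from_corpus_dataset performs, can be illustrated with plain Python records. This is a hedged sketch, not AutoRAG's implementation: missing_columns and missing_gt_doc_ids are hypothetical helpers, rows are plain dicts rather than DataFrame rows, and it assumes each retrieval_gt value is a list of lists of doc_id strings.

```python
from typing import Dict, Iterable, List, Set

# Column names taken from the Inputs table.
QA_REQUIRED = {"qid", "query", "retrieval_gt", "generation_gt"}
CORPUS_REQUIRED = {"doc_id", "contents"}


def missing_columns(columns: Iterable[str], required: Set[str]) -> Set[str]:
    """Return the required column names that are absent from `columns`."""
    return required - set(columns)


def missing_gt_doc_ids(qa_rows: List[Dict], corpus_doc_ids: Iterable[str]) -> Set[str]:
    """Return ground-truth doc IDs referenced by QA rows but absent from the corpus.

    Assumption: each row's retrieval_gt is a list of lists of doc_id strings.
    """
    known = set(corpus_doc_ids)
    referenced = {
        doc_id
        for row in qa_rows
        for group in row["retrieval_gt"]
        for doc_id in group
    }
    return referenced - known


qa_rows = [
    {"qid": "q1", "query": "...", "retrieval_gt": [["d1", "d2"]], "generation_gt": ["a"]},
    {"qid": "q2", "query": "...", "retrieval_gt": [["d3"], ["d9"]], "generation_gt": ["b"]},
]
print(missing_columns(qa_rows[0].keys(), QA_REQUIRED))  # set()
print(missing_gt_doc_ids(qa_rows, ["d1", "d2", "d3"]))  # {'d9'}
```

Returning the offending IDs rather than raising keeps the helper easy to test; a caller that wants the fail-fast behavior described for the constructor can raise whenever the returned set is non-empty.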
Outputs
| Name | Type | Description |
|---|---|---|
| evaluator | Evaluator | An initialized Evaluator instance with loaded datasets (self.qa_data, self.corpus_data), path references (self.qa_data_path, self.corpus_data_path, self.project_dir), and a prepared project directory structure containing data/qa.parquet and data/corpus.parquet. |
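To confirm that a project directory has the layout listed in the Outputs table, for example before calling start_trial on a directory prepared earlier, a small standard-library check can be used. missing_project_files is a hypothetical helper, not part of AutoRAG; it only tests that the two dataset files exist and does not inspect their contents.

```python
import os
from typing import List

# Relative paths the constructor is documented to create.
EXPECTED_FILES = ("data/qa.parquet", "data/corpus.parquet")


def missing_project_files(project_dir: str) -> List[str]:
    """Hypothetical helper: list expected dataset files absent from project_dir."""
    return [
        rel
        for rel in EXPECTED_FILES
        if not os.path.isfile(os.path.join(project_dir, *rel.split("/")))
    ]
```

After the constructor completes, missing_project_files(evaluator.project_dir) should return an empty list.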
Usage Examples
Basic Usage
```python
from autorag.evaluator import Evaluator

# Initialize the evaluator with dataset paths and a project directory
evaluator = Evaluator(
    qa_data_path="data/qa.parquet",
    corpus_data_path="data/corpus.parquet",
    project_dir="./my_autorag_project",
)

# The project directory now contains:
# my_autorag_project/
#     data/
#         qa.parquet
#         corpus.parquet

# Start an optimization trial
evaluator.start_trial("config/pipeline.yaml")
```
With Default Project Directory
```python
from autorag.evaluator import Evaluator

# Uses the current working directory as the project directory
evaluator = Evaluator(
    qa_data_path="/absolute/path/to/qa.parquet",
    corpus_data_path="/absolute/path/to/corpus.parquet",
)

# Access loaded datasets
print(f"QA dataset rows: {len(evaluator.qa_data)}")
print(f"Corpus dataset rows: {len(evaluator.corpus_data)}")
print(f"Project directory: {evaluator.project_dir}")
```
Related Pages
Implements Principle
Requires Environment
- Environment:Marker_Inc_Korea_AutoRAG_Python_3_10_Runtime
- Environment:Marker_Inc_Korea_AutoRAG_GPU_PyTorch_Environment