Implementation:Marker Inc Korea AutoRAG Run Node Line

Knowledge Sources	AutoRAG
Domains	Pipeline Orchestration, RAG Pipeline Optimization
Last Updated	2026-02-12 00:00 GMT

Overview

Concrete tool for executing a sequence of pipeline nodes and selecting the best module at each step, provided by the AutoRAG framework.

Description

The run_node_line function is the core execution engine of AutoRAG's optimization loop. It takes an ordered list of Node objects and processes them sequentially, passing the output of each node's best module as input to the next node. If no previous result is provided, it loads the QA dataset from the project's data/qa.parquet file as the initial input.
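The sequential hand-off described above can be sketched without AutoRAG installed. In this minimal sketch, StubNode and its run(previous_result, node_line_dir) method are hypothetical stand-ins for Node (the call signature is an assumption, not confirmed by this page), plain lists of dicts stand in for the DataFrame, and the column names are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Rows = List[Dict[str, object]]


@dataclass
class StubNode:
    """Hypothetical stand-in for autorag.schema.Node; not the real class."""

    node_type: str
    transform: Callable[[Rows], Rows]

    def run(self, previous_result: Rows, node_line_dir: str) -> Rows:
        # A real node evaluates every module candidate and keeps the best one;
        # this stub applies a single fixed transform instead.
        return self.transform(previous_result)


def run_line_sketch(nodes: List[StubNode], node_line_dir: str, previous_result: Rows) -> Rows:
    # Process the nodes in order: each node's output becomes the next node's input.
    for node in nodes:
        previous_result = node.run(previous_result, node_line_dir)
    return previous_result


def add_column(name: str, value: object) -> Callable[[Rows], Rows]:
    # Build a transform that returns new rows with one extra column.
    return lambda rows: [{**row, name: value} for row in rows]


qa_rows = [{"qid": "q1", "query": "What is AutoRAG?"}]
stub_nodes = [
    StubNode("retrieval", add_column("retrieved_contents", ["doc-1"])),
    StubNode("generation", add_column("generated_texts", "an answer")),
]
final_rows = run_line_sketch(stub_nodes, "my_project/0/example_node_line", qa_rows)
print(sorted(final_rows[0]))
```

The loop is a plain left-to-right fold, which is why the order of the nodes list matters: a generation node placed before a retrieval node would not see the retrieved columns.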

For each node in the sequence, the function calls node.run(), which internally evaluates all configured module candidates, computes metrics, applies the selection strategy, and saves results. After each node completes, the function reads the node's summary.csv to extract the best module's metadata (filename, module name, parameters, and execution time) and appends it to a running summary list. Once all nodes have been processed, a node-line-level summary.csv is written to the node line directory, aggregating the best module selections from every node.
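The per-node bookkeeping can be illustrated with the standard library alone. This is a sketch under stated assumptions: the node-level summary is assumed to flag the selected module with an is_best column (that column name is not stated on this page), the sample rows are invented for illustration, and the real function presumably works on DataFrames rather than on csv rows.

```python
import csv
import io

# Illustrative node-level summary; the is_best column name is an assumption.
node_summary_csv = """filename,module_name,module_params,execution_time,is_best
0.parquet,bm25,{},0.12,False
1.parquet,vectordb,{'top_k': 3},0.30,True
"""


def best_module_entry(node_type: str, summary_text: str) -> dict:
    # Parse the node summary and pick the row marked as the best module.
    rows = list(csv.DictReader(io.StringIO(summary_text)))
    best = next(row for row in rows if row["is_best"] == "True")
    # Rename the fields to the node-line-level summary columns.
    return {
        "node_type": node_type,
        "best_module_filename": best["filename"],
        "best_module_name": best["module_name"],
        "best_module_params": best["module_params"],
        "best_execution_time": float(best["execution_time"]),
    }


# One entry is appended per node; the list later becomes the node line summary.
summary_lst = [best_module_entry("retrieval", node_summary_csv)]
print(summary_lst[0]["best_module_name"])
```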

Usage

Import and call run_node_line when you need to execute a complete node line within an optimization trial. This function is called by Evaluator.start_trial for each node line defined in the YAML configuration, and by Evaluator.restart_trial when resuming from a partially completed trial. It can also be called directly for programmatic pipeline evaluation.

Code Reference

Source Location

Repository: AutoRAG
File: autorag/node_line.py (lines 24-65)

Signature

def run_node_line(
    nodes: List[Node],
    node_line_dir: str,
    previous_result: Optional[pd.DataFrame] = None,
):
    """
    Run the whole node line by running each node.

    :param nodes: A list of nodes.
    :param node_line_dir: This node line's directory.
    :param previous_result: A result of the previous node line.
        If None, it loads qa data from data/qa.parquet.
    :return: The final result of the node line.
    """

Import

from autorag.node_line import run_node_line

I/O Contract

Inputs

Name	Type	Required	Description
nodes	List[Node]	yes	Ordered list of Node objects representing the pipeline stages to execute. Each Node contains the node type, module candidates, strategy, and metrics.
node_line_dir	str	yes	Path to the directory where this node line's results will be stored. Subdirectories are created for each node type (e.g., node_line_dir/retrieval/, node_line_dir/generation/).
previous_result	Optional[pd.DataFrame]	no	The output DataFrame from a previous node line, used as input to the first node. If None, the QA dataset is loaded from project_dir/data/qa.parquet.
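How project_dir relates to node_line_dir is not spelled out in the table. The example paths on this page follow a project/trial/node-line layout, so a plausible reading is that the project directory sits two levels above the node line directory. The helper below, qa_path_for, is a hypothetical name that sketches that assumption; it is not part of AutoRAG.

```python
from pathlib import PurePosixPath


def qa_path_for(node_line_dir: str) -> PurePosixPath:
    # Assumed layout: <project_dir>/<trial_name>/<node_line_name>
    project_dir = PurePosixPath(node_line_dir).parent.parent
    return project_dir / "data" / "qa.parquet"


print(qa_path_for("my_project/0/pre_retrieve_node_line"))
```

If that assumption holds, calling run_node_line without previous_result only works when node_line_dir is nested inside a project that already contains data/qa.parquet.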

Outputs

Name	Type	Description
result	pd.DataFrame	The output DataFrame from the best module of the last node in the line. This becomes the input to the next node line if one exists.
summary.csv	File (side effect)	A CSV file written to node_line_dir/summary.csv containing the best module selection for each node, with columns: node_type, best_module_filename, best_module_name, best_module_params, best_execution_time.
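Because summary.csv is a side effect rather than a return value, downstream code has to read it back from disk. The sketch below writes a stand-in file with the columns listed above into a temporary directory and then reads it; read_node_line_summary is a hypothetical helper and the module names in the rows are invented for illustration.

```python
import csv
import os
import tempfile

COLUMNS = [
    "node_type",
    "best_module_filename",
    "best_module_name",
    "best_module_params",
    "best_execution_time",
]


def read_node_line_summary(node_line_dir: str) -> list:
    # Read node_line_dir/summary.csv into a list of dicts, one per node.
    with open(os.path.join(node_line_dir, "summary.csv"), newline="") as f:
        return list(csv.DictReader(f))


with tempfile.TemporaryDirectory() as node_line_dir:
    # Write a stand-in summary so the example runs without AutoRAG.
    with open(os.path.join(node_line_dir, "summary.csv"), "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerow(dict(zip(COLUMNS, ["retrieval", "0.parquet", "bm25", "{}", "0.12"])))
        writer.writerow(dict(zip(COLUMNS, ["generation", "1.parquet", "example_llm", "{}", "1.50"])))
    rows = read_node_line_summary(node_line_dir)

# Map each node type to the module that was selected for it.
best_by_node = {row["node_type"]: row["best_module_name"] for row in rows}
print(best_by_node)
```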

Usage Examples

Basic Usage

import pandas as pd
from autorag.schema import Node
from autorag.node_line import run_node_line

# Construct nodes from YAML-style configuration dictionaries.
# The module lists are elided here; supply real module candidates before running.
node_dicts = [
    {"node_type": "retrieval", "strategy": {"metrics": ["retrieval_f1"]}, "modules": [...]},
    {"node_type": "generation", "strategy": {"metrics": ["bleu"]}, "modules": [...]},
]
nodes = [Node.from_dict(d) for d in node_dicts]

# Load initial QA data
qa_data = pd.read_parquet("my_project/data/qa.parquet", engine="pyarrow")

# Run the node line
final_result = run_node_line(
    nodes=nodes,
    node_line_dir="my_project/0/pre_retrieve_node_line",
    previous_result=qa_data,
)

print(f"Final result columns: {list(final_result.columns)}")

Chaining Node Lines

from autorag.node_line import run_node_line

# Run the first node line.
# retrieval_nodes, generation_nodes, and qa_data are assumed to be built as in Basic Usage.
result_1 = run_node_line(
    nodes=retrieval_nodes,
    node_line_dir="my_project/0/retrieve_node_line",
    previous_result=qa_data,
)

# Pass the output to the second node line
result_2 = run_node_line(
    nodes=generation_nodes,
    node_line_dir="my_project/0/post_retrieve_node_line",
    previous_result=result_1,
)
