Principle:Marker Inc Korea AutoRAG Pipeline Query Execution

Knowledge Sources	AutoRAG Docs
Domains	RAG Pipeline Execution, Query Processing
Last Updated	2026-02-12 00:00 GMT

Overview

Pipeline query execution processes a user query through a sequential chain of RAG modules, accumulating results in a DataFrame that flows from retrieval through generation.

Description

Once a pipeline runner has been initialized with its module instances, it can accept natural language queries and produce answers. The execution model is a sequential chain: each module receives the accumulated results from all prior modules, applies its transformation, and appends its output columns to the shared DataFrame.

The pipeline begins by constructing a pseudo QA DataFrame containing the user's query along with placeholder columns (a unique query ID, empty retrieval ground truth, and empty generation ground truth). These placeholders satisfy the column contract expected by each module's pure method, which was originally designed for evaluation against ground truth data. In the deployment context, the ground truth columns are unused but structurally required.
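As an illustration of the placeholder construction described above, the following is a minimal sketch using pandas. The helper name make_pseudo_qa is hypothetical, and the column names and empty values follow the pseudocode in the Theoretical Basis section rather than any confirmed AutoRAG source.

```python
import uuid

import pandas as pd


def make_pseudo_qa(query: str) -> pd.DataFrame:
    """Build a one-row QA frame (hypothetical helper, not AutoRAG's API).

    The ground-truth columns are placeholders: they only exist so that
    modules expecting the evaluation-time column contract still work.
    """
    return pd.DataFrame(
        {
            "qid": [str(uuid.uuid4())],   # unique query ID
            "query": [query],             # the user's question
            "retrieval_gt": [[]],         # empty retrieval ground truth
            "generation_gt": [""],        # empty generation ground truth
        }
    )


qa_df = make_pseudo_qa("What is AutoRAG?")
print(qa_df.columns.tolist())
```

The resulting frame has a single row, which is the shape every downstream module sees in deployment.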

Each module's pure method receives the previous_result DataFrame and returns a new_result DataFrame with its output columns. The runner merges these DataFrames by dropping any overlapping columns from the previous result (to allow modules to overwrite earlier values) and concatenating the new columns. This accumulation pattern means that later modules can access all intermediate results -- for example, the prompt maker can see both the original query and the retrieved passages, and the generator can see the constructed prompt.
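The overlap-drop merge can be sketched with pandas as follows. This is an assumption-laden illustration, not the runner's actual code: the function name merge_results and the sample column names are hypothetical, and both frames are assumed to share the same row index.

```python
import pandas as pd


def merge_results(previous_result: pd.DataFrame, new_result: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: let new_result overwrite overlapping columns."""
    # Columns present in both frames are dropped from the earlier result,
    # so the newer module's values win.
    overlap = previous_result.columns.intersection(new_result.columns)
    kept = previous_result.drop(columns=overlap)
    # Column-wise concatenation aligns rows on the index, so both frames
    # are assumed to share the same index.
    return pd.concat([kept, new_result], axis=1)


previous = pd.DataFrame({"query": ["q"], "retrieved_contents": [["old passage"]]})
new = pd.DataFrame(
    {"retrieved_contents": [["reranked passage"]], "retrieve_scores": [[0.9]]}
)
merged = merge_results(previous, new)
print(merged.columns.tolist())
```

Here the second frame both overwrites retrieved_contents and adds retrieve_scores, while the untouched query column is carried forward.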

After all modules have executed, the runner extracts the final answer from the specified result_column (defaulting to generated_texts).

Usage

Use pipeline query execution whenever you need to get an answer from a deployed AutoRAG pipeline. This is the core inference operation. It is called directly via Runner.run(), and is also used internally by ApiRunner (for API requests) and GradioRunner (for chat interactions).

Theoretical Basis

The execution follows a linear pipeline pattern (loosely related to chain of responsibility, except that every module always runs and none short-circuits the chain):

Input: query string Q
D_0 = DataFrame({qid: uuid, query: Q, retrieval_gt: [], generation_gt: ""})

For i = 1 to N (number of modules):
    R_i = module_i.pure(previous_result=D_{i-1}, **params_i)
    overlap = columns(D_{i-1}) intersection columns(R_i)
    D_i = concat(D_{i-1} \ overlap, R_i)

Output: D_N[result_column][0]
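
The pseudocode above can be turned into a small runnable sketch. Everything here is illustrative: the toy_* functions stand in for real module pure methods, run_pipeline is a hypothetical name, and the intermediate column names (retrieved_contents, prompts) are assumptions. Only the control flow mirrors the pseudocode.

```python
import uuid

import pandas as pd


def toy_retrieval(previous_result: pd.DataFrame, top_k: int = 1) -> pd.DataFrame:
    # Stand-in for a retrieval module: returns a fixed passage list.
    passages = ["AutoRAG evaluates RAG pipelines.", "Pandas provides DataFrames."]
    return pd.DataFrame(
        {"retrieved_contents": [passages[:top_k]]}, index=previous_result.index
    )


def toy_prompt_maker(previous_result: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for a prompt maker: sees both the query and the passages.
    prompts = [
        " ".join(contents) + "\nQ: " + query
        for query, contents in zip(
            previous_result["query"], previous_result["retrieved_contents"]
        )
    ]
    return pd.DataFrame({"prompts": prompts}, index=previous_result.index)


def toy_generator(previous_result: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for a generator: echoes the prompt instead of calling an LLM.
    texts = ["Answer based on: " + p for p in previous_result["prompts"]]
    return pd.DataFrame({"generated_texts": texts}, index=previous_result.index)


def run_pipeline(query, modules, result_column="generated_texts"):
    # D_0: one-row pseudo QA frame with placeholder ground truth.
    df = pd.DataFrame(
        {
            "qid": [str(uuid.uuid4())],
            "query": [query],
            "retrieval_gt": [[]],
            "generation_gt": [""],
        }
    )
    for module, params in modules:
        new_result = module(previous_result=df, **params)      # R_i
        overlap = df.columns.intersection(new_result.columns)  # shared columns
        df = pd.concat([df.drop(columns=overlap), new_result], axis=1)  # D_i
    return df[result_column].tolist()[0]                       # D_N[result_column][0]


answer = run_pipeline(
    "What does AutoRAG do?",
    [(toy_retrieval, {"top_k": 1}), (toy_prompt_maker, {}), (toy_generator, {})],
)
print(answer)
```

Because each stand-in only reads the columns it needs from previous_result, swapping one toy module for another that emits the same columns would leave the loop unchanged.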

Key properties:

Monotonic column growth: The set of columns never shrinks. Each module can add new columns or overwrite existing ones through the overlap-drop mechanism, but no column is removed.
Single-row execution: In deployment mode, the DataFrame always has exactly one row (one query), unlike evaluation mode, which processes batches.
Module independence: Each module only depends on the DataFrame contract (expected input columns) and is agnostic to which specific module produced them.
Deterministic ordering: Modules execute in the exact order established during runner initialization, which mirrors the node line and node order from the config.

Related Pages

Implemented By

Implementation:Marker_Inc_Korea_AutoRAG_Runner_Run
