Implementation:Marker Inc Korea AutoRAG Factoid Query Gen
| Knowledge Sources | |
|---|---|
| Domains | Natural Language Processing, Question Generation, Evaluation Methodology |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
A concrete tool provided by the AutoRAG framework for generating factoid single-hop questions from passages using a large language model.
Description
The factoid_query_gen function generates a factoid question from a single passage by prompting a LlamaIndex-compatible LLM with a predefined factoid single-hop prompt template. It is an async function that operates on individual QA row dictionaries, making it compatible with the QA.batch_apply() method for efficient parallel processing.
Internally, factoid_query_gen delegates to the llama_index_generate_base helper, which concatenates all retrieval ground-truth contents into a numbered context string, combines it with the language-appropriate factoid prompt template, and sends the resulting messages to the LLM via achat(). The LLM's response is stored in the query field of the row dictionary.
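The delegation described above can be sketched as follows. This is an illustrative reimplementation, not AutoRAG's actual helper (which lives in autorag.data.qa.query.llama_gen_query); StubLLM and FakeChatResponse are hypothetical stand-ins for a LlamaIndex BaseLLM and its chat response.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class FakeChatResponse:
    # Stand-in for the LlamaIndex chat response; only the text matters here.
    content: str

class StubLLM:
    # Hypothetical stub replacing a real BaseLLM for illustration.
    async def achat(self, messages: List[str]) -> FakeChatResponse:
        return FakeChatResponse(content="When was AutoRAG first released?")

FACTOID_PROMPT = "Generate a factoid question from the passages below."

async def generate_base_sketch(row: Dict, llm: StubLLM, prompt: str) -> Dict:
    # Flatten the nested retrieval_gt_contents and number each passage,
    # mirroring how the real helper builds its context string.
    passages = [p for group in row["retrieval_gt_contents"] for p in group]
    context = "\n".join(f"{i + 1}. {p}" for i, p in enumerate(passages))
    response = await llm.achat([f"{prompt}\n\n{context}"])
    row["query"] = response.content  # result stored under the 'query' key
    return row

row = {"retrieval_gt_contents": [["AutoRAG automates RAG pipeline evaluation."]]}
result = asyncio.run(generate_base_sketch(row, StubLLM(), FACTOID_PROMPT))
print(result["query"])
```

The real helper differs in details (it builds ChatMessage objects and uses the actual prompt templates), but the shape of the data flow is the same: nested passages in, a single query string out.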
The module also provides additional query generation strategies including concept_completion_query_gen, two_hop_incremental, custom_query_gen, and the experimental multiple_queries_gen. All follow the same architectural pattern but use different prompt templates.
Usage
Import and use this function as the transformation function argument to QA.batch_apply(). The make_retrieval_gt_contents() method must be called on the QA instance first to populate the retrieval_gt_contents column that this function reads.
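The ordering requirement matters because the generator reads the retrieval_gt_contents key directly. A row that has been sampled but not yet run through make_retrieval_gt_contents() carries only document IDs, not passage texts (illustrative sketch; the field values are invented):

```python
# A freshly sampled row holds retrieval_gt IDs but not the resolved
# passage texts that factoid_query_gen needs.
row = {"qid": "q1", "retrieval_gt": [["doc-1"]]}

try:
    row["retrieval_gt_contents"]
except KeyError as exc:
    # Calling the generator at this point would fail the same way.
    print(f"missing column: {exc}")
```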
Code Reference
Source Location
- Repository: AutoRAG
- File: autorag/data/qa/query/llama_gen_query.py (lines 25-32)
Signature
async def factoid_query_gen(
row: Dict,
llm: BaseLLM,
lang: str = "en",
) -> Dict:
return await llama_index_generate_base(
row, llm, QUERY_GEN_PROMPT["factoid_single_hop"][lang]
)
Import
from autorag.data.qa.query.llama_gen_query import factoid_query_gen
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| row | Dict | yes | A dictionary representing a single QA row. Must contain the key retrieval_gt_contents (List[List[str]]) with the passage texts. |
| llm | BaseLLM | yes | A LlamaIndex BaseLLM instance (e.g., OpenAI, Anthropic). Used for async chat completion via achat(). |
| lang | str | no | Language code for the prompt template. Supported values: "en", "ko", "ja". Defaults to "en". |
Outputs
| Name | Type | Description |
|---|---|---|
| row | Dict | The input row dictionary with an added query key containing the generated factoid question (str). |
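The contract above can be summarized as a before/after pair. All values here are invented for illustration; only the key names and types follow the tables.

```python
# Input row: retrieval_gt_contents (List[List[str]]) is the required key.
input_row = {
    "qid": "q1",
    "retrieval_gt_contents": [["AutoRAG builds QA datasets automatically."]],
}

# Output row: the same dict with one added key, 'query' (the generated
# factoid question); all pre-existing keys pass through unchanged.
output_row = {
    **input_row,
    "query": "What does AutoRAG build automatically?",
}

assert set(output_row) - set(input_row) == {"query"}
print(sorted(output_row))
```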
Usage Examples
Basic Usage
from autorag.data.qa.schema import Raw
from autorag.data.qa.sample import random_single_hop
from autorag.data.qa.query.llama_gen_query import factoid_query_gen
from llama_index.llms.openai import OpenAI
# Set up LLM
llm = OpenAI(model="gpt-4o-mini")
# Build pipeline up to query generation; parsed_df is a DataFrame
# produced by an earlier parsing step
corpus = Raw(parsed_df).chunk("token", chunk_size=512)
qa = (corpus
.sample(random_single_hop, n=100)
.make_retrieval_gt_contents()
.batch_apply(factoid_query_gen, llm=llm, lang="en"))
# qa.data now has columns: qid, retrieval_gt, retrieval_gt_contents, query
print(qa.data[["qid", "query"]].head())
Korean Language Queries
from autorag.data.qa.query.llama_gen_query import factoid_query_gen
qa = (corpus
.sample(random_single_hop, n=50)
.make_retrieval_gt_contents()
.batch_apply(factoid_query_gen, llm=llm, lang="ko"))
With Custom Batch Size
# Process in smaller batches to stay within API rate limits
qa = (corpus
.sample(random_single_hop, n=500)
.make_retrieval_gt_contents()
.batch_apply(factoid_query_gen, batch_size=16, llm=llm, lang="en"))
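batch_size caps how many rows are processed concurrently. The chunked-concurrency pattern behind it can be sketched as follows; this is an illustrative stand-in, not AutoRAG's actual batch_apply implementation, and add_query is a hypothetical substitute for factoid_query_gen.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

async def batch_apply_sketch(
    rows: List[Dict],
    fn: Callable[[Dict], Awaitable[Dict]],
    batch_size: int = 16,
) -> List[Dict]:
    # Await each chunk of batch_size coroutines before starting the next,
    # so at most batch_size LLM calls are in flight at once.
    out: List[Dict] = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        out.extend(await asyncio.gather(*(fn(r) for r in chunk)))
    return out

async def add_query(row: Dict) -> Dict:
    # Stand-in for factoid_query_gen: tags the row instead of calling an LLM.
    row["query"] = f"question-for-{row['qid']}"
    return row

rows = [{"qid": i} for i in range(5)]
done = asyncio.run(batch_apply_sketch(rows, add_query, batch_size=2))
print([r["query"] for r in done])
```

Because asyncio.gather preserves argument order and the chunks are awaited sequentially, the output rows come back in the same order as the input, which is what lets the results map cleanly back onto the QA DataFrame.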