
Implementation:Marker Inc Korea AutoRAG Factoid Query Gen

From Leeroopedia
Knowledge Sources
Domains Natural Language Processing, Question Generation, Evaluation Methodology
Last Updated 2026-02-12 00:00 GMT

Overview

A concrete tool, provided by the AutoRAG framework, for generating factoid single-hop questions from passages using a large language model.

Description

The factoid_query_gen function generates a factoid question from a single passage by prompting a LlamaIndex-compatible LLM with a predefined factoid single-hop prompt template. It is an async function that operates on individual QA row dictionaries, making it compatible with the QA.batch_apply() method for efficient parallel processing.

Internally, factoid_query_gen delegates to the llama_index_generate_base helper function, which concatenates all retrieval ground truth contents into a numbered context string, appends it to the factoid prompt template (selected by language), and sends the resulting messages to the LLM via achat(). The LLM's response is stored in the query field of the row dictionary.
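The delegation described above can be sketched in simplified form. This is an illustrative sketch only: the stub LLM class, the prompt wording, and the helper name generate_base_sketch are assumptions, not the library's actual code.

```python
import asyncio

# Stub standing in for a LlamaIndex LLM's async chat interface (illustrative).
class StubLLM:
    async def achat(self, messages):
        # A real LLM would return a generated question; we echo a fixed one.
        return "What year was the example event?"

# Illustrative prompt wording; the real template lives in QUERY_GEN_PROMPT.
FACTOID_PROMPT_EN = "Generate a factoid question answerable from the passages below."

async def generate_base_sketch(row, llm, prompt):
    # Flatten and number every ground-truth passage into one context string.
    passages = [p for group in row["retrieval_gt_contents"] for p in group]
    context = "\n".join(f"{i + 1}. {p}" for i, p in enumerate(passages))
    messages = [prompt + "\n\n" + context]
    # Send the assembled prompt to the LLM and store the reply on the row.
    row["query"] = await llm.achat(messages)
    return row

row = {"retrieval_gt_contents": [["Passage one."], ["Passage two."]]}
result = asyncio.run(generate_base_sketch(row, StubLLM(), FACTOID_PROMPT_EN))
print(result["query"])
```

Because the function mutates and returns the same row dictionary, it composes naturally with row-wise batch application.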

The module also provides additional query generation strategies including concept_completion_query_gen, two_hop_incremental, custom_query_gen, and the experimental multiple_queries_gen. All follow the same architectural pattern but use different prompt templates.
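The shared pattern can be illustrated as thin wrappers over one base generator, differing only in which prompt template they select. The prompt table contents and the fake LLM below are illustrative assumptions, not the library's real templates.

```python
import asyncio

# Illustrative prompt table keyed by strategy and language (wording assumed).
PROMPTS = {
    "factoid_single_hop": {"en": "Ask a factoid question about:"},
    "concept_completion": {"en": "Ask a concept-completion question about:"},
}

async def generate_base(row, llm, prompt):
    # Shared base: every strategy funnels through the same call.
    row["query"] = await llm(prompt)
    return row

async def factoid_query_gen(row, llm, lang="en"):
    return await generate_base(row, llm, PROMPTS["factoid_single_hop"][lang])

async def concept_completion_query_gen(row, llm, lang="en"):
    return await generate_base(row, llm, PROMPTS["concept_completion"][lang])

async def fake_llm(prompt):
    # Stands in for a real achat() call; echoes the prompt it received.
    return f"Question generated from: {prompt!r}"

row = asyncio.run(concept_completion_query_gen({}, fake_llm))
print(row["query"])
```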

Usage

Use this function as the transformation function argument to QA.batch_apply(). Before doing so, call make_retrieval_gt_contents() on the QA instance to populate the retrieval_gt_contents column that this function reads.

Code Reference

Source Location

  • Repository: AutoRAG
  • File: autorag/data/qa/query/llama_gen_query.py (lines 25-32)

Signature

async def factoid_query_gen(
    row: Dict,
    llm: BaseLLM,
    lang: str = "en",
) -> Dict:
    return await llama_index_generate_base(
        row, llm, QUERY_GEN_PROMPT["factoid_single_hop"][lang]
    )

Import

from autorag.data.qa.query.llama_gen_query import factoid_query_gen

I/O Contract

Inputs

Name Type Required Description
row Dict yes A dictionary representing a single QA row. Must contain the key retrieval_gt_contents (List[List[str]]) with the passage texts.
llm BaseLLM yes A LlamaIndex BaseLLM instance (e.g., OpenAI, Anthropic). Used for async chat completion via achat().
lang str no Language code for the prompt template. Supported values: "en", "ko", "ja". Defaults to "en".

Outputs

Name Type Description
row Dict The input row dictionary with an added query key containing the generated factoid question (str).
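As a concrete illustration of this contract, a row might look like the following before and after the call. The qid, passage text, and generated question are made-up values:

```python
# Illustrative row shapes; real qid/doc id values come from the QA pipeline.
row_in = {
    "qid": "q-001",
    "retrieval_gt": [["doc-001"]],
    "retrieval_gt_contents": [["The Nile is the longest river in Africa."]],
}

# After factoid_query_gen, the same dict gains a "query" key:
row_out = {**row_in, "query": "Which river is the longest in Africa?"}
print(sorted(row_out.keys()))
```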

Usage Examples

Basic Usage

from autorag.data.qa.schema import Raw
from autorag.data.qa.sample import random_single_hop
from autorag.data.qa.query.llama_gen_query import factoid_query_gen
from llama_index.llms.openai import OpenAI

# Set up LLM
llm = OpenAI(model="gpt-4o-mini")

# Build pipeline up to query generation
corpus = Raw(parsed_df).chunk("token", chunk_size=512)
qa = (corpus
      .sample(random_single_hop, n=100)
      .make_retrieval_gt_contents()
      .batch_apply(factoid_query_gen, llm=llm, lang="en"))

# qa.data now has columns: qid, retrieval_gt, retrieval_gt_contents, query
print(qa.data[["qid", "query"]].head())

Korean Language Queries

from autorag.data.qa.query.llama_gen_query import factoid_query_gen

qa = (corpus
      .sample(random_single_hop, n=50)
      .make_retrieval_gt_contents()
      .batch_apply(factoid_query_gen, llm=llm, lang="ko"))

With Custom Batch Size

# Process in smaller batches to stay within API rate limits
qa = (corpus
      .sample(random_single_hop, n=500)
      .make_retrieval_gt_contents()
      .batch_apply(factoid_query_gen, batch_size=16, llm=llm, lang="en"))
