Implementation:Marker Inc Korea AutoRAG QA Batch Apply Factoid Query Gen

From Leeroopedia


Knowledge Sources
Domains: NLP, Data_Generation
Last Updated: 2026-02-08 06:00 GMT

Overview

Concrete tool for generating factoid questions from passages using LLMs, provided by AutoRAG's QA schema and query modules.

Description

QA.batch_apply applies an async generation function to every row of the QA DataFrame, running the calls in batches of batch_size. The method itself is a regular (non-async) call that returns a QA instance, so it is invoked without await. For query generation, it is used with factoid_query_gen, which is available in an OpenAI variant and a LlamaIndex variant. The OpenAI variant uses structured output parsing via Pydantic models, while the LlamaIndex variant uses LlamaIndex's chat interface. Both select a prompt template based on the lang argument.
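
The batching behaviour described above can be pictured with a minimal standard-library sketch. It is not AutoRAG's implementation: run_in_batches and fake_query_gen are hypothetical names, the rows are plain dicts rather than a QA DataFrame, and the stub generator stands in for an LLM call. It also assumes that retrieval_gt_contents holds a list of lists of passage strings.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

RowFn = Callable[..., Awaitable[Dict[str, Any]]]


async def run_in_batches(
    rows: List[Dict[str, Any]], fn: RowFn, batch_size: int = 32, **kwargs: Any
) -> List[Dict[str, Any]]:
    """Hypothetical helper: await fn(row, **kwargs) for every row, batch_size rows at a time."""
    results: List[Dict[str, Any]] = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start : start + batch_size]
        # gather runs the chunk concurrently and returns results in input order
        results.extend(await asyncio.gather(*(fn(row, **kwargs) for row in chunk)))
    return results


async def fake_query_gen(row: Dict[str, Any], prefix: str = "What is") -> Dict[str, Any]:
    """Stub standing in for an LLM-backed generator: returns the row plus a 'query' key."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    passage = row["retrieval_gt_contents"][0][0]
    return {**row, "query": f"{prefix} {passage}?"}


rows = [{"qid": str(i), "retrieval_gt_contents": [[f"passage {i}"]]} for i in range(5)]
out = asyncio.run(run_in_batches(rows, fake_query_gen, batch_size=2, prefix="What is"))
print([r["query"] for r in out])
```

With batch_size=2 the five rows are processed as chunks of 2, 2, and 1. The extra keyword argument reaches the row function the same way client, llm, or lang would.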

Usage

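Call QA.make_retrieval_gt_contents() first to populate passage contents, then call QA.batch_apply with factoid_query_gen. Choose the OpenAI variant when using GPT models through the OpenAI client, or the LlamaIndex variant for other LLM providers. Variant-specific arguments (client and model_name, or llm) and lang are passed to batch_apply as keyword arguments, which forwards them to factoid_query_gen.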

Code Reference

Source Location

  • Repository: AutoRAG
  • autorag/data/qa/schema.py, L134-146: QA.batch_apply
  • autorag/data/qa/query/openai_gen_query.py, L39-47: factoid_query_gen (OpenAI)
  • autorag/data/qa/query/llama_gen_query.py, L25-32: factoid_query_gen (LlamaIndex)

Signature

# QA.batch_apply (schema.py)
def batch_apply(
    self,
    fn: Callable[[Dict, Any], Awaitable[Dict]],
    batch_size: int = 32,
    **kwargs
) -> "QA":
    """
    Apply an async function to each row in batches.

    Args:
        fn: Async function that takes a row dict and returns modified row dict.
        batch_size: Number of concurrent tasks (default 32).
        **kwargs: Additional args passed to fn.
    """

# OpenAI variant (openai_gen_query.py)
async def factoid_query_gen(
    row: Dict,
    client: AsyncClient,
    model_name: str = "gpt-4o-2024-08-06",
    lang: str = "en",
) -> Dict:
    """Generate a factoid question using OpenAI structured output."""

# LlamaIndex variant (llama_gen_query.py)
async def factoid_query_gen(
    row: Dict,
    llm: BaseLLM,
    lang: str = "en",
) -> Dict:
    """Generate a factoid question using LlamaIndex LLM."""

Import

from autorag.data.qa.schema import QA
from autorag.data.qa.query.openai_gen_query import factoid_query_gen  # OpenAI
# OR
from autorag.data.qa.query.llama_gen_query import factoid_query_gen   # LlamaIndex

I/O Contract

Inputs

Name | Type | Required | Description
QA instance | QA | Yes | Must have a retrieval_gt_contents column (call make_retrieval_gt_contents() first)
client | AsyncClient | Yes (OpenAI) | OpenAI async client
llm | BaseLLM | Yes (LlamaIndex) | LlamaIndex LLM instance
model_name | str | No | Model name, OpenAI variant only (default: gpt-4o-2024-08-06)
lang | str | No | Language code: en, ko, or ja (default: en)
batch_size | int | No | Number of concurrent tasks per batch, an argument of batch_apply (default: 32)

Outputs

Name | Type | Description
QA instance | QA | QA with an added "query" column containing the generated questions
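
The contract above can be checked after generation with a small validation pass. The sketch below works on plain row dicts, such as those obtained from qa.data.to_dict(orient="records"), and uses only the standard library. check_query_contract is a hypothetical helper, not part of AutoRAG, and the sample records are made up for illustration.

```python
from typing import Any, Dict, List


def check_query_contract(records: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable contract violations; an empty list means all rows pass."""
    problems: List[str] = []
    for i, rec in enumerate(records):
        # Input side: passage contents must have been populated
        if not rec.get("retrieval_gt_contents"):
            problems.append(f"row {i}: retrieval_gt_contents is missing or empty")
        # Output side: a non-blank generated question must be present
        query = rec.get("query")
        if not isinstance(query, str) or not query.strip():
            problems.append(f"row {i}: query is missing or blank")
    return problems


records = [
    {
        "retrieval_gt_contents": [["Paris is the capital of France."]],
        "query": "What is the capital of France?",
    },
    {
        "retrieval_gt_contents": [["Water boils at 100 C at sea level."]],
        "query": "",
    },
]
problems = check_query_contract(records)
print(problems)
```

Collecting every violation instead of stopping at the first one makes it easier to see whether a failure is isolated or affects the whole batch.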

Usage Examples

Generate Factoid Queries with OpenAI

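from openai import AsyncClient
from autorag.data.qa.query.openai_gen_query import factoid_query_gen

# AsyncClient() reads the API key from the OPENAI_API_KEY environment variable
client = AsyncClient()

# `qa` is assumed to be an existing autorag.data.qa.schema.QA instance.
# Populate retrieval_gt_contents before generating queries.
qa = qa.make_retrieval_gt_contents()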

# Generate factoid questions
qa = qa.batch_apply(
    factoid_query_gen,
    client=client,
    model_name="gpt-4o-2024-08-06",
    lang="en",
    batch_size=32,
)

print(qa.data["query"].head())
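
Generate Factoid Queries with LlamaIndex

The LlamaIndex variant follows the same pattern, with llm passed in place of client and model_name. The wrapper below is a hedged sketch: generate_queries_llamaindex and its query_fn parameter are hypothetical conveniences, and the keyword arguments are taken from the signatures listed on this page rather than verified against a specific AutoRAG release.

```python
from typing import Any, Callable, Optional


def generate_queries_llamaindex(
    qa: Any,
    llm: Any,
    lang: str = "en",
    batch_size: int = 32,
    query_fn: Optional[Callable[..., Any]] = None,
) -> Any:
    """Hypothetical wrapper: populate passage contents, then generate queries with a LlamaIndex LLM."""
    if query_fn is None:
        # Imported lazily so this sketch can be loaded without AutoRAG installed
        from autorag.data.qa.query.llama_gen_query import factoid_query_gen

        query_fn = factoid_query_gen
    return qa.make_retrieval_gt_contents().batch_apply(
        query_fn, llm=llm, lang=lang, batch_size=batch_size
    )


# Example call (assumes the llama-index OpenAI integration is installed):
# from llama_index.llms.openai import OpenAI
# qa = generate_queries_llamaindex(qa, llm=OpenAI(model="gpt-4o-mini"))
```

Any LlamaIndex LLM instance can be passed as llm, which is the main reason to pick this variant for non-OpenAI providers. The query_fn parameter exists only so the wrapper can be exercised with a stub.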
