Implementation: Marker Inc Korea AutoRAG Make Basic Gen Gt
| Knowledge Sources | |
|---|---|
| Domains | Natural Language Processing, Question Answering, Evaluation Methodology |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
A function provided by the AutoRAG framework that generates detailed ground-truth answers for passage-question pairs using a large language model.
Description
The make_basic_gen_gt function generates a thorough, well-explained answer for a given question-passage pair by prompting a LlamaIndex-compatible LLM with the "basic" system prompt template. It is an async function that operates on individual QA row dictionaries, which makes it compatible with QA.batch_apply() for parallel processing.
Internally, make_basic_gen_gt delegates to make_gen_gt_llama_index, which flattens the retrieval ground truth contents into a single passage string, formats a user prompt containing both the passage and the question, and sends it to the LLM via achat() with temperature=0.0 for deterministic output. The response is appended to the row's generation_gt list via the add_gen_gt helper function, which handles the accumulation pattern (creating a new list if generation_gt does not yet exist, or appending to the existing list).
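The accumulation pattern of the add_gen_gt helper can be sketched as follows. This is a simplified re-implementation for illustration only; the actual helper lives in autorag/data/qa/generation_gt/base.py:

```python
from typing import Dict


def add_gen_gt_sketch(row: Dict, new_gen_gt: str) -> Dict:
    # Append to an existing generation_gt list, or create a new one.
    # A sketch of the accumulation pattern, not the library's exact code.
    if isinstance(row.get("generation_gt"), list):
        row["generation_gt"].append(new_gen_gt)
    else:
        row["generation_gt"] = [new_gen_gt]
    return row


row = {"query": "What is AutoRAG?"}
row = add_gen_gt_sketch(row, "A detailed answer.")
row = add_gen_gt_sketch(row, "A concise answer.")
# row["generation_gt"] == ["A detailed answer.", "A concise answer."]
```

This is why chaining multiple generation transforms (as shown in the Usage Examples section) yields one list entry per call.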
The module also provides make_concise_gen_gt (for short phrase answers) and make_custom_gen_gt (for user-defined system prompts). All three follow the same architectural pattern but use different system prompts.
Usage
Import and use this function as the transformation function argument to QA.batch_apply() after query generation. Each input row must contain both the query and retrieval_gt_contents fields.
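A minimal illustration of the row shape the function expects, together with a sketch of the passage-flattening step described earlier. Field names come from the I/O contract; the passage text here is made up:

```python
# Hypothetical example row. retrieval_gt_contents is a nested list of
# passage strings (List[List[str]]), one inner list per ground-truth group.
row = {
    "query": "What does the passage say about chunking?",
    "retrieval_gt_contents": [
        ["Chunking splits documents into fixed-size token windows."],
        ["Each chunk becomes one retrieval unit."],
    ],
}

# Sketch of the flattening step: the nested lists are joined into a
# single passage string before being placed into the user prompt.
passage = "\n".join(
    content
    for group in row["retrieval_gt_contents"]
    for content in group
)
```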
Code Reference
Source Location
- Repository: AutoRAG
- File: autorag/data/qa/generation_gt/llama_index_gen_gt.py (lines 36-37)
- Supporting file: autorag/data/qa/generation_gt/base.py (add_gen_gt helper)
Signature
async def make_basic_gen_gt(row: Dict, llm: BaseLLM, lang: str = "en") -> Dict:
return await make_gen_gt_llama_index(row, llm, GEN_GT_SYSTEM_PROMPT["basic"][lang])
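The GEN_GT_SYSTEM_PROMPT lookup follows a nested-dictionary pattern keyed first by prompt style, then by language code. The sketch below uses stand-in prompt text (the real prompts ship with AutoRAG); an unsupported lang value would raise KeyError:

```python
# Stand-in values only; not AutoRAG's actual prompt text.
GEN_GT_SYSTEM_PROMPT_SKETCH = {
    "basic": {
        "en": "Answer the question in detail, using only the passage.",
        "ko": "(Korean basic prompt)",
        "ja": "(Japanese basic prompt)",
    },
    "concise": {
        "en": "Answer the question in a short phrase.",
        "ko": "(Korean concise prompt)",
        "ja": "(Japanese concise prompt)",
    },
}

prompt = GEN_GT_SYSTEM_PROMPT_SKETCH["basic"]["en"]
```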
Import
from autorag.data.qa.generation_gt.llama_index_gen_gt import make_basic_gen_gt
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| row | Dict | yes | A dictionary representing a single QA row. Must contain query (str) and retrieval_gt_contents (List[List[str]]) keys. |
| llm | BaseLLM | yes | A LlamaIndex BaseLLM instance (e.g., OpenAI, Anthropic). Used for async chat completion via achat() with temperature=0.0. |
| lang | str | no | Language code for the system prompt. Supported values: "en", "ko", "ja". Defaults to "en". |
Outputs
| Name | Type | Description |
|---|---|---|
| row | Dict | The input row dictionary with an added or updated generation_gt key containing a list of answer strings (List[str]). Each call appends one answer to the list. |
Usage Examples
Basic Usage
from autorag.data.qa.schema import Raw
from autorag.data.qa.sample import random_single_hop
from autorag.data.qa.query.llama_gen_query import factoid_query_gen
from autorag.data.qa.generation_gt.llama_index_gen_gt import make_basic_gen_gt
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-4o-mini")
qa = (Raw(parsed_df)
.chunk("token", chunk_size=512)
.sample(random_single_hop, n=100)
.make_retrieval_gt_contents()
.batch_apply(factoid_query_gen, llm=llm, lang="en")
.batch_apply(make_basic_gen_gt, llm=llm, lang="en"))
# qa.data now has columns including: qid, query, retrieval_gt, generation_gt
print(qa.data[["query", "generation_gt"]].head())
Multiple Reference Answers
from autorag.data.qa.generation_gt.llama_index_gen_gt import (
make_basic_gen_gt,
make_concise_gen_gt,
)
# Generate both a detailed and a concise answer for each QA pair
qa = (qa_with_queries
.batch_apply(make_basic_gen_gt, llm=llm, lang="en")
.batch_apply(make_concise_gen_gt, llm=llm, lang="en"))
# Each row's generation_gt now contains two answers: [detailed_answer, concise_answer]
With Custom Batch Size
# Use smaller batch size for rate-limited APIs
qa = qa_with_queries.batch_apply(
make_basic_gen_gt, batch_size=8, llm=llm, lang="en"
)
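If batch_size alone is not enough for a heavily rate-limited API, the transform can also be wrapped with an asyncio.Semaphore to cap concurrency. This wrapper is an illustrative sketch, not part of AutoRAG; the stub transform stands in for a real LLM-backed call:

```python
import asyncio
from typing import Awaitable, Callable, Dict

# At most 4 in-flight LLM calls at any moment (a tunable assumption).
_SEM = asyncio.Semaphore(4)


async def throttled(
    fn: Callable[..., Awaitable[Dict]], row: Dict, **kwargs
) -> Dict:
    # Acquire the semaphore before delegating to the real transform,
    # e.g. make_basic_gen_gt.
    async with _SEM:
        return await fn(row, **kwargs)


# Demo with a stub transform instead of a real LLM call.
async def _stub_gen_gt(row: Dict, **kwargs) -> Dict:
    row.setdefault("generation_gt", []).append("stub answer")
    return row


result = asyncio.run(throttled(_stub_gen_gt, {"query": "q"}))
# result["generation_gt"] == ["stub answer"]
```

To plug this into QA.batch_apply, bind the real transform first, e.g. functools.partial(throttled, make_basic_gen_gt), assuming batch_apply forwards keyword arguments such as llm and lang as shown in the examples above.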