
Implementation:FlagOpen FlagEmbedding Reinforced IR Generate Generator Data

Knowledge Sources
Domains: Information Retrieval, Data Generation, Direct Preference Optimization
Last Updated: 2026-02-09 00:00 GMT

Overview

Generates DPO training data for query augmentation models in the Reinforced IR pipeline.

Description

This script generates Direct Preference Optimization (DPO) training data for fine-tuning language models to produce effective query augmentations. It uses a two-stage process: first, an LLM generates multiple candidate augmentations (answers) for each query; then, a retrieval model ranks these candidates by how well they retrieve the target passage.

The pipeline loads existing queries from the synthetic data directory, generates N candidate augmentations per query (N is set by dpo_num), and scores each candidate's effectiveness with the retrieval model. It then builds DPO training pairs in which higher-scoring augmentations are marked as "chosen" and lower-scoring ones as "rejected", subject to the configured score threshold. The script supports both local and API-based LLMs and can process multiple datasets with configurable thresholds and rules; the two stages are sketched below.
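
Below is a minimal sketch of the scoring and pair-construction stages, assuming the retrieval model is loaded via FlagEmbedding's FlagModel (whose embeddings are normalized by default). The helper names, the augmented-query format (query concatenated with candidate), and the pair-selection rule are illustrative assumptions, not the repo's actual logic.

# Sketch of stage two (scoring) and DPO pair construction.
# FlagModel is real FlagEmbedding API; the helper functions and the
# selection rule below are assumptions for illustration.
import numpy as np
from FlagEmbedding import FlagModel

def score_augmentations(model, query, candidates, passage):
    """Score each candidate by how well the augmented query
    retrieves the target passage (dot product of normalized
    embeddings, i.e. cosine similarity)."""
    augmented = [f"{query} {c}" for c in candidates]
    q_emb = model.encode_queries(augmented)  # shape (N, dim)
    p_emb = model.encode([passage])          # shape (1, dim)
    return (q_emb @ p_emb.T).squeeze(-1)     # shape (N,)

def build_dpo_pairs(prompt, candidates, scores, threshold):
    """Pair the top-scoring candidate ("chosen") against every
    candidate scoring below threshold * best ("rejected")."""
    order = np.argsort(scores)[::-1]
    best_idx = order[0]
    return [
        {"prompt": prompt, "chosen": candidates[best_idx],
         "rejected": candidates[i]}
        for i in order[1:]
        if scores[i] < threshold * scores[best_idx]
    ]

# Example usage:
#   model = FlagModel("BAAI/bge-large-en-v1.5", use_fp16=True)
#   scores = score_augmentations(model, query, candidates, passage)
#   pairs = build_dpo_pairs(prompt, candidates, scores, threshold=0.95)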

Usage

Use this script to create training data for fine-tuning query augmentation models that help improve retrieval performance by generating informative context for queries.

Code Reference

Source Location

generate_generator_data.py (inferred from the command-line invocation under Usage Examples)

Signature

def main(opt):
    """Main function to generate DPO training data for query augmentation"""

def parse_option():
    """Parse command line arguments"""

Import

import argparse
import json
from FlagEmbedding import FlagModel
from agent import GPTAgent, LLMAgent, LLMInstructAgent
from utils import generate_llm_dpo_train_data
from prompts import get_additional_info_generation_prompt
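
Given the three agent classes imported above, --model_type presumably selects among them. A plausible mapping is shown below; the string keys are assumptions, and the constructor arguments are defined in the repo-local agent module.

from agent import GPTAgent, LLMAgent, LLMInstructAgent

# Assumed mapping from --model_type to an agent class; the actual
# keys and constructor arguments live in the repo's agent module.
AGENT_CLASSES = {
    "llm": LLMAgent,                   # local base LLM
    "llm_instruct": LLMInstructAgent,  # local instruction-tuned LLM
    "gpt": GPTAgent,                   # API-based model
}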

I/O Contract

Inputs

Name | Type | Required | Description
generate_model_path | str | Yes | Path to the LLM used to generate augmentations
model_type | str | No | Type of LLM agent; the example below uses llm_instruct
retrieval_model_name | str | Yes | Retrieval model used to score augmentations
dataset_path | str | Yes | Path to the datasets directory
output_dir | str | Yes | Synthetic data directory containing the queries.json files; outputs are written here
dpo_num | int | Yes | Number of augmentation candidates generated per query
threshold | float | Yes | Score threshold for selecting DPO pairs
temperature | float | No | Generation temperature (default: 0.2)
max_tokens | int | No | Maximum tokens per generation (default: 300)
batch_size | int | No | Batch size (1024 in the example below)

Outputs

Name | Type | Description
answers.json | JSON | Multiple augmentation candidates per query
train.jsonl | JSONL | DPO training data with prompt, chosen, and rejected fields

Usage Examples

# Command line usage
python generate_generator_data.py \
    --generate_model_path Meta-Llama-3-8B \
    --model_type llm_instruct \
    --retrieval_model_name BAAI/bge-large-en-v1.5 \
    --dataset_path ./data \
    --output_dir ./synthetic \
    --dpo_num 10 \
    --threshold 0.95 \
    --temperature 0.2 \
    --max_tokens 300 \
    --batch_size 1024

# Expected queries.json format:
[
    {
        "query": "What is machine learning?",
        "passage": "Machine learning is a subset of AI..."
    }
]

# Output train.jsonl format (one JSON object per line; pretty-printed here for readability):
{
    "prompt": "Generate additional info for: What is machine learning?",
    "chosen": "ML is a technique that enables computers to learn...",
    "rejected": "Less relevant or lower-scoring augmentation..."
}
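
Because each line of train.jsonl is a standalone JSON object, the file can be consumed directly as a preference dataset by common DPO trainers (for example, TRL's DPOTrainer expects exactly these prompt/chosen/rejected fields). A minimal sanity check, assuming the file is in the working directory:

import json

# Load the generated preference pairs; one JSON object per line.
with open("train.jsonl") as f:
    pairs = [json.loads(line) for line in f]

# Every record should carry the three fields a DPO trainer expects.
for pair in pairs:
    assert {"prompt", "chosen", "rejected"} <= pair.keys()

print(f"{len(pairs)} preference pairs loaded")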
