Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Implementation:Datajuicer Data juicer CalibrateQAMapper Process

From Leeroopedia
Knowledge Sources
Domains NLP, Data_Quality, LLM
Last Updated 2026-02-14 17:00 GMT

Overview

Concrete tool for calibrating generated QA pairs using an LLM reviewer provided by the Data-Juicer framework.

Description

CalibrateQAMapper is a Mapper operator that sends existing QA pairs to an LLM (typically via API) for quality review and correction. It uses a configurable system prompt to instruct the LLM on calibration criteria, supports retry logic for API failures, and parses the calibrated output back into query/response fields.

Usage

Use as an operator in a pipeline after generate_qa_from_text_mapper. Typically uses an API-based model (e.g., GPT-4) for calibration.

Code Reference

Source Location

  • Repository: data-juicer
  • File: data_juicer/ops/mapper/calibrate_qa_mapper.py
  • Lines: L14-125

Signature

@OPERATORS.register_module('calibrate_qa_mapper')
class CalibrateQAMapper(Mapper):
    def __init__(
        self,
        api_model: str = 'gpt-4o',
        *,
        system_prompt: str = None,
        input_template: str = None,
        output_pattern: str = None,
        try_num: PositiveInt = 3,
        **kwargs
    ):
        """
        Args:
            api_model: API model name for calibration (e.g. 'gpt-4o').
            system_prompt: Instructions for the calibration LLM.
            input_template: Template for formatting QA input.
            output_pattern: Regex for parsing calibrated output.
            try_num: Number of API call retries.
        """

    def process_single(self, sample):
        """
        Calibrate a single QA pair.

        Args:
            sample: Dict with query_key and response_key.

        Returns:
            sample with calibrated query and response.
        """

Import

from data_juicer.ops.mapper.calibrate_qa_mapper import CalibrateQAMapper

I/O Contract

Inputs

Name Type Required Description
api_model str No API model name (default: 'gpt-4o')
sample[query_key] str Yes Original question to calibrate
sample[response_key] str Yes Original answer to calibrate
try_num PositiveInt No API retry count (default: 3)

Outputs

Name Type Description
sample[query_key] str Calibrated question
sample[response_key] str Calibrated answer

Usage Examples

YAML Configuration

process:
  - generate_qa_from_text_mapper:
      hf_model: Qwen/Qwen2.5-7B-Instruct
  - calibrate_qa_mapper:
      api_model: gpt-4o
      try_num: 3

Related Pages

Implements Principle

Requires Environment

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment