Implementation:Datajuicer Data juicer CalibrateQAMapper Process

Knowledge Sources	Data-Juicer
Domains	NLP, Data_Quality, LLM
Last Updated	2026-02-14 17:00 GMT

Overview

A concrete tool, provided by the Data-Juicer framework, for calibrating generated QA pairs with an LLM reviewer.

Description

CalibrateQAMapper is a Mapper operator that sends existing QA pairs to an LLM, accessed through an API, for quality review and correction. A configurable system prompt (system_prompt) tells the LLM which calibration criteria to apply, failed API calls are retried (try_num, default 3), and the calibrated output is parsed back into the query and response fields with a regular expression (output_pattern).
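The retry-and-parse behaviour described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the operator's actual code: the names `parse_output`, `calibrate`, `call_llm`, and `make_flaky_client` are hypothetical, the "Question:/Answer:" format stands in for whatever the real input template and output pattern are, and the sample keys "query" and "response" are assumed.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical reply format; the operator's real default pattern is not shown here.
OUTPUT_PATTERN = r"Question:\s*(.*?)\s*Answer:\s*(.*)"


def parse_output(raw: str, pattern: str = OUTPUT_PATTERN) -> Tuple[Optional[str], Optional[str]]:
    """Extract (question, answer) from the reviewer's reply, or (None, None)."""
    match = re.search(pattern, raw, re.DOTALL)
    if match is None:
        return None, None
    return match.group(1).strip(), match.group(2).strip()


def calibrate(sample: dict, call_llm: Callable[[str], str], try_num: int = 3) -> dict:
    """Return a copy of `sample`, with calibrated fields if some reply parses."""
    prompt = f"Question: {sample['query']}\nAnswer: {sample['response']}"
    question, answer = None, None
    for _ in range(try_num):
        try:
            question, answer = parse_output(call_llm(prompt))
        except Exception:
            continue  # treat any client error as a failed attempt
        if question and answer:
            break  # stop at the first reply that parses completely
    result = dict(sample)
    if question and answer:
        result["query"], result["response"] = question, answer
    return result


def make_flaky_client(failures: int) -> Callable[[str], str]:
    """Stand-in for an API client that fails `failures` times, then answers."""
    state = {"left": failures}

    def client(prompt: str) -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("simulated API failure")
        return "Question: What is the capital of France?\nAnswer: Paris."

    return client
```

In this sketch the original QA pair is kept whenever every attempt fails or no reply matches the pattern. Whether the real operator falls back the same way is an assumption worth checking against the source file.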

Usage

Use it as an operator in a pipeline after generate_qa_from_text_mapper. It typically calls an API-based model (e.g., gpt-4o, the default) for calibration.

Code Reference

Source Location

Repository: data-juicer
File: data_juicer/ops/mapper/calibrate_qa_mapper.py
Lines: L14-125

Signature

@OPERATORS.register_module('calibrate_qa_mapper')
class CalibrateQAMapper(Mapper):
    def __init__(
        self,
        api_model: str = 'gpt-4o',
        *,
        system_prompt: str = None,
        input_template: str = None,
        output_pattern: str = None,
        try_num: PositiveInt = 3,
        **kwargs
    ):
        """
        Args:
            api_model: API model name for calibration (e.g. 'gpt-4o').
            system_prompt: Instructions for the calibration LLM.
            input_template: Template for formatting QA input.
            output_pattern: Regex for parsing calibrated output.
            try_num: Number of API call retries.
        """

    def process_single(self, sample):
        """
        Calibrate a single QA pair.

        Args:
            sample: Dict with query_key and response_key.

        Returns:
            sample with calibrated query and response.
        """

Import

from data_juicer.ops.mapper.calibrate_qa_mapper import CalibrateQAMapper
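For calling the operator directly from Python rather than through a YAML pipeline, something like the following may work. It is a hedged sketch: the constructor arguments come from the signature above, while the helper names `build_calibrator` and `calibrate_all` are hypothetical, and the sketch assumes `process_single` can be called with just a sample dict and that valid API credentials are already configured in the environment.

```python
from typing import Iterable, List


def build_calibrator():
    """Construct the operator (imported lazily: requires data_juicer to be installed)."""
    from data_juicer.ops.mapper.calibrate_qa_mapper import CalibrateQAMapper

    # Arguments mirror the documented signature; both are the stated defaults.
    return CalibrateQAMapper(api_model="gpt-4o", try_num=3)


def calibrate_all(samples: Iterable[dict], op) -> List[dict]:
    """Run `op.process_single` on a copy of each sample and collect the results."""
    return [op.process_single(dict(sample)) for sample in samples]


# Example call (needs data_juicer and API credentials, so it is left commented out).
# Assumption: the query and response keys are "query" and "response".
# samples = [{"query": "What is Data-Juicer?", "response": "A data tool."}]
# calibrated = calibrate_all(samples, build_calibrator())
```

Passing the operator in as an argument keeps `calibrate_all` usable with any object that exposes `process_single`, which makes it easy to try out with a stub before spending API calls.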

I/O Contract

Inputs

Name	Type	Required	Description
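api_model	str	No	API model name (default: 'gpt-4o')
system_prompt	str	No	Instructions for the calibration LLM (default: None)
input_template	str	No	Template for formatting the QA input (default: None)
output_pattern	str	No	Regex for parsing the calibrated output (default: None)
try_num	PositiveInt	No	API retry count (default: 3)
sample[query_key]	str	Yes	Original question to calibrate
sample[response_key]	str	Yes	Original answer to calibrate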

Outputs

Name	Type	Description
sample[query_key]	str	Calibrated question
sample[response_key]	str	Calibrated answer

Usage Examples

YAML Configuration

process:
  - generate_qa_from_text_mapper:
      hf_model: Qwen/Qwen2.5-7B-Instruct
  - calibrate_qa_mapper:
      api_model: gpt-4o
      try_num: 3

Related Pages

Implements Principle

Principle:Datajuicer_Data_juicer_QA_Calibration

Requires Environment

Environment:Datajuicer_Data_juicer_LLM_API_Credentials_Environment
