Implementation:Datajuicer Data juicer CalibrateQAMapper Process
| Knowledge Sources | |
|---|---|
| Domains | NLP, Data_Quality, LLM |
| Last Updated | 2026-02-14 17:00 GMT |
Overview
Concrete tool, provided by the Data-Juicer framework, for calibrating generated QA pairs with an LLM reviewer.
Description
CalibrateQAMapper is a Mapper operator that sends existing QA pairs to an LLM (typically via API) for quality review and correction. It uses a configurable system prompt to instruct the LLM on calibration criteria, supports retry logic for API failures, and parses the calibrated output back into query/response fields.
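The format, call, retry, and parse flow described above can be sketched roughly as follows. This is a minimal illustration and not the operator's actual source: `INPUT_TEMPLATE`, `OUTPUT_PATTERN`, `calibrate`, and the `call_llm` callable are hypothetical stand-ins, the key names `query` and `response` are assumed, and keeping the original pair when every attempt fails is an assumption as well.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical defaults for illustration; the operator's real defaults may differ.
INPUT_TEMPLATE = "Question: {query}\nAnswer: {response}"
OUTPUT_PATTERN = r"Question:\s*(.*?)\s*Answer:\s*(.*)"


def parse_output(raw: str, pattern: str = OUTPUT_PATTERN) -> Optional[Tuple[str, str]]:
    """Extract (question, answer) from the reviewer's reply, or None if no match."""
    match = re.search(pattern, raw, re.DOTALL)
    if match is None:
        return None
    question, answer = match.group(1).strip(), match.group(2).strip()
    return (question, answer) if question and answer else None


def calibrate(
    sample: Dict[str, str],
    call_llm: Callable[[str, str], str],
    system_prompt: str,
    try_num: int = 3,
    query_key: str = "query",
    response_key: str = "response",
) -> Dict[str, str]:
    """Ask the reviewer to correct one QA pair, making up to try_num attempts."""
    user_prompt = INPUT_TEMPLATE.format(
        query=sample[query_key], response=sample[response_key]
    )
    for _ in range(try_num):
        try:
            parsed = parse_output(call_llm(system_prompt, user_prompt))
        except Exception:
            continue  # treat any API error as a failed attempt
        if parsed is not None:
            return {**sample, query_key: parsed[0], response_key: parsed[1]}
    # Assumption: if every attempt fails, the original pair is kept unchanged.
    return dict(sample)


# Stub reviewer that fails once and then answers, to exercise the retry path.
calls = {"n": 0}


def flaky_llm(system_prompt: str, user_prompt: str) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("simulated API failure")
    return "Question: What is the capital of France?\nAnswer: Paris."


result = calibrate(
    {"query": "capital france?", "response": "paris"},
    flaky_llm,
    "Fix the QA pair.",
)
```

In this sketch a reply that does not match the pattern is treated the same way as an API error: it uses up one attempt and the loop moves on.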
Usage
Use this operator in a pipeline after generate_qa_from_text_mapper, so that the generated QA pairs are reviewed before downstream use. It typically relies on an API-based model (e.g., gpt-4o, the default) for calibration.
Code Reference
Source Location
- Repository: data-juicer
- File: data_juicer/ops/mapper/calibrate_qa_mapper.py
- Lines: L14-125
Signature
@OPERATORS.register_module('calibrate_qa_mapper')
class CalibrateQAMapper(Mapper):
    def __init__(
        self,
        api_model: str = 'gpt-4o',
        *,
        system_prompt: str = None,
        input_template: str = None,
        output_pattern: str = None,
        try_num: PositiveInt = 3,
        **kwargs
    ):
        """
        Args:
            api_model: API model name for calibration (e.g. 'gpt-4o').
            system_prompt: Instructions for the calibration LLM.
            input_template: Template for formatting QA input.
            output_pattern: Regex for parsing calibrated output.
            try_num: Number of API call retries.
        """

    def process_single(self, sample):
        """
        Calibrate a single QA pair.

        Args:
            sample: Dict with query_key and response_key.

        Returns:
            sample with calibrated query and response.
        """
Import
from data_juicer.ops.mapper.calibrate_qa_mapper import CalibrateQAMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| api_model | str | No | API model name (default: 'gpt-4o') |
| sample[query_key] | str | Yes | Original question to calibrate |
| sample[response_key] | str | Yes | Original answer to calibrate |
| try_num | PositiveInt | No | API retry count (default: 3) |
Outputs
| Name | Type | Description |
|---|---|---|
| sample[query_key] | str | Calibrated question |
| sample[response_key] | str | Calibrated answer |
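A small pre-check can make the input side of this contract explicit before any API calls are spent. The helper below is a hypothetical sketch and not part of Data-Juicer; it assumes the key names `query` and `response`, and it treats empty or whitespace-only strings as invalid, which goes beyond what the tables state.

```python
from typing import Any, Dict


def satisfies_input_contract(
    sample: Dict[str, Any],
    query_key: str = "query",
    response_key: str = "response",
) -> bool:
    """Return True if both required fields are present as non-empty strings."""
    return all(
        isinstance(sample.get(key), str) and bool(sample[key].strip())
        for key in (query_key, response_key)
    )


samples = [
    {"query": "What is Data-Juicer?", "response": "A data processing framework."},
    {"query": "Missing answer?"},
    {"query": "Wrong type?", "response": 42},
]

# Keep only samples that can be sent for calibration.
valid = [s for s in samples if satisfies_input_contract(s)]
```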
Usage Examples
YAML Configuration
process:
  - generate_qa_from_text_mapper:
      hf_model: Qwen/Qwen2.5-7B-Instruct
  - calibrate_qa_mapper:
      api_model: gpt-4o
      try_num: 3
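Because calibration only makes sense once QA pairs exist, the ordering of the two operators matters. The sketch below mirrors the configuration as plain Python data and checks that ordering. The helper `calibration_follows_generation` is hypothetical and not a Data-Juicer API. It assumes that a pipeline without one of the two operators has nothing to check, for example when the dataset already contains QA pairs.

```python
from typing import Any, Dict, List

# The same pipeline expressed as Python data (one single-key dict per operator).
process: List[Dict[str, Dict[str, Any]]] = [
    {"generate_qa_from_text_mapper": {"hf_model": "Qwen/Qwen2.5-7B-Instruct"}},
    {"calibrate_qa_mapper": {"api_model": "gpt-4o", "try_num": 3}},
]


def calibration_follows_generation(steps: List[Dict[str, Dict[str, Any]]]) -> bool:
    """Return False only if calibration is listed before QA generation."""
    names = [next(iter(step)) for step in steps]
    gen, cal = "generate_qa_from_text_mapper", "calibrate_qa_mapper"
    if gen not in names or cal not in names:
        return True  # nothing to compare
    return names.index(gen) < names.index(cal)


order_ok = calibration_follows_generation(process)
```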
Related Pages
Implements Principle
Requires Environment