Implementation:Datajuicer Data juicer CalibrateQueryMapper

Knowledge Sources	Datajuicer_Data_juicer
Domains	Data_Processing, Mapping
Last Updated	2026-02-14 16:00 GMT

Overview

Concrete tool for calibrating the query (question) in question-answer pairs based on reference text provided by Data-Juicer.

Description

CalibrateQueryMapper is a mapper operator that calibrates only the query portion of a QA pair using a reference text and an API-based language model. It extends CalibrateQAMapper with a specialized Chinese system prompt that instructs the model to refine only the question, making it more detailed and accurate while ensuring the original answer still applies. The parse_output method returns the stripped raw output as the calibrated query and None for the answer, leaving the original answer unchanged. It extends the Mapper base class (via CalibrateQAMapper).

Usage

Import when you need to refine questions in QA datasets while preserving the original answers.

Code Reference

Source Location

Repository: Datajuicer_Data_juicer
File: data_juicer/ops/mapper/calibrate_query_mapper.py

Signature

@OPERATORS.register_module("calibrate_query_mapper")
class CalibrateQueryMapper(CalibrateQAMapper):
    def __init__(self,
                 api_model: str = "gpt-4o",
                 *,
                 api_endpoint: Optional[str] = None,
                 response_path: Optional[str] = None,
                 system_prompt: Optional[str] = None,
                 input_template: Optional[str] = None,
                 reference_template: Optional[str] = None,
                 qa_pair_template: Optional[str] = None,
                 output_pattern: Optional[str] = None,
                 try_num: PositiveInt = 3,
                 model_params: Dict = {},
                 sampling_params: Dict = {},
                 **kwargs):

Import

from data_juicer.ops.mapper.calibrate_query_mapper import CalibrateQueryMapper

I/O Contract

Inputs

Name	Type	Required	Description
api_model	str	No	API model name. Default: "gpt-4o"
api_endpoint	Optional[str]	No	URL endpoint for the API
response_path	Optional[str]	No	Path to extract content from the API response. Defaults to 'choices.0.message.content'
system_prompt	Optional[str]	No	System prompt for the calibration task
input_template	Optional[str]	No	Template for building the model input
reference_template	Optional[str]	No	Template for formatting the reference text
qa_pair_template	Optional[str]	No	Template for formatting question-answer pairs
output_pattern	Optional[str]	No	Regular expression for parsing model output
try_num	PositiveInt	No	Number of retry attempts on API call or parsing error. Default: 3
model_params	Dict	No	Parameters for initializing the API model
sampling_params	Dict	No	Extra parameters passed to the API call (e.g. temperature, top_p)

Outputs

Name	Type	Description
samples	Dict	Transformed samples with calibrated query field updated

Usage Examples

YAML Configuration

process:
  - calibrate_query_mapper:
      api_model: gpt-4o
      try_num: 3

Related Pages

Environment:Datajuicer_Data_juicer_Python_Runtime_Environment

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment