Implementation:Datajuicer Data juicer CalibrateQueryMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool for calibrating the query (question) in question-answer pairs based on reference text provided by Data-Juicer.
Description
CalibrateQueryMapper is a mapper operator that calibrates only the query portion of a QA pair using a reference text and an API-based language model. It extends CalibrateQAMapper with a specialized Chinese system prompt that instructs the model to refine only the question, making it more detailed and accurate while ensuring the original answer still applies. The parse_output method returns the stripped raw output as the calibrated query and None for the answer, leaving the original answer unchanged. It extends the Mapper base class (via CalibrateQAMapper).
Usage
Import when you need to refine questions in QA datasets while preserving the original answers.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/calibrate_query_mapper.py
Signature
@OPERATORS.register_module("calibrate_query_mapper")
class CalibrateQueryMapper(CalibrateQAMapper):
def __init__(self,
api_model: str = "gpt-4o",
*,
api_endpoint: Optional[str] = None,
response_path: Optional[str] = None,
system_prompt: Optional[str] = None,
input_template: Optional[str] = None,
reference_template: Optional[str] = None,
qa_pair_template: Optional[str] = None,
output_pattern: Optional[str] = None,
try_num: PositiveInt = 3,
model_params: Dict = {},
sampling_params: Dict = {},
**kwargs):
Import
from data_juicer.ops.mapper.calibrate_query_mapper import CalibrateQueryMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| api_model | str | No | API model name. Default: "gpt-4o" |
| api_endpoint | Optional[str] | No | URL endpoint for the API |
| response_path | Optional[str] | No | Path to extract content from the API response. Defaults to 'choices.0.message.content' |
| system_prompt | Optional[str] | No | System prompt for the calibration task |
| input_template | Optional[str] | No | Template for building the model input |
| reference_template | Optional[str] | No | Template for formatting the reference text |
| qa_pair_template | Optional[str] | No | Template for formatting question-answer pairs |
| output_pattern | Optional[str] | No | Regular expression for parsing model output |
| try_num | PositiveInt | No | Number of retry attempts on API call or parsing error. Default: 3 |
| model_params | Dict | No | Parameters for initializing the API model |
| sampling_params | Dict | No | Extra parameters passed to the API call (e.g. temperature, top_p) |
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples with calibrated query field updated |
Usage Examples
YAML Configuration
process:
- calibrate_query_mapper:
api_model: gpt-4o
try_num: 3