Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Datajuicer Data juicer CalibrateQueryMapper

From Leeroopedia
Knowledge Sources
Domains Data_Processing, Mapping
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool for calibrating the query (question) in question-answer pairs based on reference text provided by Data-Juicer.

Description

CalibrateQueryMapper is a mapper operator that calibrates only the query portion of a QA pair using a reference text and an API-based language model. It extends CalibrateQAMapper with a specialized Chinese system prompt that instructs the model to refine only the question, making it more detailed and accurate while ensuring the original answer still applies. The parse_output method returns the stripped raw output as the calibrated query and None for the answer, leaving the original answer unchanged. It extends the Mapper base class (via CalibrateQAMapper).

Usage

Import when you need to refine questions in QA datasets while preserving the original answers.

Code Reference

Source Location

Signature

@OPERATORS.register_module("calibrate_query_mapper")
class CalibrateQueryMapper(CalibrateQAMapper):
    def __init__(self,
                 api_model: str = "gpt-4o",
                 *,
                 api_endpoint: Optional[str] = None,
                 response_path: Optional[str] = None,
                 system_prompt: Optional[str] = None,
                 input_template: Optional[str] = None,
                 reference_template: Optional[str] = None,
                 qa_pair_template: Optional[str] = None,
                 output_pattern: Optional[str] = None,
                 try_num: PositiveInt = 3,
                 model_params: Dict = {},
                 sampling_params: Dict = {},
                 **kwargs):

Import

from data_juicer.ops.mapper.calibrate_query_mapper import CalibrateQueryMapper

I/O Contract

Inputs

Name Type Required Description
api_model str No API model name. Default: "gpt-4o"
api_endpoint Optional[str] No URL endpoint for the API
response_path Optional[str] No Path to extract content from the API response. Defaults to 'choices.0.message.content'
system_prompt Optional[str] No System prompt for the calibration task
input_template Optional[str] No Template for building the model input
reference_template Optional[str] No Template for formatting the reference text
qa_pair_template Optional[str] No Template for formatting question-answer pairs
output_pattern Optional[str] No Regular expression for parsing model output
try_num PositiveInt No Number of retry attempts on API call or parsing error. Default: 3
model_params Dict No Parameters for initializing the API model
sampling_params Dict No Extra parameters passed to the API call (e.g. temperature, top_p)

Outputs

Name Type Description
samples Dict Transformed samples with calibrated query field updated

Usage Examples

YAML Configuration

process:
  - calibrate_query_mapper:
      api_model: gpt-4o
      try_num: 3

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment