Implementation:Datajuicer Data juicer ExtractSupportTextMapper

Knowledge Sources
Domains Data_Processing, Mapping
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for extracting the supporting sub-text from an original text based on a given summary.

Description

ExtractSupportTextMapper is a mapper operator that uses an API-based language model to identify and extract the segment of the original text that best matches a provided summary. It sends the original text and the summary (read from the event_description metadata key by default) to the model, together with a Chinese-language system prompt that demonstrates the excerpt-extraction task. If extraction fails or returns an empty result, the summary itself is used as a fallback. The result is stored under the support_text metadata key.
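The retry-then-fallback behaviour described above can be sketched in plain Python. This is an illustrative sketch, not the operator's actual source: `extract_support_text`, `make_flaky_model`, and the `call_model` callable are hypothetical names standing in for the real API client, and the exact error handling inside Data-Juicer may differ.

```python
from typing import Callable


def extract_support_text(call_model: Callable[[str, str], str],
                         text: str,
                         summary: str,
                         try_num: int = 3) -> str:
    """Return the model's excerpt, or the summary if every attempt fails."""
    support_text = ""
    for _ in range(try_num):
        try:
            support_text = call_model(text, summary).strip()
        except Exception:
            # Treat any API or parsing error as a failed attempt and retry.
            continue
        if support_text:
            break
    # Fallback: an empty result is replaced by the summary itself.
    return support_text or summary


def make_flaky_model(fail_times: int) -> Callable[[str, str], str]:
    """Hypothetical stand-in for the API client: fails N times, then answers."""
    state = {"calls": 0}

    def model(text: str, summary: str) -> str:
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise RuntimeError("simulated API error")
        return "  The bridge opened in 1932.  "

    return model


text = "The bridge opened in 1932. It carries rail and road traffic."
summary = "Bridge opening"

# One failure, then success within the three allowed attempts.
recovered = extract_support_text(make_flaky_model(1), text, summary)
# Five failures exceed try_num=3, so the summary is returned instead.
fallback = extract_support_text(make_flaky_model(5), text, summary)
```

Falling back to the summary means the support_text key is always populated, at the cost that a fallback value is not guaranteed to be a verbatim excerpt of the source text.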

Usage

Use when you need to link summaries or event descriptions back to their source text, providing evidence-based support for extracted information and enabling traceability in text analysis pipelines.

Code Reference

Source Location

Signature

@OPERATORS.register_module("extract_support_text_mapper")
class ExtractSupportTextMapper(Mapper):
    def __init__(self,
                 api_model: str = "gpt-4o",
                 *,
                 summary_key: str = MetaKeys.event_description,
                 support_text_key: str = MetaKeys.support_text,
                 api_endpoint: Optional[str] = None,
                 response_path: Optional[str] = None,
                 system_prompt: Optional[str] = None,
                 input_template: Optional[str] = None,
                 try_num: PositiveInt = 3,
                 drop_text: bool = False,
                 model_params: Dict = {},
                 sampling_params: Dict = {},
                 **kwargs):

Import

from data_juicer.ops.mapper.extract_support_text_mapper import ExtractSupportTextMapper

I/O Contract

Inputs

Name | Type | Required | Description
api_model | str | No | API model name, defaults to "gpt-4o"
summary_key | str | No | Key name for input summary in meta field, defaults to MetaKeys.event_description
support_text_key | str | No | Key name to store output support text in meta field, defaults to MetaKeys.support_text
api_endpoint | Optional[str] | No | URL endpoint for the API
response_path | Optional[str] | No | Path to extract content from API response
system_prompt | Optional[str] | No | System prompt for the task
input_template | Optional[str] | No | Template for building the model input
try_num | PositiveInt | No | Number of retry attempts on error, defaults to 3
drop_text | bool | No | Whether to drop text from output, defaults to False
model_params | Dict | No | Parameters for initializing the API model
sampling_params | Dict | No | Extra parameters passed to API call
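To make the role of response_path more concrete, the sketch below resolves a dotted path against a nested response. Both the dotted-path syntax and the helper name `resolve_response_path` are assumptions made for illustration; this page only states that the parameter is a path used to extract content from the API response. The example response mimics the shape of an OpenAI-style chat completion.

```python
from typing import Any


def resolve_response_path(response: Any, response_path: str) -> Any:
    """Walk a dotted path through nested dicts and lists (hypothetical helper)."""
    current = response
    for part in response_path.split("."):
        if isinstance(current, list):
            # Numeric path segments index into lists.
            current = current[int(part)]
        else:
            current = current[part]
    return current


# Illustrative response shaped like an OpenAI-style chat completion.
example_response = {
    "choices": [
        {"message": {"content": "The bridge opened in 1932."}}
    ]
}

content = resolve_response_path(example_response, "choices.0.message.content")
```

A custom response_path would presumably only be needed when the endpoint returns its content somewhere other than the default location expected by the client.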

Outputs

Name | Type | Description
samples | Dict | Transformed samples with support text stored in meta field
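As a rough picture of the contract above, the sketch below shows a sample before and after the support text is written. The field name `"meta"` is a placeholder for whatever meta field Data-Juicer actually uses, and the literal key strings are assumed to mirror MetaKeys.event_description and MetaKeys.support_text; `store_support_text` is a hypothetical helper, not part of the library.

```python
META_FIELD = "meta"                  # placeholder for the real meta field name
SUMMARY_KEY = "event_description"    # assumed value of MetaKeys.event_description
SUPPORT_TEXT_KEY = "support_text"    # assumed value of MetaKeys.support_text


def store_support_text(sample: dict, support_text: str) -> dict:
    """Return a copy of the sample with the support text added to its meta field."""
    out = dict(sample)
    out[META_FIELD] = {**sample.get(META_FIELD, {}), SUPPORT_TEXT_KEY: support_text}
    return out


before = {
    "text": "The bridge opened in 1932. It carries rail and road traffic.",
    META_FIELD: {SUMMARY_KEY: "Bridge opening"},
}

after = store_support_text(before, "The bridge opened in 1932.")

# The summary stays in place; the excerpt is added alongside it.
linked_pair = (after[META_FIELD][SUMMARY_KEY], after[META_FIELD][SUPPORT_TEXT_KEY])
```

Keeping the summary and its excerpt side by side in the same meta field is what lets later pipeline stages trace each summary back to its source passage.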

Usage Examples

process:
  - extract_support_text_mapper:
      api_model: "gpt-4o"
      try_num: 3
      drop_text: false
