Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Datajuicer Data juicer ExtractEventMapper

From Leeroopedia
Knowledge Sources
Domains Data_Processing, Mapping
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool for extracting events and relevant characters from narrative text provided by Data-Juicer.

Description

ExtractEventMapper is a mapper operator that extracts events (plot points) and their relevant characters from text using an API-based language model (default: GPT-4o). It sends text to the API model with a Chinese system prompt that instructs it to summarize the text into numbered plot events, each with a description and a list of relevant characters. The structured markdown response is parsed using a regex pattern to extract event numbers, descriptions, and character lists. Results are stored in metadata under event_description and relevant_characters keys. Supports optional text dropping and retry logic. It operates in batched mode. It extends the Mapper base class.

Usage

Import when you need to extract structured event-character information from stories and texts for building event timelines or character interaction graphs.

Code Reference

Source Location

Signature

@OPERATORS.register_module("extract_event_mapper")
class ExtractEventMapper(Mapper):
    def __init__(self,
                 api_model: str = "gpt-4o",
                 *,
                 event_desc_key: str = MetaKeys.event_description,
                 relevant_char_key: str = MetaKeys.relevant_characters,
                 api_endpoint: Optional[str] = None,
                 response_path: Optional[str] = None,
                 system_prompt: Optional[str] = None,
                 input_template: Optional[str] = None,
                 output_pattern: Optional[str] = None,
                 try_num: PositiveInt = 3,
                 drop_text: bool = False,
                 model_params: Dict = {},
                 sampling_params: Dict = {},
                 **kwargs):

Import

from data_juicer.ops.mapper.extract_event_mapper import ExtractEventMapper

I/O Contract

Inputs

Name Type Required Description
api_model str No API model name. Default: "gpt-4o"
event_desc_key str No Key name in meta field to store event descriptions. Default: "event_description"
relevant_char_key str No Key name in meta field to store relevant characters. Default: "relevant_characters"
api_endpoint Optional[str] No URL endpoint for the API
response_path Optional[str] No Path to extract content from the API response
system_prompt Optional[str] No System prompt for the task
input_template Optional[str] No Template for building the model input
output_pattern Optional[str] No Regular expression pattern for parsing model output
try_num PositiveInt No Number of retry attempts on API call error. Default: 3
drop_text bool No Whether to drop the original text after processing. Default: False
model_params Dict No Parameters for initializing the API model
sampling_params Dict No Extra parameters passed to the API call (e.g. temperature, top_p)

Outputs

Name Type Description
samples Dict Transformed samples with event_description and relevant_characters added to metadata, with one output row per extracted event

Usage Examples

YAML Configuration

process:
  - extract_event_mapper:
      api_model: gpt-4o
      try_num: 3
      drop_text: false

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment