Implementation:Datajuicer Data juicer ExtractEventMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A concrete tool, provided by Data-Juicer, for extracting events and their relevant characters from narrative text.
Description
ExtractEventMapper is a mapper operator that extracts events (plot points) and their relevant characters from text using an API-based language model (default: GPT-4o). It extends the Mapper base class and operates in batched mode, so a single input sample can yield one output row per extracted event. The operator sends the text to the API model together with a system prompt, written in Chinese by default, that instructs the model to summarize the text into numbered plot events, each with a description and a list of relevant characters. The structured markdown response is then parsed with a regular expression to extract the event numbers, descriptions, and character lists. Results are stored in the sample metadata under the event_description and relevant_characters keys. The operator can optionally drop the original text after processing (drop_text) and retries the API call on error (try_num).
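The parsing step described above can be sketched as follows. This is a minimal illustration, not the operator's own code: the English response layout, `SAMPLE_RESPONSE`, `EVENT_PATTERN`, and `parse_events` are all hypothetical stand-ins, since the default prompt and pattern shipped with the operator are in Chinese and are not reproduced on this page.

```python
import re
from typing import Dict, List

# Hypothetical model reply in an English layout (assumption for illustration only).
SAMPLE_RESPONSE = """\
### Event 1:
- **Description**: Alice finds a hidden map in the attic.
- **Characters**: Alice
### Event 2:
- **Description**: Alice and Bob follow the map to the old harbor.
- **Characters**: Alice, Bob
"""

# One match per event: (number, description, comma-separated character list).
# re.DOTALL lets the lazy groups span line breaks; the lookahead stops each
# character list at the next event header or at the end of the reply.
EVENT_PATTERN = re.compile(
    r"###\s*Event\s*(\d+):\s*"
    r"-\s*\*\*Description\*\*:\s*(.*?)\s*"
    r"-\s*\*\*Characters\*\*:\s*(.*?)(?=###|\Z)",
    re.DOTALL,
)


def parse_events(response: str) -> List[Dict]:
    """Turn a structured markdown reply into a list of event records."""
    events = []
    for number, description, characters in EVENT_PATTERN.findall(response):
        names = [name.strip() for name in characters.split(",") if name.strip()]
        events.append(
            {
                "number": int(number),
                "description": description.strip(),
                "characters": names,
            }
        )
    return events


events = parse_events(SAMPLE_RESPONSE)
for event in events:
    print(event["number"], event["description"], event["characters"])
```

A reply that does not follow the expected layout simply produces no matches in this sketch, which is one reason a custom output_pattern has to agree with whatever system_prompt is supplied.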
Usage
Import this operator when you need to extract structured event and character information from stories or other narrative text, for example to build event timelines or character interaction graphs.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/extract_event_mapper.py
Signature
@OPERATORS.register_module("extract_event_mapper")
class ExtractEventMapper(Mapper):
    def __init__(self,
                 api_model: str = "gpt-4o",
                 *,
                 event_desc_key: str = MetaKeys.event_description,
                 relevant_char_key: str = MetaKeys.relevant_characters,
                 api_endpoint: Optional[str] = None,
                 response_path: Optional[str] = None,
                 system_prompt: Optional[str] = None,
                 input_template: Optional[str] = None,
                 output_pattern: Optional[str] = None,
                 try_num: PositiveInt = 3,
                 drop_text: bool = False,
                 model_params: Dict = {},
                 sampling_params: Dict = {},
                 **kwargs):
Import
from data_juicer.ops.mapper.extract_event_mapper import ExtractEventMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| api_model | str | No | API model name. Default: "gpt-4o" |
| event_desc_key | str | No | Key name in meta field to store event descriptions. Default: "event_description" |
| relevant_char_key | str | No | Key name in meta field to store relevant characters. Default: "relevant_characters" |
| api_endpoint | Optional[str] | No | URL endpoint for the API |
| response_path | Optional[str] | No | Path to extract content from the API response |
| system_prompt | Optional[str] | No | System prompt for the task |
| input_template | Optional[str] | No | Template for building the model input |
| output_pattern | Optional[str] | No | Regular expression pattern for parsing model output |
| try_num | PositiveInt | No | Number of retry attempts on API call error. Default: 3 |
| drop_text | bool | No | Whether to drop the original text after processing. Default: False |
| model_params | Dict | No | Parameters for initializing the API model |
| sampling_params | Dict | No | Extra parameters passed to the API call (e.g. temperature, top_p) |
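The role of try_num can be illustrated with a small retry loop. This is a sketch under stated assumptions rather than the operator's implementation: `call_with_retries` and `FlakyModel` are hypothetical names, and the sketch treats try_num as the total number of attempts, which this page does not confirm.

```python
import logging
from typing import Callable


def call_with_retries(call_model: Callable[[str], str], prompt: str, try_num: int = 3) -> str:
    """Call `call_model(prompt)` up to `try_num` times.

    Returns the first successful reply, or an empty string if every attempt raises.
    """
    for attempt in range(1, try_num + 1):
        try:
            return call_model(prompt)
        except Exception as exc:  # broad on purpose: any API error triggers a retry
            logging.warning("attempt %d/%d failed: %s", attempt, try_num, exc)
    return ""


class FlakyModel:
    """Hypothetical stand-in for an API client that fails a fixed number of times."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("temporary failure")
        return f"reply to: {prompt}"


model = FlakyModel(failures=2)
reply = call_with_retries(model, "summarize the story", try_num=3)
print(reply, model.calls)
```

Returning an empty string on exhaustion keeps the pipeline moving in this sketch; a stricter design could re-raise the last exception instead.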
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples with event_description and relevant_characters added to metadata; each input sample yields one output row per extracted event |
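The "one output row per extracted event" behavior can be pictured as an expansion over a columnar batch. The sketch below is illustrative only: the `meta` column name, the `expand_events` helper, and the shape of `events_per_sample` are assumptions, not the operator's internal layout.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, List[str]]  # (event description, relevant characters)


def expand_events(batch: Dict[str, list],
                  events_per_sample: List[List[Event]],
                  drop_text: bool = False) -> Dict[str, list]:
    """Repeat each sample once per extracted event and attach the event to its metadata."""
    out: Dict[str, list] = {key: [] for key in batch}
    for i, events in enumerate(events_per_sample):
        for description, characters in events:
            # Copy every non-metadata column unchanged for this new row.
            for key in batch:
                if key != "meta":
                    out[key].append(batch[key][i])
            # Copy the metadata dict so rows do not share mutable state.
            meta = dict(batch["meta"][i])
            meta["event_description"] = description
            meta["relevant_characters"] = characters
            out["meta"].append(meta)
    if drop_text:
        out.pop("text", None)
    return out


batch = {"text": ["story A", "story B"], "meta": [{"id": 1}, {"id": 2}]}
events_per_sample = [
    [("A finds a map", ["Alice"]), ("A and B sail", ["Alice", "Bob"])],
    [("C arrives", ["Carol"])],
]
expanded = expand_events(batch, events_per_sample)
print(len(expanded["text"]), expanded["meta"][1])
```

In this sketch a sample with no extracted events contributes no rows; whether the operator itself behaves that way is not stated on this page.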
Usage Examples
YAML Configuration
process:
  - extract_event_mapper:
      api_model: gpt-4o
      try_num: 3
      drop_text: false