Implementation:Datajuicer Data juicer ExtractNicknameMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool for extracting nickname relationships between characters in text provided by Data-Juicer.
Description
ExtractNicknameMapper is a mapper operator that identifies and extracts nickname relationships from input text using an API-based language model. It sends text to the model with a Chinese system prompt instructing it to identify speaker, addressee, and nickname triples. The structured markdown response is parsed via a verbose regex pattern with cross-validation checks. Results are stored in metadata under the configured nickname key as relationship records containing source entity, target entity, relation description, and relation keywords.
Usage
Use when you need to build character relationship graphs from narrative text, extracting interpersonal nickname and address-form data to understand social dynamics in story-based datasets.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/extract_nickname_mapper.py
Signature
@OPERATORS.register_module("extract_nickname_mapper")
class ExtractNicknameMapper(Mapper):
def __init__(self,
api_model: str = "gpt-4o",
*,
nickname_key: str = MetaKeys.nickname,
api_endpoint: Optional[str] = None,
response_path: Optional[str] = None,
system_prompt: Optional[str] = None,
input_template: Optional[str] = None,
output_pattern: Optional[str] = None,
try_num: PositiveInt = 3,
drop_text: bool = False,
model_params: Dict = {},
sampling_params: Dict = {},
**kwargs):
Import
from data_juicer.ops.mapper.extract_nickname_mapper import ExtractNicknameMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| api_model | str | No | API model name, defaults to "gpt-4o" |
| nickname_key | str | No | Key name to store nickname relationships in meta field, defaults to MetaKeys.nickname |
| api_endpoint | Optional[str] | No | URL endpoint for the API |
| response_path | Optional[str] | No | Path to extract content from API response |
| system_prompt | Optional[str] | No | System prompt for the task |
| input_template | Optional[str] | No | Template for building the model input |
| output_pattern | Optional[str] | No | Regular expression for parsing model output |
| try_num | PositiveInt | No | Number of retry attempts on error, defaults to 3 |
| drop_text | bool | No | Whether to drop text from output, defaults to False |
| model_params | Dict | No | Parameters for initializing the API model |
| sampling_params | Dict | No | Extra parameters passed to API call |
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples with nickname relationships stored in meta field |
Usage Examples
process:
- extract_nickname_mapper:
api_model: "gpt-4o"
try_num: 3
drop_text: false