Implementation:Datajuicer Data juicer DialogSentimentIntensityMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool, provided by Data-Juicer, for predicting numeric sentiment intensity scores for the user turns of multi-turn dialogs.
Description
DialogSentimentIntensityMapper is a mapper operator that predicts an integer sentiment intensity score for each user query turn in a multi-turn dialog. Scores range from -5 (extremely negative) to 5 (extremely positive), with 0 indicating neutral sentiment. The operator calls an API-based language model (default: GPT-4o) with a detailed Chinese few-shot system prompt that demonstrates how sentiment intensity evolves across dialog turns. The model response is parsed with regular expressions to extract a per-turn sentiment analysis and an integer intensity value. By default, results are stored in metadata under dialog_sentiment_intensity and dialog_sentiment_intensity_analysis; both keys can be changed through intensities_key and analysis_key. The class extends the Mapper base class.
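The regex-based parsing step can be illustrated with a minimal sketch. The response layout and both patterns below are assumptions made for illustration only (English labels, hypothetical `ANALYSIS_PATTERN` and `INTENSITY_PATTERN`, and a hypothetical helper `parse_turn`); the operator's actual default prompt and patterns are defined in the source file and may differ.

```python
import re

# Hypothetical patterns for illustration only; not the operator's defaults.
ANALYSIS_PATTERN = r"Analysis:\s*(.*?)\s*(?=Intensity:)"
INTENSITY_PATTERN = r"Intensity:\s*(-?\d+)"


def parse_turn(response: str):
    """Extract (analysis, intensity) from one model response.

    Returns (None, None) if either part is missing or the score falls
    outside the documented -5..5 range.
    """
    analysis_match = re.search(ANALYSIS_PATTERN, response, re.DOTALL)
    intensity_match = re.search(INTENSITY_PATTERN, response)
    if not analysis_match or not intensity_match:
        return None, None
    intensity = int(intensity_match.group(1))
    if not -5 <= intensity <= 5:
        return None, None
    return analysis_match.group(1).strip(), intensity


example = "Analysis: The user sounds frustrated by the delay.\nIntensity: -3"
analysis, intensity = parse_turn(example)
```

Rejecting out-of-range or unparsable responses, rather than clamping them, keeps malformed model output from silently entering the metadata; whether the operator itself does this is not stated in this page.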
Usage
Import when you need fine-grained quantitative sentiment tracking across dialog turns for quality assessment or RLHF training data curation.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/dialog_sentiment_intensity_mapper.py
Signature
@OPERATORS.register_module("dialog_sentiment_intensity_mapper")
class DialogSentimentIntensityMapper(Mapper):
    def __init__(self,
                 api_model: str = "gpt-4o",
                 max_round: NonNegativeInt = 10,
                 *,
                 intensities_key: str = MetaKeys.dialog_sentiment_intensity,
                 analysis_key: str = MetaKeys.dialog_sentiment_intensity_analysis,
                 api_endpoint: Optional[str] = None,
                 response_path: Optional[str] = None,
                 system_prompt: Optional[str] = None,
                 query_template: Optional[str] = None,
                 response_template: Optional[str] = None,
                 analysis_template: Optional[str] = None,
                 intensity_template: Optional[str] = None,
                 analysis_pattern: Optional[str] = None,
                 intensity_pattern: Optional[str] = None,
                 try_num: PositiveInt = 3,
                 model_params: Dict = {},
                 sampling_params: Dict = {},
                 **kwargs):
Import
from data_juicer.ops.mapper.dialog_sentiment_intensity_mapper import DialogSentimentIntensityMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| api_model | str | No | API model name. Default: "gpt-4o" |
| max_round | NonNegativeInt | No | Maximum number of dialog rounds to include in the prompt. Default: 10 |
| intensities_key | str | No | Key name in meta field to store output intensities. Default: "dialog_sentiment_intensity" |
| analysis_key | str | No | Key name in meta field to store analysis. Default: "dialog_sentiment_intensity_analysis" |
| api_endpoint | Optional[str] | No | URL endpoint for the API |
| response_path | Optional[str] | No | Path to extract content from the API response |
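| system_prompt | Optional[str] | No | System prompt for the task |
| query_template | Optional[str] | No | Template for the query part of the input prompt |
| response_template | Optional[str] | No | Template for the response part of the input prompt |
| analysis_template | Optional[str] | No | Template for the analysis part of the input prompt |
| intensity_template | Optional[str] | No | Template for the intensity part of the input prompt |
| analysis_pattern | Optional[str] | No | Regex pattern used to parse the sentiment analysis from the response |
| intensity_pattern | Optional[str] | No | Regex pattern used to parse the sentiment intensity from the response |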
| try_num | PositiveInt | No | Number of retry attempts on API call error. Default: 3 |
| model_params | Dict | No | Parameters for initializing the API model |
| sampling_params | Dict | No | Extra parameters passed to the API call (e.g. temperature, top_p) |
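The roles of max_round and try_num can be sketched as follows. This is not the operator's code: `last_rounds` and `call_with_retries` are hypothetical helpers, the history is assumed to be a list of (query, response) pairs, and max_round = 0 is assumed to mean that no earlier rounds are included.

```python
from typing import Callable, List, Tuple


def last_rounds(history: List[Tuple[str, str]], max_round: int) -> List[Tuple[str, str]]:
    """Keep only the most recent `max_round` (query, response) pairs."""
    if max_round <= 0:
        # Assumption: zero rounds means the prompt carries no history.
        return []
    return history[-max_round:]


def call_with_retries(call: Callable[[], str], try_num: int = 3) -> str:
    """Invoke `call` up to `try_num` times and return the first success."""
    last_error = None
    for _ in range(try_num):
        try:
            return call()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
    raise RuntimeError(f"all {try_num} attempts failed") from last_error
```

Bounding the history keeps prompt length predictable for long dialogs, and a small retry budget absorbs transient API failures without blocking the pipeline indefinitely.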
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples with dialog_sentiment_intensity (list of int, -5 to 5) and dialog_sentiment_intensity_analysis added to metadata |
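As a rough illustration of consuming the output, the sketch below averages the per-turn intensities of a processed sample and filters on that average. The metadata field name `META_FIELD` and the helpers `mean_intensity` and `keep_at_or_above` are hypothetical; only the default intensities key is taken from the Inputs table.

```python
from statistics import mean
from typing import Dict, List, Optional

INTENSITIES_KEY = "dialog_sentiment_intensity"  # default key documented above
META_FIELD = "meta"  # hypothetical name for the sample's metadata field


def mean_intensity(sample: Dict) -> Optional[float]:
    """Average the per-turn intensities, or None if none were stored."""
    scores = sample.get(META_FIELD, {}).get(INTENSITIES_KEY) or []
    return mean(scores) if scores else None


def keep_at_or_above(samples: List[Dict], threshold: float = 0.0) -> List[Dict]:
    """Keep samples whose mean intensity is at least `threshold`."""
    kept = []
    for sample in samples:
        score = mean_intensity(sample)
        if score is not None and score >= threshold:
            kept.append(sample)
    return kept
```

Samples without stored intensities are dropped here rather than treated as neutral; a curation pipeline might reasonably choose the opposite.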
Usage Examples
YAML Configuration
process:
  - dialog_sentiment_intensity_mapper:
      api_model: gpt-4o
      max_round: 10
      try_num: 3