Implementation:Datajuicer Data juicer DialogSentimentIntensityMapper

Knowledge Sources	Datajuicer_Data_juicer
Domains	Data_Processing, Mapping
Last Updated	2026-02-14 16:00 GMT

Overview

Concrete Data-Juicer operator for predicting numeric sentiment intensity scores for user turns in multi-turn dialogs.

Description

DialogSentimentIntensityMapper is a mapper operator that predicts an integer sentiment intensity score for each user query turn in a multi-turn dialog. Scores range from -5 (extremely negative) to 5 (extremely positive), with 0 indicating neutral sentiment. The operator calls an API-based language model (default: GPT-4o) with a detailed Chinese few-shot system prompt that demonstrates how sentiment intensity evolves across dialog turns. Each response is parsed with regular expressions to extract the per-turn sentiment analysis and the integer intensity value. Results are stored in the sample metadata under dialog_sentiment_intensity and dialog_sentiment_intensity_analysis. The class extends the Mapper base class.
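The regex parsing step can be illustrated with a minimal sketch. The label strings and patterns below are hypothetical English stand-ins (the operator's actual default patterns live in its source file and match its Chinese prompt), and clamping to the -5..5 range is an assumption of this sketch rather than confirmed operator behavior.

```python
import re

# Hypothetical patterns for illustration only; not the operator's defaults.
ANALYSIS_PATTERN = r"Sentiment analysis:\s*(.*?)\n"
INTENSITY_PATTERN = r"Sentiment intensity:\s*(-?\d+)"


def parse_turn(response: str):
    """Extract (analysis, intensity) from one model reply.

    Returns an empty analysis string and None for the intensity
    when the corresponding pattern is not found.
    """
    analysis_match = re.search(ANALYSIS_PATTERN, response, re.DOTALL)
    intensity_match = re.search(INTENSITY_PATTERN, response)

    analysis = analysis_match.group(1).strip() if analysis_match else ""
    intensity = None
    if intensity_match:
        # Assumption: clamp out-of-range values to the documented -5..5 scale.
        intensity = max(-5, min(5, int(intensity_match.group(1))))
    return analysis, intensity


reply = "Sentiment analysis: The user sounds frustrated.\nSentiment intensity: -3\n"
analysis, intensity = parse_turn(reply)
```

Custom values for analysis_pattern and intensity_pattern would take the place of the two constants if the prompt format is changed.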

Usage

Use this operator when you need fine-grained, quantitative sentiment tracking across dialog turns, for example for quality assessment or RLHF training data curation.

Code Reference

Source Location

Repository: Datajuicer_Data_juicer
File: data_juicer/ops/mapper/dialog_sentiment_intensity_mapper.py

Signature

@OPERATORS.register_module("dialog_sentiment_intensity_mapper")
class DialogSentimentIntensityMapper(Mapper):
    def __init__(self,
                 api_model: str = "gpt-4o",
                 max_round: NonNegativeInt = 10,
                 *,
                 intensities_key: str = MetaKeys.dialog_sentiment_intensity,
                 analysis_key: str = MetaKeys.dialog_sentiment_intensity_analysis,
                 api_endpoint: Optional[str] = None,
                 response_path: Optional[str] = None,
                 system_prompt: Optional[str] = None,
                 query_template: Optional[str] = None,
                 response_template: Optional[str] = None,
                 analysis_template: Optional[str] = None,
                 intensity_template: Optional[str] = None,
                 analysis_pattern: Optional[str] = None,
                 intensity_pattern: Optional[str] = None,
                 try_num: PositiveInt = 3,
                 model_params: Dict = {},
                 sampling_params: Dict = {},
                 **kwargs):

Import

from data_juicer.ops.mapper.dialog_sentiment_intensity_mapper import DialogSentimentIntensityMapper

I/O Contract

Inputs

Name	Type	Required	Description
api_model	str	No	API model name. Default: "gpt-4o"
max_round	NonNegativeInt	No	Maximum number of dialog rounds to include in the prompt. Default: 10
intensities_key	str	No	Key name in meta field to store output intensities. Default: "dialog_sentiment_intensity"
analysis_key	str	No	Key name in meta field to store analysis. Default: "dialog_sentiment_intensity_analysis"
api_endpoint	Optional[str]	No	URL endpoint for the API
response_path	Optional[str]	No	Path to extract content from the API response
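system_prompt	Optional[str]	No	System prompt for the task
query_template	Optional[str]	No	Template for the query part of the input prompt
response_template	Optional[str]	No	Template for the response part of the input prompt
analysis_template	Optional[str]	No	Template for the analysis part of the input prompt
intensity_template	Optional[str]	No	Template for the intensity part of the input prompt
analysis_pattern	Optional[str]	No	Regex pattern used to parse the returned sentiment analysis
intensity_pattern	Optional[str]	No	Regex pattern used to parse the returned sentiment intensity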
try_num	PositiveInt	No	Number of retry attempts on API call error. Default: 3
model_params	Dict	No	Parameters for initializing the API model
sampling_params	Dict	No	Extra parameters passed to the API call (e.g. temperature, top_p)
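To make the roles of max_round and try_num concrete, here is a rough sketch of how a history window and a retry loop could work. The helper names recent_rounds and call_with_retries are hypothetical and do not come from the Data-Juicer source; the real operator may structure this logic differently.

```python
from typing import Callable, List, Optional, Tuple


def recent_rounds(history: List[Tuple[str, str]],
                  max_round: int) -> List[Tuple[str, str]]:
    """Keep only the last `max_round` (query, response) pairs for the prompt."""
    # history[-0:] would return the whole list, so handle 0 explicitly.
    if max_round == 0:
        return []
    return history[-max_round:]


def call_with_retries(call: Callable[[], str],
                      try_num: int = 3) -> Optional[str]:
    """Invoke `call` up to `try_num` times; return None if every attempt fails."""
    for attempt in range(try_num):
        try:
            return call()
        except Exception as err:  # sketch only: a real client would narrow this
            print(f"attempt {attempt + 1} failed: {err}")
    return None


history = [("q1", "r1"), ("q2", "r2"), ("q3", "r3")]
window = recent_rounds(history, max_round=2)
```

A smaller max_round shortens the prompt and lowers API cost, at the price of giving the model less context about how the user's mood developed.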

Outputs

Name	Type	Description
samples	Dict	Transformed samples with dialog_sentiment_intensity (list of int, -5 to 5) and dialog_sentiment_intensity_analysis added to metadata
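As an illustration of how the output metadata might be consumed downstream, the sketch below checks the documented range and computes a mean intensity per dialog. The example values are invented, the helper names are hypothetical, and the assumption that the analysis field holds one string per scored turn is not confirmed by this page.

```python
from typing import Any, Dict

INTENSITIES_KEY = "dialog_sentiment_intensity"
ANALYSIS_KEY = "dialog_sentiment_intensity_analysis"


def check_meta(meta: Dict[str, Any]) -> bool:
    """Return True if intensities are ints in [-5, 5], one per analysis entry."""
    intensities = meta.get(INTENSITIES_KEY, [])
    analyses = meta.get(ANALYSIS_KEY, [])
    in_range = all(isinstance(v, int) and -5 <= v <= 5 for v in intensities)
    # Assumption: analyses and intensities are parallel per-turn lists.
    return in_range and len(intensities) == len(analyses)


def mean_intensity(meta: Dict[str, Any]) -> float:
    """Average intensity over all scored turns; 0.0 when nothing was scored."""
    values = meta.get(INTENSITIES_KEY, [])
    return sum(values) / len(values) if values else 0.0


# Invented example metadata for a three-turn dialog.
meta = {
    INTENSITIES_KEY: [-2, 0, 3],
    ANALYSIS_KEY: [
        "The user is mildly annoyed.",
        "The user is neutral.",
        "The user is pleased with the answer.",
    ],
}
valid = check_meta(meta)
average = mean_intensity(meta)
```

A per-dialog aggregate like this is one possible basis for the quality assessment or data curation use cases mentioned in the Usage section.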

Usage Examples

YAML Configuration

process:
  - dialog_sentiment_intensity_mapper:
      api_model: gpt-4o
      max_round: 10
      try_num: 3

Related Pages

Environment:Datajuicer_Data_juicer_Python_Runtime_Environment
