Implementation:Datajuicer Data juicer DialogSentimentDetectionMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool, provided by Data-Juicer, for detecting and labeling user sentiment in multi-turn dialogs.
Description
DialogSentimentDetectionMapper is a mapper operator that labels the user's sentiment at each query turn of a multi-turn dialog using an API-based language model (default: GPT-4o). It reconstructs the dialog from the history, query, and response keys, builds a prompt around a Chinese few-shot system prompt that demonstrates sentiment analysis (e.g., pressure, fatigue, gratitude), sends the prompt to the API model, and parses the response with regular expressions to extract the sentiment analysis text and the sentiment category labels. Results are stored in the sample metadata under dialog_sentiment_labels and dialog_sentiment_labels_analysis. The operator supports optional candidate sentiment categories and a configurable number of retry attempts, and it extends the Mapper base class.
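The reconstruct, prompt, and parse flow described above can be sketched in plain Python. This is a minimal illustration and not the operator's source: the English templates, the two regex patterns, and the helper names `build_dialog`, `build_prompt`, and `parse_output` are all assumptions made for the example (the operator's real defaults are Chinese and are not reproduced here).

```python
import re
from typing import Dict, List, Tuple

# Hypothetical templates and patterns, chosen only for this sketch.
QUERY_TEMPLATE = "User: {query}\n"
RESPONSE_TEMPLATE = "Assistant: {response}\n"
ANALYSIS_PATTERN = r"Analysis:\s*(.*?)\s*Sentiment:"
LABELS_PATTERN = r"Sentiment:\s*(.*?)\s*(?:\n|$)"


def build_dialog(sample: Dict) -> List[Tuple[str, str]]:
    """Rebuild (query, response) turns from history plus the current turn."""
    turns = [tuple(turn) for turn in sample.get("history", [])]
    if sample.get("query"):
        turns.append((sample["query"], sample.get("response", "")))
    return turns


def build_prompt(turns: List[Tuple[str, str]], index: int,
                 max_round: int = 10) -> str:
    """Prompt for turn `index`, preceded by at most `max_round` earlier rounds."""
    start = max(0, index - max_round)
    parts = []
    for query, response in turns[start:index]:
        parts.append(QUERY_TEMPLATE.format(query=query))
        parts.append(RESPONSE_TEMPLATE.format(response=response))
    # The turn being analyzed contributes only its query.
    parts.append(QUERY_TEMPLATE.format(query=turns[index][0]))
    return "".join(parts)


def parse_output(text: str) -> Tuple[str, str]:
    """Extract the analysis text and the label with the two patterns."""
    analysis, labels = "", ""
    match = re.search(ANALYSIS_PATTERN, text, re.DOTALL)
    if match:
        analysis = match.group(1).strip()
    match = re.search(LABELS_PATTERN, text, re.DOTALL)
    if match:
        labels = match.group(1).strip()
    return analysis, labels


sample = {
    "history": [["I have three deadlines this week.", "That sounds stressful."]],
    "query": "Thanks, that helps.",
    "response": "Glad to hear it.",
}
turns = build_dialog(sample)
prompt = build_prompt(turns, index=1, max_round=10)
# A made-up model reply in the format the hypothetical patterns expect.
analysis, labels = parse_output(
    "Analysis: The user sounds relieved.\nSentiment: gratitude\n")
```

In the actual operator the corresponding pieces are configurable through query_template, response_template, analysis_pattern, and labels_pattern, so a custom system prompt should be paired with patterns that match its output format.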
Usage
Import this operator when you need to enrich conversational datasets with per-turn sentiment annotations for emotionally aware dialog systems.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/dialog_sentiment_detection_mapper.py
Signature
@OPERATORS.register_module("dialog_sentiment_detection_mapper")
class DialogSentimentDetectionMapper(Mapper):
    def __init__(self,
                 api_model: str = "gpt-4o",
                 sentiment_candidates: Optional[List[str]] = None,
                 max_round: NonNegativeInt = 10,
                 *,
                 labels_key: str = MetaKeys.dialog_sentiment_labels,
                 analysis_key: str = MetaKeys.dialog_sentiment_labels_analysis,
                 api_endpoint: Optional[str] = None,
                 response_path: Optional[str] = None,
                 system_prompt: Optional[str] = None,
                 query_template: Optional[str] = None,
                 response_template: Optional[str] = None,
                 candidate_template: Optional[str] = None,
                 analysis_template: Optional[str] = None,
                 labels_template: Optional[str] = None,
                 analysis_pattern: Optional[str] = None,
                 labels_pattern: Optional[str] = None,
                 try_num: PositiveInt = 3,
                 model_params: Dict = {},
                 sampling_params: Dict = {},
                 **kwargs):
Import
from data_juicer.ops.mapper.dialog_sentiment_detection_mapper import DialogSentimentDetectionMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| api_model | str | No | API model name. Default: "gpt-4o" |
| sentiment_candidates | Optional[List[str]] | No | Candidate sentiment labels the model may output. If None, open-domain sentiment labels are used |
| max_round | NonNegativeInt | No | Maximum number of dialog rounds to include in the prompt. Default: 10 |
| labels_key | str | No | Key name in meta field to store output labels. Default: "dialog_sentiment_labels" |
| analysis_key | str | No | Key name in meta field to store analysis. Default: "dialog_sentiment_labels_analysis" |
| api_endpoint | Optional[str] | No | URL endpoint for the API |
| response_path | Optional[str] | No | Path to extract content from the API response |
| system_prompt | Optional[str] | No | System prompt for the task |
| try_num | PositiveInt | No | Number of retry attempts on API call error. Default: 3 |
| model_params | Dict | No | Parameters for initializing the API model |
| sampling_params | Dict | No | Extra parameters passed to the API call (e.g. temperature, top_p) |
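The role of try_num can be illustrated with a small retry loop. This is a sketch under stated assumptions, not the operator's code: `call_with_retries` and `call_model` are hypothetical names, and returning an empty string once all attempts fail is an assumed fallback.

```python
from typing import Callable


def call_with_retries(call_model: Callable[[str], str], prompt: str,
                      try_num: int = 3) -> str:
    """Call `call_model` up to `try_num` times, returning the first success."""
    for _ in range(try_num):
        try:
            return call_model(prompt)
        except Exception:
            # Assumed behavior: swallow the error and try again.
            continue
    # Assumed fallback when every attempt raised.
    return ""


# A fake model that fails twice before answering, used only for the demo.
attempts = {"count": 0}


def flaky_model(prompt: str) -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated API error")
    return "ok: " + prompt


result = call_with_retries(flaky_model, "hello", try_num=3)
```

With try_num set to 2, the same fake model would exhaust its attempts and the sketch would return the empty fallback.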
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples with dialog_sentiment_labels and dialog_sentiment_labels_analysis added to metadata |
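To make the output contract concrete, the sketch below attaches results to a sample's metadata. The two key names come from the table above; the name of the metadata field (`"meta"` here), the helper `attach_results`, and the one-entry-per-query-turn layout are assumptions made for illustration.

```python
from typing import Dict, List

META_FIELD = "meta"  # hypothetical name for the sample's metadata field
LABELS_KEY = "dialog_sentiment_labels"
ANALYSIS_KEY = "dialog_sentiment_labels_analysis"


def attach_results(sample: Dict, labels: List[str],
                   analyses: List[str]) -> Dict:
    """Store per-turn labels and analyses under the sample's metadata."""
    meta = sample.setdefault(META_FIELD, {})
    meta[LABELS_KEY] = list(labels)
    meta[ANALYSIS_KEY] = list(analyses)
    return sample


sample = {"history": [], "query": "Thanks, that helps.", "response": "Glad to hear it."}
sample = attach_results(
    sample,
    labels=["gratitude"],
    analyses=["The user thanks the assistant."],
)
```

Setting labels_key or analysis_key on the operator changes the names under which these values are stored.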
Usage Examples
YAML Configuration
process:
  - dialog_sentiment_detection_mapper:
      api_model: gpt-4o
      max_round: 10
      try_num: 3