Implementation:Datajuicer Data juicer DetectCharacterAttributesMapper

Knowledge Sources	Datajuicer_Data_juicer
Domains	Multimodal Processing, Character Analysis, Object Detection
Last Updated	2026-02-14 16:00 GMT

Overview

Extracts and classifies attributes of main characters in an image using a multi-model pipeline combining object detection, image-text matching, and language model inference.

Description

DetectCharacterAttributesMapper is an advanced multimodal analysis operator that builds rich character annotations from images. Given an image, a caption, and a list of main character names, it performs the following steps:

1. Character Location -- Uses DetectCharacterLocationsMapper (which internally uses YOLOE for detection and BLIP for image-text matching) to locate the main characters in the image and return their bounding boxes.
2. Character Classification -- For each character, queries a LLaVA multimodal LLM (default: llava-v1.6-vicuna-7b-hf) to classify it into one of five categories: object, animal, person, text, or other.
3. Feature Extraction -- Uses the same LLM to extract characteristic phrases (color, material, action, etc.) for each character from the caption text.
4. Visual Verification -- Crops each character's bounding-box region and checks every extracted feature against the cropped image using yes/no LLM queries.
5. Category-Specific Expansion -- Based on the character's class, generates additional class-specific features (e.g., clothing and age for persons, color and action for animals).
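
The five steps above can be read as a simple control flow. The sketch below is not the operator's actual code: locate, classify, extract, verify, and expand are hypothetical callables standing in for the sub-operator and the LLM queries, and both the fallback to "other" and the skipping of unlocated characters are assumptions made for illustration.

```python
from typing import Callable, Dict, List

# The five categories named in the classification step.
CATEGORIES = ("object", "animal", "person", "text", "other")


def normalize_category(reply: str) -> str:
    """Map a free-form model reply onto one of the five categories.

    Falling back to "other" for unrecognized replies is an assumption.
    """
    words = reply.lower().replace(".", " ").replace(",", " ").split()
    for category in CATEGORIES:
        if category in words:
            return category
    return "other"


def annotate_characters(
    names: List[str],
    locate: Callable[[List[str]], Dict[str, List[float]]],
    classify: Callable[[str], str],
    extract: Callable[[str], List[str]],
    verify: Callable[[str, List[float], str], bool],
    expand: Callable[[str, str], List[str]],
) -> List[dict]:
    """Run the five steps with hypothetical, injected model calls."""
    boxes = locate(names)  # step 1: one bounding box per located character
    results = []
    for name in names:
        if name not in boxes:
            continue  # assumption: unlocated characters are skipped
        bbox = boxes[name]
        category = normalize_category(classify(name))  # step 2
        # Steps 3 and 4: keep only caption phrases confirmed on the crop.
        phrases = [p for p in extract(name) if verify(name, bbox, p)]
        phrases = phrases + expand(name, category)  # step 5
        results.append(
            {"main_character": name, "bbox": bbox, "characteristic_list": phrases}
        )
    return results
```

Passing the model calls in as arguments keeps the flow testable without loading any model; in the real operator these calls are backed by YOLOE, BLIP, and the LLM.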

The final output includes bounding boxes and validated characteristic lists for each main character, stored in the main_character_attributes_list field under the sample's meta key.

Requires CUDA acceleration. Registered as a tagging operator and supports fused image loading.
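
Because the operator requires CUDA, a pre-flight check can avoid a late failure. The following is a minimal sketch that assumes PyTorch is the backend in use; cuda_available is a hypothetical helper, not part of Data-Juicer.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and reports a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed in this environment
    import torch

    return bool(torch.cuda.is_available())
```

Calling cuda_available() before constructing the mapper lets a pipeline fail fast or skip the operator on CPU-only machines.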

Usage

Use this operator when building character-centric datasets that require detailed per-character attribute annotations. It is suitable for image-text datasets where fine-grained character understanding is needed for downstream tasks such as character-driven story generation or visual question answering.

Code Reference

Source Location

Repository: Datajuicer_Data_juicer
File: data_juicer/ops/mapper/detect_character_attributes_mapper.py
Lines: 1-313

Signature

class DetectCharacterAttributesMapper(Mapper):
    _accelerator = "cuda"

    def __init__(
        self,
        detect_character_locations_mapper_args: Optional[Dict] = {},
        *args, **kwargs,
    ):

Import

from data_juicer.ops.mapper.detect_character_attributes_mapper import DetectCharacterAttributesMapper

I/O Contract

Inputs

Name	Type	Required	Description
detect_character_locations_mapper_args	Dict	No	Arguments passed to the DetectCharacterLocationsMapper sub-operator, covering detection and matching thresholds and model paths. Defaults: llava-v1.6-vicuna-7b-hf (LLM), blip-itm-base-coco (image-text matching), yoloe-11l-seg.pt (detection), iou_threshold=0.7, matching_score_threshold=0.4.
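
The iou_threshold argument refers to intersection over union (IoU) between bounding boxes. How the sub-operator applies the threshold is not described on this page, so the sketch below only shows the standard IoU computation. The [x1, y1, x2, y2] box format and the example boxes are assumptions.

```python
from typing import Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


box_a = [0, 0, 10, 10]
box_b = [5, 0, 15, 10]
overlap = iou(box_a, box_b)  # intersection area 50, union area 150
exceeds_default = overlap > 0.7  # compared against the documented default
```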

Sample Fields

Name	Type	Required	Description
main_character_list	list[str]	Yes	List of main character names to detect and analyze
images	list[str]	Yes	List of image paths
text	str	Yes	Caption text describing the image content
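
Since all three sample fields are required, it can help to check a sample before sending it to a GPU-backed operator. The sketch below is a hypothetical helper, not part of Data-Juicer, and it assumes the field names listed in the table above are used unchanged.

```python
from typing import List

# Required sample fields and the Python types listed in the table.
REQUIRED_FIELDS = {
    "main_character_list": list,
    "images": list,
    "text": str,
}


def missing_or_invalid_fields(sample: dict) -> List[str]:
    """Return names of required fields that are absent or of the wrong type."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(sample.get(name), expected_type):
            problems.append(name)
    return problems
```

An empty return value means the sample has the expected shape; the check does not verify that the image paths exist.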

Outputs

Name	Type	Description
sample[Fields.meta]["main_character_attributes_list"]	list[dict]	List of dictionaries, each containing "main_character" (str), "bbox" (list), and "characteristic_list" (list[str])
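
The output is a list of dictionaries with the three keys named above. As a minimal sketch of consuming it, the hypothetical helper below indexes the characteristic lists by character name. The example record is illustrative, not real operator output, and its bbox values are made up.

```python
from typing import Dict, List


def attributes_by_character(records: List[dict]) -> Dict[str, List[str]]:
    """Index characteristic lists by character name."""
    return {r["main_character"]: list(r["characteristic_list"]) for r in records}


# Illustrative record only; the coordinate convention of "bbox" is not specified here.
example = [
    {
        "main_character": "boy",
        "bbox": [12, 30, 140, 220],
        "characteristic_list": ["blue shirt"],
    },
]
lookup = attributes_by_character(example)
```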

Usage Examples

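from data_juicer.ops.mapper.detect_character_attributes_mapper import DetectCharacterAttributesMapper
from data_juicer.utils.constant import Fields

# Basic usage: rely on the default sub-operator arguments
mapper = DetectCharacterAttributesMapper()

# With custom detection parameters
mapper = DetectCharacterAttributesMapper(
    detect_character_locations_mapper_args={
        "iou_threshold": 0.5,
        "matching_score_threshold": 0.3,
        "yoloe_path": "yoloe-11l-seg.pt",
    }
)

# Process a sample (requires a CUDA device)
sample = {
    "main_character_list": ["boy", "dog"],
    "images": ["/path/to/image.jpg"],
    "text": "A boy in a blue shirt sitting on a fence with his golden retriever.",
}
result = mapper.process_single(sample, rank=0)

# Per-character dicts with "main_character", "bbox", and "characteristic_list"
attributes = result[Fields.meta]["main_character_attributes_list"]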
