Implementation:Datajuicer Data juicer DetectCharacterAttributesMapper
| Knowledge Sources | |
|---|---|
| Domains | Multimodal Processing, Character Analysis, Object Detection |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Extracts and classifies attributes of main characters in an image using a multi-model pipeline combining object detection, image-text matching, and language model inference.
Description
DetectCharacterAttributesMapper is a multimodal analysis operator that builds per-character annotations from images. Given an image, a caption, and a list of main character names, it performs five steps:
- Character Location -- Uses the DetectCharacterLocationsMapper (which internally uses YOLOE for detection and BLIP for image-text matching) to locate main characters in the image with bounding boxes
- Character Classification -- For each character, queries a LLaVA multimodal LLM (default: llava-v1.6-vicuna-7b-hf) to classify it into one of five categories: object, animal, person, text, or other
- Feature Extraction -- Extracts characteristic phrases (color, material, action, etc.) from the caption text for each character using the same LLM
- Visual Verification -- Crops each character's bounding box region and verifies extracted features against the actual visual content using yes/no LLM queries
- Category-Specific Expansion -- Based on the character's class, generates additional class-specific features (e.g., clothing/age for persons, color/action for animals)
The final output includes bounding boxes and validated characteristic lists for each main character, stored in the main_character_attributes_list field under the sample's meta key.
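The control flow of steps 2 through 5 can be sketched in plain Python. This is an illustrative sketch, not the operator's code: `annotate_characters`, `CATEGORY_ASPECTS`, the prompt wording, and the `ask` callable are all hypothetical, the character locations from step 1 are assumed to be already available as `[x1, y1, x2, y2]` boxes, and `fake_ask` returns canned answers in place of a real multimodal LLM.

```python
from typing import Callable, Dict, List, Optional, Sequence

CATEGORIES = ("object", "animal", "person", "text", "other")
# Hypothetical per-category aspects; only these two examples appear in the text above.
CATEGORY_ASPECTS = {"person": ["clothing", "age"], "animal": ["color", "action"]}

BBox = Sequence[float]  # assumed layout: [x1, y1, x2, y2]
AskFn = Callable[[str, Optional[BBox]], str]  # (prompt, region or None) -> answer


def annotate_characters(caption: str, locations: Dict[str, BBox], ask: AskFn) -> List[dict]:
    """Run classification, extraction, verification, and expansion per character."""
    results = []
    for name, bbox in locations.items():
        # Step 2: classify the character, falling back to "other" on odd answers.
        answer = ask(f"Classify '{name}' as one of: {', '.join(CATEGORIES)}.", None)
        category = answer.strip().lower()
        if category not in CATEGORIES:
            category = "other"

        # Step 3: pull candidate phrases for this character out of the caption.
        raw = ask(f"List phrases describing '{name}' in: {caption}", None)
        candidates = [p.strip() for p in raw.split(",") if p.strip()]

        # Step 4: keep only phrases confirmed by a yes/no query on the region.
        features = []
        for phrase in candidates:
            reply = ask(f"Does this region show '{phrase}'? Answer yes or no.", bbox)
            if reply.strip().lower().startswith("yes"):
                features.append(phrase)

        # Step 5: add category-specific details, skipping empty or repeated ones.
        for aspect in CATEGORY_ASPECTS.get(category, []):
            extra = ask(f"Describe the {aspect} of the {name}.", bbox).strip()
            if extra and extra not in features:
                features.append(extra)

        results.append(
            {"main_character": name, "bbox": list(bbox), "characteristic_list": features}
        )
    return results


def fake_ask(prompt: str, bbox: Optional[BBox]) -> str:
    """Canned answers standing in for a real multimodal LLM (illustration only)."""
    if prompt.startswith("Classify"):
        return "person" if "'boy'" in prompt else "animal"
    if prompt.startswith("List phrases"):
        return "blue shirt, red hat" if "'boy'" in prompt else "golden"
    if prompt.startswith("Does this region"):
        return "no" if "red hat" in prompt else "yes"
    return ""  # no extra category-specific detail in this toy run


if __name__ == "__main__":
    caption = "A boy in a blue shirt sitting on a fence with his golden retriever."
    boxes = {"boy": [10, 20, 110, 220], "dog": [120, 90, 200, 210]}
    for entry in annotate_characters(caption, boxes, fake_ask):
        print(entry)
```

In this toy run the stub rejects "red hat", so that phrase is dropped during verification. This mirrors the role of the visual check: caption-derived phrases are kept only when the cropped region supports them.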
The operator requires CUDA acceleration (_accelerator = "cuda"). It is registered as a tagging operator and supports fused image loading.
Usage
Use this operator when building character-centric datasets that require detailed per-character attribute annotations. It is suitable for image-text datasets where fine-grained character understanding is needed for downstream tasks such as character-driven story generation or visual question answering.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/detect_character_attributes_mapper.py
- Lines: 1-313
Signature
class DetectCharacterAttributesMapper(Mapper):
    _accelerator = "cuda"

    def __init__(
        self,
        detect_character_locations_mapper_args: Optional[Dict] = {},
        *args, **kwargs,
    ):
Import
from data_juicer.ops.mapper.detect_character_attributes_mapper import DetectCharacterAttributesMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| detect_character_locations_mapper_args | Optional[Dict] | No | Arguments passed to the character location sub-operator (DetectCharacterLocationsMapper): detection and matching thresholds and model paths. Defaults: llava-v1.6-vicuna-7b-hf, blip-itm-base-coco, yoloe-11l-seg.pt, iou_threshold=0.7, matching_score_threshold=0.4 |
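For readers unfamiliar with the `iou_threshold` setting, the sketch below shows the standard intersection-over-union computation for two axis-aligned boxes. How the sub-operator applies the threshold is not described on this page, so this only illustrates the metric itself. `box_iou` is a hypothetical helper, and the `[x1, y1, x2, y2]` box layout is an assumption.

```python
from typing import Sequence


def box_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2] (assumed layout)."""
    # Corners of the overlapping rectangle, if there is one.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes produce an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 10x10 boxes sharing half their area.
    print(box_iou([0, 0, 10, 10], [5, 0, 15, 10]))
```

A threshold of 0.7 means two boxes count as overlapping only when their shared area is at least 70% of their combined area.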
Sample Fields
| Name | Type | Required | Description |
|---|---|---|---|
| main_character_list | list[str] | Yes | List of main character names to detect and analyze |
| images | list[str] | Yes | List of image paths |
| text | str | Yes | Caption text describing the image content |
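Because the operator loads several large models, it can help to check samples against the table above before running it. The helper below is a minimal sketch of such a pre-check. `missing_or_invalid_fields` is a hypothetical function, not part of Data-Juicer, and it assumes the field names shown in the table are used unchanged.

```python
from typing import Dict, List

# Required sample fields and their expected container types, taken from the table above.
REQUIRED_FIELDS: Dict[str, type] = {
    "main_character_list": list,
    "images": list,
    "text": str,
}


def missing_or_invalid_fields(sample: dict) -> List[str]:
    """Return one message per required field that is absent or has the wrong type."""
    problems = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in sample:
            problems.append(f"{key}: missing")
        elif not isinstance(sample[key], expected):
            actual = type(sample[key]).__name__
            problems.append(f"{key}: expected {expected.__name__}, got {actual}")
    return problems


if __name__ == "__main__":
    sample = {"main_character_list": ["boy", "dog"], "images": "/path/to/image.jpg"}
    print(missing_or_invalid_fields(sample))
```

An empty result means the sample has the expected shape. The check does not verify that the image paths exist or that the caption mentions the listed characters.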
Outputs
| Name | Type | Description |
|---|---|---|
| sample[Fields.meta]["main_character_attributes_list"] | list[dict] | List of dictionaries, each containing "main_character" (str), "bbox" (list), and "characteristic_list" (list[str]) |
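The output structure can be consumed with ordinary dictionary access. The sketch below runs on a hand-written sample, not real operator output. `characteristics_by_character` is a hypothetical helper, and `META_KEY` is an assumed stand-in for `Fields.meta`; real code should import `Fields` rather than hard-code the string.

```python
from typing import Dict, List

# Assumed value of Fields.meta; prefer importing Fields from Data-Juicer in real code.
META_KEY = "__dj__meta__"


def characteristics_by_character(sample: dict, meta_key: str = META_KEY) -> Dict[str, List[str]]:
    """Map each main character name to its validated characteristic list."""
    entries = sample.get(meta_key, {}).get("main_character_attributes_list", [])
    return {e["main_character"]: list(e["characteristic_list"]) for e in entries}


if __name__ == "__main__":
    # Hand-written example mirroring the documented structure (not real output).
    processed = {
        META_KEY: {
            "main_character_attributes_list": [
                {
                    "main_character": "boy",
                    "bbox": [10, 20, 110, 220],
                    "characteristic_list": ["blue shirt"],
                },
                {
                    "main_character": "dog",
                    "bbox": [120, 90, 200, 210],
                    "characteristic_list": ["golden"],
                },
            ]
        }
    }
    print(characteristics_by_character(processed))
```

Samples without the meta entry yield an empty dictionary, so the helper can be applied to a dataset where only some samples were processed.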
Usage Examples
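from data_juicer.ops.mapper.detect_character_attributes_mapper import DetectCharacterAttributesMapper
from data_juicer.utils.constant import Fields

# Basic usage (default models and thresholds)
mapper = DetectCharacterAttributesMapper()

# With custom detection parameters
mapper = DetectCharacterAttributesMapper(
    detect_character_locations_mapper_args={
        "iou_threshold": 0.5,
        "matching_score_threshold": 0.3,
        "yoloe_path": "yoloe-11l-seg.pt",
    }
)

# Process a sample
sample = {
    "main_character_list": ["boy", "dog"],
    "images": ["/path/to/image.jpg"],
    "text": "A boy in a blue shirt sitting on a fence with his golden retriever.",
    # Assumption: the meta dict is pre-initialized, as a Data-Juicer pipeline would normally do
    Fields.meta: {},
}
result = mapper.process_single(sample, rank=0)

# One dict per main character: "main_character", "bbox", "characteristic_list"
attributes = result[Fields.meta]["main_character_attributes_list"]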