Implementation:Datajuicer Data juicer ImageTaggingVLMMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool, provided by Data-Juicer, for generating image tags with a Vision-Language Model (VLM).
Description
ImageTaggingVLMMapper is a mapper operator that generates descriptive tags for images using a configurable VLM (default: Qwen/Qwen2.5-VL-7B-Instruct). Its system prompt instructs the model to return 5-10 concise, lowercase, hyphenated tags in JSON format, and its input template passes both the image and its associated text description so that the tags can draw on both. Inference runs either through an API or locally through HuggingFace/vLLM. The operator parses the JSON output to extract the tag array and stores it in the sample's metadata under the configured field name. It retries on error (see try_num) and supports CUDA acceleration.
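The parse-and-retry flow described above can be sketched with the standard library alone. This is a minimal illustration, not the operator's actual code: `parse_tags` and `tag_with_retry` are hypothetical helpers, `generate` stands in for the model call, and the `"tags"` key is an assumed response shape.

```python
import json
import re
from typing import Callable, List


def parse_tags(raw_output: str) -> List[str]:
    """Extract tag strings from a model response (hypothetical helper)."""
    # Models often surround JSON with extra text, so grab the outermost
    # object or array instead of parsing the whole string.
    match = re.search(r"\{.*\}|\[.*\]", raw_output, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object or array found in model output")
    data = json.loads(match.group(0))
    # Accept a bare array or an object with a "tags" key (assumed shape).
    tags = data.get("tags", []) if isinstance(data, dict) else data
    if not isinstance(tags, list):
        raise ValueError("expected a JSON array of tags")
    # Normalise to trimmed, lowercase strings and drop empty entries.
    return [str(t).strip().lower() for t in tags if str(t).strip()]


def tag_with_retry(generate: Callable[[], str], try_num: int = 3) -> List[str]:
    """Call `generate` up to `try_num` times until it yields parseable tags."""
    for _ in range(try_num):
        try:
            tags = parse_tags(generate())
            if tags:
                return tags
        except ValueError:
            # json.JSONDecodeError subclasses ValueError, so malformed
            # JSON is handled here as well.
            continue
    return []  # every attempt failed; the caller decides what to store
```

Returning an empty list after the final attempt keeps the sketch side-effect free; how the real operator handles exhausted retries is not stated in this page.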
Usage
Use this operator when you need context-aware image tagging that draws on both visual and textual information. It is a VLM-based alternative to ImageTaggingMapper, which relies on the Recognize Anything Model (RAM).
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/image_tagging_vlm_mapper.py
Signature
@OPERATORS.register_module("image_tagging_vlm_mapper")
class ImageTaggingVLMMapper(Mapper):
    def __init__(self,
                 api_or_hf_model: str = "Qwen/Qwen2.5-VL-7B-Instruct",
                 is_api_model: bool = False,
                 *,
                 tag_field_name: str = MetaKeys.image_tags,
                 api_endpoint: Optional[str] = None,
                 response_path: Optional[str] = None,
                 system_prompt: Optional[str] = None,
                 input_template: Optional[str] = None,
                 model_params: Dict = {},
                 sampling_params: Dict = {},
                 try_num: PositiveInt = 3,
                 **kwargs):
Import
from data_juicer.ops.mapper.image_tagging_vlm_mapper import ImageTaggingVLMMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| api_or_hf_model | str | No | API model name or HF model name, defaults to "Qwen/Qwen2.5-VL-7B-Instruct" |
| is_api_model | bool | No | Whether the model is an API model; if False, vLLM is used. Defaults to False |
| tag_field_name | str | No | Field name to store the tags, defaults to MetaKeys.image_tags |
| api_endpoint | Optional[str] | No | URL endpoint for the API |
| response_path | Optional[str] | No | Path to extract content from API response |
| system_prompt | Optional[str] | No | System prompt for the task |
| input_template | Optional[str] | No | Template for building the model input |
| model_params | Dict | No | Parameters for initializing the model |
| sampling_params | Dict | No | Extra parameters passed to API or vLLM call |
| try_num | PositiveInt | No | Number of retry attempts on error, defaults to 3 |
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples with the VLM-generated image tags stored in the meta field under tag_field_name |
Usage Examples
process:
  - image_tagging_vlm_mapper:
      api_or_hf_model: "Qwen/Qwen2.5-VL-7B-Instruct"
      is_api_model: false
      try_num: 3
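For the API-based mode, the same recipe entry can be expressed as a Python dictionary whose keys follow the constructor signature. This is a sketch under stated assumptions: the model name and endpoint URL are placeholders that do not come from the source, and the final commented lines presume data_juicer is installed.

```python
from typing import Any, Dict, List

# Keyword arguments named after the documented constructor parameters.
# The model name and endpoint below are placeholders, not real values.
api_op_kwargs: Dict[str, Any] = {
    "api_or_hf_model": "my-vlm-model",  # hypothetical API model name
    "is_api_model": True,  # route requests through the API instead of vLLM
    "api_endpoint": "http://localhost:8000/v1/chat/completions",  # hypothetical URL
    "try_num": 3,  # retry attempts on error
}

# Same shape as the YAML recipe's `process` list: one entry per operator.
process: List[Dict[str, Dict[str, Any]]] = [
    {"image_tagging_vlm_mapper": api_op_kwargs},
]

# With data_juicer installed, the operator could be built directly:
#   from data_juicer.ops.mapper.image_tagging_vlm_mapper import ImageTaggingVLMMapper
#   op = ImageTaggingVLMMapper(**api_op_kwargs)
```

Keeping the arguments in one dictionary means the same values can feed either a recipe file or a direct constructor call.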