Implementation:Datajuicer Data juicer ImageTaggingVLMMapper

Knowledge Sources	Datajuicer_Data_juicer
Domains	Data_Processing, Mapping
Last Updated	2026-02-14 16:00 GMT

Overview

Concrete tool for generating image tags using a Vision-Language Model (VLM) provided by Data-Juicer.

Description

ImageTaggingVLMMapper is a mapper operator that generates descriptive tags for images using a configurable VLM (default: Qwen/Qwen2.5-VL-7B-Instruct). It uses a system prompt instructing the model to generate 5-10 concise, lowercase, hyphenated descriptive tags in JSON format. The input template combines both the image and its associated text description for comprehensive tagging. It supports both API-based and local HuggingFace/vLLM inference. The JSON output is parsed to extract tag arrays, which are stored in metadata under the configured field name. Supports retry logic and CUDA acceleration.

Usage

Use when you need contextually aware image tagging that can incorporate both visual and textual information, as a modern VLM-based alternative to the RAM-based ImageTaggingMapper.

Code Reference

Source Location

Repository: Datajuicer_Data_juicer
File: data_juicer/ops/mapper/image_tagging_vlm_mapper.py

Signature

@OPERATORS.register_module("image_tagging_vlm_mapper")
class ImageTaggingVLMMapper(Mapper):
    def __init__(self,
                 api_or_hf_model: str = "Qwen/Qwen2.5-VL-7B-Instruct",
                 is_api_model: bool = False,
                 *,
                 tag_field_name: str = MetaKeys.image_tags,
                 api_endpoint: Optional[str] = None,
                 response_path: Optional[str] = None,
                 system_prompt: Optional[str] = None,
                 input_template: Optional[str] = None,
                 model_params: Dict = {},
                 sampling_params: Dict = {},
                 try_num: PositiveInt = 3,
                 **kwargs):

Import

from data_juicer.ops.mapper.image_tagging_vlm_mapper import ImageTaggingVLMMapper

I/O Contract

Inputs

Name	Type	Required	Description
api_or_hf_model	str	No	API model name or HF model name, defaults to "Qwen/Qwen2.5-VL-7B-Instruct"
is_api_model	bool	No	Whether the model is an API model; if False, uses vLLM, defaults to False
tag_field_name	str	No	Field name to store the tags, defaults to MetaKeys.image_tags
api_endpoint	Optional[str]	No	URL endpoint for the API
response_path	Optional[str]	No	Path to extract content from API response
system_prompt	Optional[str]	No	System prompt for the task
input_template	Optional[str]	No	Template for building the model input
model_params	Dict	No	Parameters for initializing the model
sampling_params	Dict	No	Extra parameters passed to API or vLLM call
try_num	PositiveInt	No	Number of retry attempts on error, defaults to 3

Outputs

Name	Type	Description
samples	Dict	Transformed samples with VLM-generated image tags stored in meta field

Usage Examples

process:
  - image_tagging_vlm_mapper:
      api_or_hf_model: "Qwen/Qwen2.5-VL-7B-Instruct"
      is_api_model: false
      try_num: 3

Related Pages

Environment:Datajuicer_Data_juicer_Python_Runtime_Environment

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment