Implementation:Datajuicer Data juicer ImageTaggingVLMMapper

From Leeroopedia
Knowledge Sources
Domains Data_Processing, Mapping
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for generating image tags using a Vision-Language Model (VLM).

Description

ImageTaggingVLMMapper is a mapper operator that generates descriptive tags for images using a configurable VLM (default: Qwen/Qwen2.5-VL-7B-Instruct). Its system prompt instructs the model to return 5-10 concise, lowercase, hyphenated descriptive tags in JSON format, and its input template combines the image with its associated text description so that the tags reflect both. Inference can run through an API or locally through HuggingFace/vLLM. The operator parses the JSON output to extract the tag array and stores it in the sample metadata under the configured field name (tag_field_name). It also supports retry logic (try_num) and CUDA acceleration.
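The parsing step described above can be illustrated with a minimal sketch. This is not the operator's actual code: the helper name parse_tags, the assumed "tags" key, and the normalization rules are all assumptions made for illustration, since the page does not show the real response schema.

```python
import json
import re
from typing import List


def parse_tags(raw_output: str) -> List[str]:
    """Extract a tag list from a model response that should contain JSON.

    Hypothetical helper: accepts either {"tags": [...]} or a bare [...] array
    and tolerates extra prose around the JSON payload.
    """
    # Grab the outermost JSON object or array; models often wrap JSON in text.
    match = re.search(r"\{.*\}|\[.*\]", raw_output, flags=re.DOTALL)
    if match is None:
        return []
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    # Assumed schema: a dict with a "tags" key, or the array itself.
    tags = payload.get("tags", []) if isinstance(payload, dict) else payload
    if not isinstance(tags, list):
        return []
    # Normalize to the lowercase, hyphenated style the system prompt asks for,
    # dropping non-string entries and duplicates while keeping order.
    cleaned: List[str] = []
    for tag in tags:
        if isinstance(tag, str) and tag.strip():
            normalized = "-".join(tag.strip().lower().split())
            if normalized not in cleaned:
                cleaned.append(normalized)
    return cleaned
```

Returning an empty list on malformed output is a choice made for this sketch; it lets a caller decide whether to retry or to store no tags.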

Usage

Use this operator when you need context-aware image tagging that draws on both visual and textual information. It serves as a VLM-based alternative to ImageTaggingMapper, which is built on RAM (the Recognize Anything Model).

Code Reference

Source Location

Signature

@OPERATORS.register_module("image_tagging_vlm_mapper")
class ImageTaggingVLMMapper(Mapper):
    def __init__(self,
                 api_or_hf_model: str = "Qwen/Qwen2.5-VL-7B-Instruct",
                 is_api_model: bool = False,
                 *,
                 tag_field_name: str = MetaKeys.image_tags,
                 api_endpoint: Optional[str] = None,
                 response_path: Optional[str] = None,
                 system_prompt: Optional[str] = None,
                 input_template: Optional[str] = None,
                 model_params: Dict = {},
                 sampling_params: Dict = {},
                 try_num: PositiveInt = 3,
                 **kwargs):

Import

from data_juicer.ops.mapper.image_tagging_vlm_mapper import ImageTaggingVLMMapper

I/O Contract

Inputs

Name | Type | Required | Description
api_or_hf_model | str | No | API model name or HF model name; defaults to "Qwen/Qwen2.5-VL-7B-Instruct"
is_api_model | bool | No | Whether the model is an API model; if False, uses vLLM; defaults to False
tag_field_name | str | No | Field name under which the tags are stored; defaults to MetaKeys.image_tags
api_endpoint | Optional[str] | No | URL endpoint for the API
response_path | Optional[str] | No | Path used to extract content from the API response
system_prompt | Optional[str] | No | System prompt for the task
input_template | Optional[str] | No | Template for building the model input
model_params | Dict | No | Parameters for initializing the model
sampling_params | Dict | No | Extra parameters passed to the API or vLLM call
try_num | PositiveInt | No | Number of retry attempts on error; defaults to 3
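The role of try_num can be sketched as a bounded retry loop. The following is a minimal illustration under stated assumptions, not the operator's implementation: call_with_retries, generate, and delay_s are hypothetical names, and falling back to an empty tag list after the last failure is a choice made here, not documented behavior.

```python
import time
from typing import Callable, List


def call_with_retries(
    generate: Callable[[], List[str]],
    try_num: int = 3,
    delay_s: float = 0.0,
) -> List[str]:
    """Call generate() up to try_num times and return the first successful result."""
    if try_num < 1:
        raise ValueError("try_num must be a positive integer")
    for attempt in range(try_num):
        try:
            return generate()
        except Exception:
            # Optionally pause before the next attempt (never after the last one).
            if delay_s > 0 and attempt < try_num - 1:
                time.sleep(delay_s)
    # Every attempt failed: this sketch falls back to an empty tag list.
    return []
```

In this sketch, generate would wrap a single model call plus output parsing, so a malformed response that raises an error counts as a failed attempt.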

Outputs

Name | Type | Description
samples | Dict | Transformed samples with the VLM-generated image tags stored in the meta field
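To make the output contract concrete, here is a small sketch of writing tags into a sample's metadata. The key names are placeholders: META_FIELD and the default value "image_tags" are assumptions for illustration and may not match the actual values of Data-Juicer's metadata field or MetaKeys.image_tags.

```python
from typing import Any, Dict, List

# Hypothetical key names; the real values come from Data-Juicer's constants.
META_FIELD = "meta"
DEFAULT_TAG_FIELD_NAME = "image_tags"


def store_tags(
    sample: Dict[str, Any],
    tags: List[str],
    tag_field_name: str = DEFAULT_TAG_FIELD_NAME,
) -> Dict[str, Any]:
    """Write tags into the sample's metadata dict under tag_field_name."""
    # Create the metadata dict if the sample does not have one yet.
    meta = sample.setdefault(META_FIELD, {})
    meta[tag_field_name] = list(tags)
    return sample
```

Other sample fields are left untouched, so downstream operators can read the tags from the metadata without any change to the original text or image references.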

Usage Examples

process:
  - image_tagging_vlm_mapper:
      api_or_hf_model: "Qwen/Qwen2.5-VL-7B-Instruct"
      is_api_model: false
      try_num: 3
