Implementation:Datajuicer Data juicer VideoObjectSegmentingMapper

Knowledge Sources	Datajuicer_Data_juicer
Domains	Data_Processing, Mapping
Last Updated	2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for text-guided video object segmentation using YOLOE and SAM2.

Description

VideoObjectSegmentingMapper is a mapper operator that performs text-guided semantic segmentation of objects throughout a video. A YOLOE model handles open-vocabulary object detection with a configurable confidence threshold. The detected bounding boxes are then fed into Facebook's SAM2 model, which generates pixel-level segmentation masks across frames. The operator produces binary or soft masks per object, can optionally save visualization output, and runs on CUDA.
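The two post-processing choices named in the description, confidence filtering of detections and optional mask binarization, can be illustrated with a minimal pure-Python sketch. It does not call YOLOE or SAM2. The `Detection` type, the helper names, the inclusive comparison, and the 0.5 binarization cutoff are all assumptions made for illustration, not the operator's actual internals.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """Hypothetical stand-in for one detector result (not a YOLOE type)."""
    class_id: int
    conf: float
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def filter_detections(dets: List[Detection], conf_threshold: float = 0.5) -> List[Detection]:
    """Keep detections whose confidence reaches the threshold.

    Assumption: the comparison is inclusive; the real detector may differ.
    """
    return [d for d in dets if d.conf >= conf_threshold]


def binarize_mask(soft_mask: List[List[float]], cutoff: float = 0.5) -> List[List[int]]:
    """Convert a soft mask (values in [0, 1]) into a 0/1 mask.

    Assumption: values strictly above the cutoff become 1.
    """
    return [[1 if value > cutoff else 0 for value in row] for row in soft_mask]


detections = [
    Detection(class_id=0, conf=0.91, box=(10, 20, 110, 220)),
    Detection(class_id=1, conf=0.32, box=(5, 5, 40, 40)),
]
kept = filter_detections(detections, conf_threshold=0.5)

soft = [[0.1, 0.7], [0.5, 0.95]]
binary = binarize_mask(soft)
```

In this sketch a higher threshold trades recall for precision: fewer boxes reach the segmentation stage, so fewer objects receive masks. Leaving masks soft (the `if_binarize: false` case) would keep the per-pixel scores instead of the 0/1 values.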

Usage

Use this operator when you need automated object segmentation in video data, for example to build video editing datasets, produce object tracking annotations, or support semantic understanding of video content during training data curation.

Code Reference

Source Location

Repository: Datajuicer_Data_juicer
File: data_juicer/ops/mapper/video_object_segmenting_mapper.py

Signature

@OPERATORS.register_module("video_object_segmenting_mapper")
class VideoObjectSegmentingMapper(Mapper):
    def __init__(
        self,
        sam2_hf_model: str = "facebook/sam2.1-hiera-tiny",
        yoloe_path: str = "yoloe-11l-seg.pt",
        yoloe_conf: float = 0.5,
        torch_dtype: str = "bf16",
        if_binarize: bool = True,
        if_save_visualization: bool = False,
        save_visualization_dir: str = DATA_JUICER_ASSETS_CACHE,
        *args,
        **kwargs,
    ):

Import

from data_juicer.ops.mapper.video_object_segmenting_mapper import VideoObjectSegmentingMapper

I/O Contract

Inputs

Name	Type	Required	Description
sam2_hf_model	str	No	HuggingFace model id of SAM2 (default: "facebook/sam2.1-hiera-tiny")
yoloe_path	str	No	Path to the YOLOE model weights (default: "yoloe-11l-seg.pt")
yoloe_conf	float	No	Confidence threshold for YOLOE object detection (default: 0.5)
torch_dtype	str	No	Floating point type for inference: "fp32", "fp16", or "bf16" (default: "bf16")
if_binarize	bool	No	Whether the final mask requires binarization (default: True)
if_save_visualization	bool	No	Whether to save visualization results (default: False)
save_visualization_dir	str	No	Path for saving visualization results (default: DATA_JUICER_ASSETS_CACHE)
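The constraints in the table can be checked before the operator and its models are loaded. The following is a minimal sketch of such a pre-flight check. The helper `check_segmenting_args` is hypothetical and not part of Data-Juicer, and the [0, 1] range for `yoloe_conf` is an assumption based on it being a confidence score.

```python
# Values listed for torch_dtype in the inputs table.
ALLOWED_DTYPES = ("fp32", "fp16", "bf16")


def check_segmenting_args(yoloe_conf: float = 0.5, torch_dtype: str = "bf16") -> dict:
    """Hypothetical pre-flight check; not part of Data-Juicer."""
    # Assumption: a confidence threshold is only meaningful in [0, 1].
    if not 0.0 <= yoloe_conf <= 1.0:
        raise ValueError(f"yoloe_conf must be in [0, 1], got {yoloe_conf}")
    if torch_dtype not in ALLOWED_DTYPES:
        raise ValueError(f"torch_dtype must be one of {ALLOWED_DTYPES}, got {torch_dtype!r}")
    return {"yoloe_conf": yoloe_conf, "torch_dtype": torch_dtype}


# Validate the documented defaults.
args = check_segmenting_args(yoloe_conf=0.5, torch_dtype="bf16")
```

Running a check like this first surfaces a mistyped dtype string before any model weights are downloaded.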

Outputs

Name	Type	Description
samples	Dict	Transformed samples with segmentation masks, class IDs, and confidence scores in metadata

Usage Examples

process:
  - video_object_segmenting_mapper:
      sam2_hf_model: "facebook/sam2.1-hiera-tiny"
      yoloe_conf: 0.5
      if_binarize: true

Related Pages

Environment:Datajuicer_Data_juicer_Python_Runtime_Environment
