Implementation:Datajuicer Data juicer VideoObjectSegmentingMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A concrete tool, provided by Data-Juicer, for text-guided video object segmentation using YOLOE and SAM2.
Description
VideoObjectSegmentingMapper is a mapper operator that performs text-guided semantic segmentation of objects throughout a video. It first runs a YOLOE model for open-vocabulary object detection with a configurable confidence threshold, then feeds the detected bounding boxes into the SAM2 model for pixel-level segmentation across frames. The operator produces binary or soft masks per object, can optionally save visualization output, and runs on CUDA.
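The two configurable steps named above, confidence filtering and mask binarization, can be illustrated with a minimal pure-Python sketch. This is not the operator's code: the function names, the detection dict layout, the inclusive comparison, and the 0.5 mask threshold are all assumptions made for illustration, and the real operator works on model tensors rather than nested lists.

```python
from typing import Dict, List


def filter_detections(detections: List[Dict], conf_threshold: float = 0.5) -> List[Dict]:
    """Keep detections whose confidence reaches the threshold.

    Hypothetical stand-in for the role of `yoloe_conf`; whether the real
    detector uses an inclusive or strict comparison is an assumption here.
    """
    return [d for d in detections if d["conf"] >= conf_threshold]


def binarize_mask(soft_mask: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn a soft mask (assumed to hold probabilities in [0, 1]) into 0/1 values.

    Hypothetical stand-in for the effect of `if_binarize=True`; the 0.5
    cut-off is an illustrative choice, not taken from the operator.
    """
    return [[1 if p > threshold else 0 for p in row] for row in soft_mask]


# Toy data: two detections and a 2x2 soft mask.
detections = [
    {"label": "dog", "conf": 0.9, "box": (0, 0, 2, 2)},
    {"label": "cat", "conf": 0.3, "box": (1, 1, 2, 2)},
]
kept = filter_detections(detections, conf_threshold=0.5)
mask = binarize_mask([[0.1, 0.7], [0.5, 0.95]])
```

With `if_binarize` set to False, the soft mask would be kept as-is instead of being passed through a step like `binarize_mask`.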
Usage
Use this operator when you need automated object segmentation in video data, for example to build video editing datasets, generate object tracking annotations, or add semantic understanding of video content during training data curation.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/video_object_segmenting_mapper.py
Signature
@OPERATORS.register_module("video_object_segmenting_mapper")
class VideoObjectSegmentingMapper(Mapper):
    def __init__(
        self,
        sam2_hf_model: str = "facebook/sam2.1-hiera-tiny",
        yoloe_path: str = "yoloe-11l-seg.pt",
        yoloe_conf: float = 0.5,
        torch_dtype: str = "bf16",
        if_binarize: bool = True,
        if_save_visualization: bool = False,
        save_visualization_dir: str = DATA_JUICER_ASSETS_CACHE,
        *args,
        **kwargs,
    ):
Import
from data_juicer.ops.mapper.video_object_segmenting_mapper import VideoObjectSegmentingMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| sam2_hf_model | str | No | HuggingFace model id of SAM2 (default: "facebook/sam2.1-hiera-tiny") |
| yoloe_path | str | No | Path to the YOLOE model weights (default: "yoloe-11l-seg.pt") |
| yoloe_conf | float | No | Confidence threshold for YOLOE object detection (default: 0.5) |
| torch_dtype | str | No | Floating point type for inference: "fp32", "fp16", or "bf16" (default: "bf16") |
| if_binarize | bool | No | Whether the final mask requires binarization (default: True) |
| if_save_visualization | bool | No | Whether to save visualization results (default: False) |
| save_visualization_dir | str | No | Directory for saving visualization results (default: DATA_JUICER_ASSETS_CACHE) |
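As a rough illustration of how these parameters fit together, the sketch below assembles an operator config entry from the documented defaults and rejects values outside the documented options. The helper `build_mapper_config` and its validation rules are hypothetical and not part of Data-Juicer; the range check on `yoloe_conf` in particular is an assumption, and `save_visualization_dir` is left out of the defaults because the value of DATA_JUICER_ASSETS_CACHE is not given here.

```python
# Defaults copied from the documented signature (save_visualization_dir omitted).
DEFAULTS = {
    "sam2_hf_model": "facebook/sam2.1-hiera-tiny",
    "yoloe_path": "yoloe-11l-seg.pt",
    "yoloe_conf": 0.5,
    "torch_dtype": "bf16",
    "if_binarize": True,
    "if_save_visualization": False,
}
ALLOWED_DTYPES = {"fp32", "fp16", "bf16"}
OPTIONAL_KEYS = {"save_visualization_dir"}


def build_mapper_config(**overrides):
    """Hypothetical helper: merge overrides into the defaults and validate them."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    cfg = {**DEFAULTS, **overrides}
    if cfg["torch_dtype"] not in ALLOWED_DTYPES:
        raise ValueError(f"torch_dtype must be one of {sorted(ALLOWED_DTYPES)}")
    # Assumed sanity check: a confidence threshold should lie in [0, 1].
    if not 0.0 <= cfg["yoloe_conf"] <= 1.0:
        raise ValueError("yoloe_conf must be between 0 and 1")
    # Shape mirrors one entry of the `process` list in a config file.
    return {"video_object_segmenting_mapper": cfg}


entry = build_mapper_config(yoloe_conf=0.35, torch_dtype="fp16")
```

The returned dict could then be dumped to YAML or placed in a `process` list by whatever config tooling is in use.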
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples with segmentation masks, class IDs, and confidence scores in metadata |
Usage Examples
process:
  - video_object_segmenting_mapper:
      sam2_hf_model: "facebook/sam2.1-hiera-tiny"
      yoloe_conf: 0.5
      if_binarize: true