

Implementation:Datajuicer Data juicer VideoObjectSegmentingMapper

From Leeroopedia
Domains: Data_Processing, Mapping
Last Updated: 2026-02-14 16:00 GMT

Overview

A concrete Data-Juicer tool for text-guided video object segmentation using YOLOE and SAM2.

Description

VideoObjectSegmentingMapper is a mapper operator that performs text-guided semantic segmentation of objects throughout a video. It first runs a YOLOE model for open-vocabulary object detection, filtering detections with a configurable confidence threshold (yoloe_conf). The detected bounding boxes are then fed into Facebook's SAM2 model, which generates pixel-level segmentation masks for each object across the video frames. The operator produces a binary or soft mask per object, can optionally save visualization output, and runs on CUDA.
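The detect, filter, segment and binarize flow described above can be sketched in plain Python. This is a minimal, hypothetical sketch, not the operator's code: `detect` and `segment` are stand-ins for YOLOE and SAM2 (not their real APIs), running detection only on the first frame is our assumption, and the 0.5 binarization threshold is illustrative. It shows how `yoloe_conf` decides which objects are segmented and how `if_binarize` turns soft masks into 0/1 masks.

```python
# Hypothetical sketch of the detect -> filter -> segment -> binarize flow.
# `detect` and `segment` are stand-ins for YOLOE and SAM2, NOT their real APIs.
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Mask = List[List[float]]                  # per-pixel scores for one frame


@dataclass
class Detection:
    class_id: int
    confidence: float
    box: Box


def filter_detections(dets: Sequence[Detection], conf: float) -> List[Detection]:
    """Keep detections whose confidence is at least `conf` (cf. yoloe_conf)."""
    return [d for d in dets if d.confidence >= conf]


def binarize(mask: Mask, threshold: float = 0.5) -> List[List[int]]:
    """Map scores strictly above `threshold` to 1, others to 0 (cf. if_binarize)."""
    return [[1 if v > threshold else 0 for v in row] for row in mask]


def segment_video(
    frames: Sequence[object],
    detect: Callable[[object], Sequence[Detection]],
    segment: Callable[[Sequence[object], Box], List[Mask]],
    conf: float = 0.5,
    if_binarize: bool = True,
) -> List[Tuple[int, float, list]]:
    """Detect on the first frame (assumption), then get one mask per frame per box."""
    results = []
    for det in filter_detections(detect(frames[0]), conf):
        masks = segment(frames, det.box)  # one mask per frame
        if if_binarize:
            masks = [binarize(m) for m in masks]
        results.append((det.class_id, det.confidence, masks))
    return results


# Toy stand-ins so the sketch runs end to end.
def fake_detect(frame: object) -> List[Detection]:
    return [Detection(0, 0.9, (0, 0, 1, 1)), Detection(1, 0.3, (1, 1, 2, 2))]


def fake_segment(frames: Sequence[object], box: Box) -> List[Mask]:
    return [[[0.2, 0.8], [0.6, 0.4]] for _ in frames]


result = segment_video(["frame0", "frame1"], fake_detect, fake_segment, conf=0.5)
print(result)  # [(0, 0.9, [[[0, 1], [1, 0]], [[0, 1], [1, 0]]])]
```

With `conf=0.5` the low-confidence detection (0.3) is dropped before segmentation, so only one object's masks are computed; lowering the threshold trades precision for recall.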

Usage

Use this operator when you need automated object segmentation in video data, for example to build video editing datasets, produce object tracking annotations, or add semantic understanding of video content during training data curation.

Code Reference

Source Location

Signature

@OPERATORS.register_module("video_object_segmenting_mapper")
class VideoObjectSegmentingMapper(Mapper):
    def __init__(
        self,
        sam2_hf_model: str = "facebook/sam2.1-hiera-tiny",
        yoloe_path: str = "yoloe-11l-seg.pt",
        yoloe_conf: float = 0.5,
        torch_dtype: str = "bf16",
        if_binarize: bool = True,
        if_save_visualization: bool = False,
        save_visualization_dir: str = DATA_JUICER_ASSETS_CACHE,
        *args,
        **kwargs,
    ):

Import

from data_juicer.ops.mapper.video_object_segmenting_mapper import VideoObjectSegmentingMapper

I/O Contract

Inputs

Name | Type | Required | Description
--- | --- | --- | ---
sam2_hf_model | str | No | Hugging Face model ID of SAM2 (default: "facebook/sam2.1-hiera-tiny")
yoloe_path | str | No | Path to the YOLOE model weights (default: "yoloe-11l-seg.pt")
yoloe_conf | float | No | Confidence threshold for YOLOE object detection (default: 0.5)
torch_dtype | str | No | Floating-point type for inference: "fp32", "fp16", or "bf16" (default: "bf16")
if_binarize | bool | No | Whether to binarize the final masks; if False, soft masks are kept (default: True)
if_save_visualization | bool | No | Whether to save visualization results (default: False)
save_visualization_dir | str | No | Path for saving visualization results (default: DATA_JUICER_ASSETS_CACHE)
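The defaults and accepted values in the table above can be captured in a small, hypothetical parameter object. `SegmentingParams` and `DTYPE_NAMES` are illustrative names, not Data-Juicer code; the [0, 1] range check on `yoloe_conf` is our assumption (the page does not say whether the operator validates its inputs), and `save_visualization_dir` is left out because its default is a Data-Juicer constant.

```python
# Hypothetical parameter object mirroring the documented defaults.
# NOT part of Data-Juicer; the validation rules are assumptions.
from dataclasses import dataclass

# Accepted torch_dtype strings (from the table) mapped to torch attribute names.
DTYPE_NAMES = {"fp32": "float32", "fp16": "float16", "bf16": "bfloat16"}


@dataclass
class SegmentingParams:
    sam2_hf_model: str = "facebook/sam2.1-hiera-tiny"
    yoloe_path: str = "yoloe-11l-seg.pt"
    yoloe_conf: float = 0.5
    torch_dtype: str = "bf16"
    if_binarize: bool = True
    if_save_visualization: bool = False

    def __post_init__(self) -> None:
        # Assumed constraint: a detection confidence threshold lies in [0, 1].
        if not 0.0 <= self.yoloe_conf <= 1.0:
            raise ValueError(f"yoloe_conf must be in [0, 1], got {self.yoloe_conf}")
        # Only the three dtype strings listed in the table are accepted.
        if self.torch_dtype not in DTYPE_NAMES:
            raise ValueError(
                f"torch_dtype must be one of {sorted(DTYPE_NAMES)}, got {self.torch_dtype!r}"
            )

    def torch_dtype_name(self) -> str:
        """Attribute name of the matching torch dtype, e.g. 'bfloat16'."""
        return DTYPE_NAMES[self.torch_dtype]


print(SegmentingParams().torch_dtype_name())  # bfloat16
```

With PyTorch installed, `getattr(torch, SegmentingParams().torch_dtype_name())` yields `torch.bfloat16`.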

Outputs

Name | Type | Description
--- | --- | ---
samples | Dict | Transformed samples, with segmentation masks, class IDs, and confidence scores stored in the metadata
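The Outputs row only states that masks, class IDs and confidence scores end up in the sample metadata. As a hedged illustration of consuming such output downstream, the sketch below assumes a layout (per object, one 2-D binary mask per frame) and uses placeholder key names; `summarize_objects` and `coverage` are hypothetical helpers, and the real keys should be checked in the operator source before relying on this.

```python
# Hypothetical consumer of the mapper's output metadata.
# The three key names are PLACEHOLDERS: this page does not say which metadata
# keys the operator uses, nor the exact nesting of the mask arrays.
from typing import Dict, List

MASKS_KEY = "object_masks"          # placeholder: per object, one 2-D mask per frame
CLASS_IDS_KEY = "object_class_ids"  # placeholder: one class ID per object
SCORES_KEY = "object_scores"        # placeholder: one confidence score per object


def coverage(mask: List[List[int]]) -> float:
    """Fraction of pixels set to 1 in a binary 2-D mask (0.0 for an empty mask)."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


def summarize_objects(meta: Dict, min_score: float = 0.0) -> List[Dict]:
    """Per object: class ID, score, and mean mask coverage across frames."""
    summary = []
    for masks, class_id, score in zip(meta[MASKS_KEY], meta[CLASS_IDS_KEY], meta[SCORES_KEY]):
        if score < min_score:
            continue  # skip low-confidence objects
        per_frame = [coverage(m) for m in masks]
        mean_cov = sum(per_frame) / len(per_frame) if per_frame else 0.0
        summary.append({"class_id": class_id, "score": score, "mean_coverage": mean_cov})
    return summary


# One object (class 3) tracked over two 2x2 frames.
meta = {
    MASKS_KEY: [[[[1, 0], [0, 0]], [[1, 1], [0, 0]]]],
    CLASS_IDS_KEY: [3],
    SCORES_KEY: [0.8],
}
print(summarize_objects(meta))  # [{'class_id': 3, 'score': 0.8, 'mean_coverage': 0.375}]
```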

Usage Examples

process:
  - video_object_segmenting_mapper:
      sam2_hf_model: "facebook/sam2.1-hiera-tiny"
      yoloe_conf: 0.5
      if_binarize: true
