

Implementation:Datajuicer Data juicer VideoExtractFramesMapper

From Leeroopedia
Knowledge Sources
Domains Data_Processing, Mapping
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for extracting frames from video files.

Description

VideoExtractFramesMapper is a mapper operator that extracts frames from video files using a configurable sampling method and outputs them either as file paths or as in-memory byte arrays. It supports two frame sampling methods: "all_keyframes", which extracts the video's keyframes, and "uniform", which extracts evenly spaced frames. Sampling can optionally be combined with duration-based segmentation, which splits the video into segments of a fixed length. In "path" format the frames are saved to a configurable directory; in "bytes" format they are loaded into memory. In both cases the frame information is stored in the sample's metadata.
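The following is a minimal sketch of how uniform sampling combined with duration-based segmentation could choose timestamps. It is not Data-Juicer's code: the helper names (split_into_segments, uniform_timestamps, plan_uniform_sampling) are hypothetical, and the spacing rule (one frame at the segment midpoint; two or more frames spread evenly from the start to the end of the segment) is an assumption. Keyframe extraction ("all_keyframes") depends on the encoded stream and the chosen video backend, so it is not sketched here.

```python
import math
from typing import List, Tuple


def split_into_segments(total_seconds: float, duration: float) -> List[Tuple[float, float]]:
    """Split [0, total_seconds] into consecutive segments of `duration` seconds.

    duration <= 0 keeps the whole video as a single segment, matching the
    documented meaning of duration=0. The last segment may be shorter.
    """
    if total_seconds <= 0:
        return []
    if duration <= 0:
        return [(0.0, total_seconds)]
    count = math.ceil(total_seconds / duration)
    return [(k * duration, min((k + 1) * duration, total_seconds)) for k in range(count)]


def uniform_timestamps(start: float, end: float, frame_num: int) -> List[float]:
    """Evenly spaced timestamps in [start, end] (assumed spacing rule).

    One frame -> the midpoint; two or more -> first, last and evenly spaced in between.
    """
    if frame_num < 1:
        raise ValueError("frame_num must be a positive integer")
    if frame_num == 1:
        return [(start + end) / 2]
    step = (end - start) / (frame_num - 1)
    return [start + i * step for i in range(frame_num)]


def plan_uniform_sampling(total_seconds: float, frame_num: int,
                          duration: float = 0) -> List[List[float]]:
    """Timestamps (in seconds) to decode, grouped per segment."""
    return [uniform_timestamps(start, end, frame_num)
            for start, end in split_into_segments(total_seconds, duration)]
```

Under these assumptions, a 25-second video with frame_num=2 and duration=10 yields three segments whose timestamps are [0, 10], [10, 20] and [20, 25], so the number of extracted frames grows with the number of segments.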

Usage

Use when you need to extract frames from videos as a prerequisite step for downstream video processing operators such as captioning, tagging, pose estimation, or any frame-level analysis.
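A downstream frame-level operator has to accept whichever output format was chosen. The sketch below assumes frames are stored per video as a list under a metadata key; iter_frames and the "meta" key are hypothetical names, not Data-Juicer's API. Entries are treated as raw bytes when they are bytes and as file paths otherwise.

```python
from typing import Dict, Iterator

META_KEY = "meta"  # hypothetical metadata column name, not Data-Juicer's constant


def iter_frames(sample: Dict, frame_field: str = "video_frames") -> Iterator[bytes]:
    """Yield every extracted frame of a sample as encoded bytes.

    Entries may be file paths (output_format="path") or raw bytes
    (output_format="bytes"); both are returned as bytes here.
    """
    for frames in sample.get(META_KEY, {}).get(frame_field, []):
        for frame in frames:
            if isinstance(frame, (bytes, bytearray)):
                yield bytes(frame)
            else:
                # Assume anything that is not bytes is a path written by the extractor.
                with open(frame, "rb") as fh:
                    yield fh.read()
```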

Code Reference

Source Location

Signature

@OPERATORS.register_module("video_extract_frames_mapper")
class VideoExtractFramesMapper(Mapper):
    def __init__(self, frame_sampling_method: str = "all_keyframes", output_format: str = "path",
                 frame_num: PositiveInt = 3, duration: float = 0, frame_dir: str = None,
                 frame_key: str = None, frame_field: str = MetaKeys.video_frames,
                 legacy_split_by_text_token: bool = True, video_backend: str = "av",
                 *args, **kwargs):

Import

from data_juicer.ops.mapper.video_extract_frames_mapper import VideoExtractFramesMapper

I/O Contract

Inputs

Name Type Required Description
frame_sampling_method str No Sampling method: "all_keyframes" or "uniform" (default: "all_keyframes")
output_format str No Output format: "path" or "bytes" (default: "path")
frame_num PositiveInt No Number of frames for uniform sampling (default: 3)
duration float No Duration of each segment in seconds; 0 for entire video (default: 0)
frame_dir str No Output directory for extracted frames (required when output_format is "path")
frame_key str No Deprecated field name for frame info; use frame_field instead
frame_field str No Field name for generated frames info (default: "video_frames")
legacy_split_by_text_token bool No Whether to split by special tokens in text field (default: True)
video_backend str No Video backend: "ffmpeg" or "av" (default: "av")
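The argument constraints listed above can be summarized in a small validation sketch. ExtractFramesConfig is a hypothetical dataclass, not part of Data-Juicer, and how the real operator validates these arguments is an assumption. In particular, the handling of the deprecated frame_key (emit a DeprecationWarning and copy its value into frame_field) is assumed, not confirmed by the source.

```python
import warnings
from dataclasses import dataclass
from typing import Optional

SAMPLING_METHODS = {"all_keyframes", "uniform"}
OUTPUT_FORMATS = {"path", "bytes"}
VIDEO_BACKENDS = {"ffmpeg", "av"}


@dataclass
class ExtractFramesConfig:
    """Hypothetical mirror of the documented constructor arguments."""
    frame_sampling_method: str = "all_keyframes"
    output_format: str = "path"
    frame_num: int = 3
    duration: float = 0
    frame_dir: Optional[str] = None
    frame_key: Optional[str] = None
    frame_field: str = "video_frames"
    legacy_split_by_text_token: bool = True
    video_backend: str = "av"

    def __post_init__(self) -> None:
        if self.frame_sampling_method not in SAMPLING_METHODS:
            raise ValueError(f"unsupported frame_sampling_method: {self.frame_sampling_method!r}")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unsupported output_format: {self.output_format!r}")
        if self.video_backend not in VIDEO_BACKENDS:
            raise ValueError(f"unsupported video_backend: {self.video_backend!r}")
        # frame_num must be a positive integer; bool is a subclass of int, so reject it explicitly.
        if isinstance(self.frame_num, bool) or not isinstance(self.frame_num, int) or self.frame_num < 1:
            raise ValueError("frame_num must be a positive integer")
        if self.duration < 0:
            raise ValueError("duration must be >= 0 (0 means the whole video)")
        if self.frame_key is not None:
            # Assumed handling of the deprecated alias: warn, then use it as frame_field.
            warnings.warn("frame_key is deprecated; use frame_field", DeprecationWarning)
            self.frame_field = self.frame_key
```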

Outputs

Name Type Description
samples Dict Transformed samples with extracted frame paths or bytes in metadata
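A minimal sketch of the two output formats, assuming the frames have already been decoded and encoded as image bytes. The function store_frames, the metadata key "meta", and the per-video directory layout are hypothetical; only the default field name video_frames comes from the parameter table above. A real implementation would also need directory names that are unique per sample, which this sketch does not handle.

```python
import os
import tempfile
from typing import Dict, List, Optional

META_KEY = "meta"  # hypothetical metadata column name, not Data-Juicer's constant


def store_frames(sample: Dict, frames_per_video: List[List[bytes]], output_format: str,
                 frame_dir: Optional[str] = None, frame_field: str = "video_frames") -> Dict:
    """Attach encoded frames to `sample` as file paths or as raw bytes.

    frames_per_video[i] holds the encoded (e.g. JPEG) frames of the i-th video.
    """
    if output_format == "bytes":
        stored = frames_per_video  # kept in memory; the sample grows with every frame
    elif output_format == "path":
        if not frame_dir:
            raise ValueError("frame_dir is needed for output_format='path' in this sketch")
        stored = []
        for video_idx, frames in enumerate(frames_per_video):
            video_dir = os.path.join(frame_dir, f"video_{video_idx}")  # hypothetical layout
            os.makedirs(video_dir, exist_ok=True)
            paths = []
            for frame_idx, data in enumerate(frames):
                path = os.path.join(video_dir, f"frame_{frame_idx}.jpg")
                with open(path, "wb") as fh:
                    fh.write(data)
                paths.append(path)
            stored.append(paths)
    else:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    sample.setdefault(META_KEY, {})[frame_field] = stored
    return sample


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = store_frames({}, [[b"fake-jpeg-0", b"fake-jpeg-1"]], "path", tmp)
        print(demo[META_KEY]["video_frames"])
```

The trade-off is the usual one: "path" keeps samples small but leaves files on disk that later operators must be able to reach, while "bytes" avoids the filesystem at the cost of larger in-memory samples.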

Usage Examples

process:
  - video_extract_frames_mapper:
      frame_sampling_method: "uniform"
      frame_num: 8
      frame_dir: "/tmp/frames"
      output_format: "path"
