Implementation:Datajuicer Data juicer VideoExtractFramesMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool, provided by Data-Juicer, for extracting frames from video files.
Description
VideoExtractFramesMapper is a mapper operator that extracts frames from video files using a configurable sampling method and outputs them as either file paths or in-memory byte arrays. It supports two frame sampling methods: "all_keyframes", which extracts the video's keyframes, and "uniform", which extracts a fixed number of evenly spaced frames. Either method can be combined with optional duration-based video segmentation. The operator emits frames in "path" format (saved to a configurable directory) or "bytes" format (loaded into memory), and stores the frame information in the sample's metadata.
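To make the "uniform" method and the duration option concrete, here is a minimal sketch of how evenly spaced timestamps could be chosen. It is not the operator's actual code. The helper names (`uniform_timestamps`, `split_into_segments`, `sample_video`), the choice of bin midpoints, and the assumption that `frame_num` applies to each segment separately are illustrative assumptions.

```python
from typing import List, Tuple


def uniform_timestamps(start: float, end: float, frame_num: int) -> List[float]:
    """Return frame_num timestamps at the midpoints of equal-width bins over [start, end).

    Midpoint sampling is an assumption of this sketch; the real operator may
    pick different positions inside each bin.
    """
    if frame_num <= 0:
        raise ValueError("frame_num must be positive")
    if end <= start:
        raise ValueError("end must be greater than start")
    step = (end - start) / frame_num
    return [start + (i + 0.5) * step for i in range(frame_num)]


def split_into_segments(total: float, duration: float) -> List[Tuple[float, float]]:
    """Split [0, total) into windows of `duration` seconds.

    A duration of 0 (or less) means the whole video is a single segment,
    mirroring the documented meaning of duration=0.
    """
    if duration <= 0:
        return [(0.0, total)]
    segments = []
    start = 0.0
    while start < total:
        segments.append((start, min(start + duration, total)))
        start += duration
    return segments


def sample_video(total: float, frame_num: int, duration: float = 0) -> List[float]:
    """Collect uniform timestamps for every segment (hypothetical per-segment policy)."""
    timestamps: List[float] = []
    for seg_start, seg_end in split_into_segments(total, duration):
        timestamps.extend(uniform_timestamps(seg_start, seg_end, frame_num))
    return timestamps
```

Under these assumptions, `sample_video(8.0, 4)` spreads four timestamps over the whole clip, while a non-zero `duration` yields `frame_num` timestamps per segment, so the total frame count grows with video length.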
Usage
Use when you need to extract frames from videos as a prerequisite step for downstream video processing operators such as captioning, tagging, pose estimation, or any frame-level analysis.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/video_extract_frames_mapper.py
Signature
@OPERATORS.register_module("video_extract_frames_mapper")
class VideoExtractFramesMapper(Mapper):
    def __init__(self, frame_sampling_method: str = "all_keyframes", output_format: str = "path",
                 frame_num: PositiveInt = 3, duration: float = 0, frame_dir: str = None,
                 frame_key: str = None, frame_field: str = MetaKeys.video_frames,
                 legacy_split_by_text_token: bool = True, video_backend: str = "av", *args, **kwargs):
Import
from data_juicer.ops.mapper.video_extract_frames_mapper import VideoExtractFramesMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| frame_sampling_method | str | No | Sampling method: "all_keyframes" or "uniform" (default: "all_keyframes") |
| output_format | str | No | Output format: "path" or "bytes" (default: "path") |
| frame_num | PositiveInt | No | Number of frames for uniform sampling (default: 3) |
| duration | float | No | Duration of each segment in seconds; 0 for entire video (default: 0) |
| frame_dir | str | No | Output directory for extracted frames (required when output_format is "path") |
| frame_key | str | No | Deprecated field name for frame info; use frame_field instead |
| frame_field | str | No | Field name for generated frames info (default: "video_frames") |
| legacy_split_by_text_token | bool | No | Whether to split by special tokens in text field (default: True) |
| video_backend | str | No | Video backend: "ffmpeg" or "av" (default: "av") |
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples with extracted frame paths or bytes in metadata |
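A downstream step usually needs to read the frames back from the sample. The sketch below shows one way to do that without depending on the output format. The sample layout (a nested list of frames per video under a metadata dictionary), the `"meta"` key, and the helper names are hypothetical and chosen only for illustration. Check the actual structure produced by your Data-Juicer version before relying on it.

```python
from typing import Any, Dict, List, Union

Frame = Union[str, bytes]


def flatten_frames(value: Any) -> List[Frame]:
    """Recursively collect str (path) and bytes (in-memory) frames from nested containers."""
    if isinstance(value, (str, bytes)):
        return [value]
    if isinstance(value, dict):
        value = list(value.values())
    if isinstance(value, (list, tuple)):
        frames: List[Frame] = []
        for item in value:
            frames.extend(flatten_frames(item))
        return frames
    return []  # ignore None and any other unexpected entries


def collect_frames(sample: Dict[str, Any], meta_key: str, frame_field: str) -> List[Frame]:
    """Return all frames stored under sample[meta_key][frame_field].

    meta_key and frame_field are passed in explicitly because the real key
    names are defined by Data-Juicer, not by this sketch.
    """
    return flatten_frames(sample.get(meta_key, {}).get(frame_field, []))


def frame_kind(frame: Frame) -> str:
    """Classify a frame as 'path' or 'bytes', matching the two output_format values."""
    return "path" if isinstance(frame, str) else "bytes"
```

Passing the key names as arguments keeps the helper usable with a custom frame_field value.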
Usage Examples
process:
  - video_extract_frames_mapper:
      frame_sampling_method: "uniform"
      frame_num: 8
      frame_dir: "/tmp/frames"
      output_format: "path"
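A recipe entry like the one above can be sanity-checked before a long run. The following is a minimal sketch that validates a parameter dictionary against the constraints listed in the Inputs table. `check_mapper_config` and `ALLOWED` are hypothetical names and are not part of Data-Juicer. The rule that duration must be non-negative is an assumption of this sketch.

```python
from typing import Any, Dict, List

# Allowed string values, taken from the Inputs table.
ALLOWED = {
    "frame_sampling_method": {"all_keyframes", "uniform"},
    "output_format": {"path", "bytes"},
    "video_backend": {"ffmpeg", "av"},
}


def check_mapper_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in cfg; an empty list means none were found."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key in cfg and cfg[key] not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}, got {cfg[key]!r}")

    # frame_num is documented as a PositiveInt with default 3.
    frame_num = cfg.get("frame_num", 3)
    if isinstance(frame_num, bool) or not isinstance(frame_num, int) or frame_num <= 0:
        problems.append("frame_num must be a positive integer")

    # Assumption of this sketch: negative durations are rejected.
    duration = cfg.get("duration", 0)
    if isinstance(duration, bool) or not isinstance(duration, (int, float)) or duration < 0:
        problems.append("duration must be a number >= 0")

    # The Inputs table states frame_dir is required for the "path" output format.
    if cfg.get("output_format", "path") == "path" and not cfg.get("frame_dir"):
        problems.append('frame_dir is required when output_format is "path"')
    return problems


recipe_params = {
    "frame_sampling_method": "uniform",
    "frame_num": 8,
    "frame_dir": "/tmp/frames",
    "output_format": "path",
}
print(check_mapper_config(recipe_params))
```

A check like this catches a misspelled sampling method or a missing output directory before any video is decoded.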