Implementation:Datajuicer Data juicer VideoNSFWFilter
| Knowledge Sources | |
|---|---|
| Domains | Data_Quality, Filtering |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A concrete Data-Juicer tool for filtering data samples by their video NSFW score.
Description
VideoNSFWFilter is a filter operator that keeps samples whose NSFW score, computed from sampled video frames, falls within a specified range (min_score to max_score). It extends Filter and follows the two-phase compute_stats/process pattern: compute_stats extracts frames with 'all_keyframes' or 'uniform' sampling and scores each frame with a HuggingFace NSFW detection model (default: Falconsai/nsfw_image_detection), and process then decides whether to keep the sample. Per-frame scores are reduced to one score per video via the 'avg', 'max', or 'min' mode, and the results are cached under video_nsfw_score. The operator also supports an 'any'/'all' keep strategy across a sample's videos, CUDA acceleration, configurable video backends ('av', 'ffmpeg'), and operator fusion for shared frame sampling. It serves as a content safety filter for video datasets.
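The reduction and keep logic described above can be sketched in plain Python. This is a minimal illustration, not the operator's actual code: the helper names `reduce_frame_scores` and `keep_sample` are hypothetical, and the inclusive range bounds and the handling of samples without videos are assumptions.

```python
from statistics import mean
from typing import Sequence


def reduce_frame_scores(frame_scores: Sequence[float], reduce_mode: str = "avg") -> float:
    """Collapse per-frame NSFW scores into a single per-video score."""
    if not frame_scores:
        raise ValueError("at least one frame score is required")
    if reduce_mode == "avg":
        return mean(frame_scores)
    if reduce_mode == "max":
        return max(frame_scores)
    if reduce_mode == "min":
        return min(frame_scores)
    raise ValueError(f"unsupported reduce_mode: {reduce_mode!r}")


def keep_sample(
    video_scores: Sequence[float],
    min_score: float = 0.0,
    max_score: float = 0.5,
    any_or_all: str = "any",
) -> bool:
    """Decide whether a sample is kept, given one score per video."""
    # Assumption: both bounds are inclusive.
    in_range = [min_score <= score <= max_score for score in video_scores]
    if not in_range:
        # Assumption: a sample without videos is kept.
        return True
    if any_or_all == "any":
        return any(in_range)
    if any_or_all == "all":
        return all(in_range)
    raise ValueError(f"unsupported any_or_all: {any_or_all!r}")


# Two videos: one mostly safe, one with a single high-scoring frame.
scores = [
    reduce_frame_scores([0.1, 0.2, 0.3], "max"),
    reduce_frame_scores([0.1, 0.9, 0.2], "max"),
]
print(keep_sample(scores, any_or_all="any"))  # True: the first video is in range
print(keep_sample(scores, any_or_all="all"))  # False: the second video exceeds max_score
```

With reduce_mode set to 'max', a single explicit frame is enough to push a video over the threshold, whereas 'avg' can dilute it across many safe frames. Which mode fits depends on how strict the filter needs to be.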
Usage
Import this operator when a pipeline needs to filter samples by video NSFW content detection. It can be configured in YAML or instantiated in Python.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/filter/video_nsfw_filter.py
Signature
@OPERATORS.register_module("video_nsfw_filter")
class VideoNSFWFilter(Filter):
    def __init__(
        self,
        hf_nsfw_model: str = "Falconsai/nsfw_image_detection",
        trust_remote_code: bool = False,
        min_score: float = 0.0,
        max_score: float = 0.5,
        frame_field: Optional[str] = None,
        frame_sampling_method: str = "all_keyframes",
        frame_num: PositiveInt = 3,
        reduce_mode: str = "avg",
        any_or_all: str = "any",
        video_backend: str = "av",
        *args,
        **kwargs,
    ):
Import
from data_juicer.ops.filter.video_nsfw_filter import VideoNSFWFilter
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| hf_nsfw_model | str | No | HuggingFace NSFW model name (default: "Falconsai/nsfw_image_detection") |
| trust_remote_code | bool | No | Whether to trust remote code when loading the HuggingFace model (default: False) |
| min_score | float | No | Minimum NSFW score threshold (default: 0.0) |
| max_score | float | No | Maximum NSFW score threshold (default: 0.5) |
| frame_field | Optional[str] | No | Optional name of a sample field that holds video frames (default: None) |
| frame_sampling_method | str | No | Frame sampling: "all_keyframes" or "uniform" (default: "all_keyframes") |
| frame_num | PositiveInt | No | Number of frames for uniform sampling (default: 3) |
| reduce_mode | str | No | Score reduction: "avg", "max", or "min" (default: "avg") |
| any_or_all | str | No | Keep strategy across a sample's videos: "any" keeps the sample if any video is in range, "all" only if all videos are (default: "any") |
| video_backend | str | No | Video backend: "ffmpeg" or "av" (default: "av") |
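
To make the role of frame_num concrete, the sketch below picks evenly spaced frame indices from a video. It is a generic illustration of uniform sampling, not necessarily the scheme Data-Juicer implements: the function name `uniform_frame_indices` and the choice of segment midpoints are assumptions.

```python
from typing import List


def uniform_frame_indices(total_frames: int, frame_num: int = 3) -> List[int]:
    """Return up to frame_num evenly spaced frame indices.

    Assumption: the video is split into equal segments and the frame at the
    midpoint of each segment is taken. Real backends may choose differently.
    """
    if total_frames <= 0 or frame_num <= 0:
        raise ValueError("total_frames and frame_num must be positive")
    count = min(frame_num, total_frames)  # never request more frames than exist
    return [int((i + 0.5) * total_frames / count) for i in range(count)]


print(uniform_frame_indices(90, 3))  # [15, 45, 75]
print(uniform_frame_indices(2, 3))   # [0, 1]
```

Unlike 'all_keyframes', whose frame count depends on how the video was encoded, uniform sampling scores a fixed number of frames per video, which keeps the per-sample cost predictable.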
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Filtered samples with video_nsfw_score stat computed |
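
The two-phase contract, in which the stat is computed once, cached under video_nsfw_score, and then read back to make the keep decision, can be sketched without the library. Everything here except the stat name is an assumption for illustration: the `"stats"` and `"videos"` keys, the `score_video` callback, and the simplified "any"-only decision are hypothetical stand-ins for the operator's internals.

```python
from typing import Callable, Dict

STAT_KEY = "video_nsfw_score"


def compute_stats(sample: Dict, score_video: Callable[[str], float]) -> Dict:
    """Phase 1: compute and cache one NSFW score per video."""
    stats = sample.setdefault("stats", {})  # hypothetical stats container
    if STAT_KEY in stats:
        return sample  # already computed, skip the expensive model call
    stats[STAT_KEY] = [score_video(path) for path in sample.get("videos", [])]
    return sample


def process(sample: Dict, min_score: float = 0.0, max_score: float = 0.5) -> bool:
    """Phase 2: read the cached scores and decide (simplified 'any' strategy)."""
    scores = sample["stats"][STAT_KEY]
    if not scores:
        return True  # assumption: samples without videos are kept
    return any(min_score <= s <= max_score for s in scores)


calls = []


def fake_scorer(path: str) -> float:
    """Stand-in for the real frame-scoring model."""
    calls.append(path)
    return 0.2


sample = {"videos": ["a.mp4", "b.mp4"]}
compute_stats(sample, fake_scorer)
compute_stats(sample, fake_scorer)  # second call hits the cache
print(len(calls))       # 2
print(process(sample))  # True
```

Separating the phases means a dataset whose stats were already computed can be re-filtered with different thresholds without running the model again.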
Usage Examples
YAML Configuration
process:
  - video_nsfw_filter:
      max_score: 0.5
      frame_sampling_method: all_keyframes
Python API
from data_juicer.ops.filter.video_nsfw_filter import VideoNSFWFilter
op = VideoNSFWFilter(max_score=0.5)