Implementation:Datajuicer Data juicer GeneralFieldFilter

From Leeroopedia
Knowledge Sources
Domains: Data_Quality, Filtering
Last Updated: 2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for filtering data samples against a general field filter condition.

Description

GeneralFieldFilter is a filter operator that keeps the samples satisfying a general field filter condition, which is expressed as a string. The condition can combine logical operators (and/or) with chained comparisons, e.g., "10 < num <= 30 and text != 'nothing here'". The condition is parsed using Python's ast module and evaluated for each sample, and the result is stored under the general_field_filter_condition stats key. The operator extends the Filter base class and implements the two-phase compute_stats/process pattern.
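To make the parse-and-evaluate step concrete, here is a minimal sketch of how a condition string of this shape could be walked with Python's ast module. It is an illustration under stated assumptions, not Data-Juicer's actual implementation: the helper names evaluate_condition and _eval_node are hypothetical, samples are assumed to be flat dicts, and only and/or, chained comparisons, field names, and literals are handled.

```python
import ast
import operator

# Map AST comparison node types to their Python implementations.
_COMPARATORS = {
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
}


def evaluate_condition(condition: str, sample: dict) -> bool:
    """Evaluate a condition string against one sample (hypothetical helper)."""
    tree = ast.parse(condition, mode="eval")
    return bool(_eval_node(tree.body, sample))


def _eval_node(node, sample):
    # "a and b" / "a or b": short-circuit over the operands.
    if isinstance(node, ast.BoolOp):
        values = (_eval_node(v, sample) for v in node.values)
        return all(values) if isinstance(node.op, ast.And) else any(values)
    # Chained comparison such as "10 < num <= 30": every adjacent pair must hold.
    if isinstance(node, ast.Compare):
        left = _eval_node(node.left, sample)
        for op, comparator in zip(node.ops, node.comparators):
            compare = _COMPARATORS.get(type(op))
            if compare is None:
                raise ValueError(f"Unsupported comparison: {type(op).__name__}")
            right = _eval_node(comparator, sample)
            if not compare(left, right):
                return False
            left = right
        return True
    # A bare name is looked up as a field of the sample (assumed flat dict).
    if isinstance(node, ast.Name):
        return sample[node.id]
    # Numeric and string literals evaluate to themselves.
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError(f"Unsupported expression: {ast.dump(node)}")


condition = "10 < num <= 30 and text != 'nothing here'"
keep = evaluate_condition(condition, {"num": 20, "text": "hello"})
drop = evaluate_condition(condition, {"num": 20, "text": "nothing here"})
```

Walking the tree and allowing only a small set of node types, rather than calling eval on the string, means an unexpected expression raises an error instead of executing arbitrary code.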

Usage

Import this operator when you need to filter dataset samples on arbitrary field conditions written as logical expressions. Configure it in your Data-Juicer YAML config or instantiate it directly in Python.

Code Reference

Source Location

Signature

@OPERATORS.register_module("general_field_filter")
class GeneralFieldFilter(Filter):
    def __init__(self, filter_condition: str = "", *args, **kwargs):
        ...

Import

from data_juicer.ops.filter.general_field_filter import GeneralFieldFilter

I/O Contract

Inputs

Name | Type | Required | Description
filter_condition | str | No | The filter condition as a string; supports logical operators (and/or) and chained comparisons. Default: ""

Outputs

Name | Type | Description
samples | Dict | Filtered samples with stats field updated (general_field_filter_condition)
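The I/O contract follows the two-phase pattern: one pass records the outcome in the sample's stats, and a second pass reads it to decide whether the sample is kept. The sketch below imitates that flow with a stand-in class. TwoPhaseFilterSketch, the plain "stats" dict, and the callable predicate are all assumptions made for illustration, not Data-Juicer's Filter base class or its real field layout; only the general_field_filter_condition key comes from the text above.

```python
STATS_KEY = "general_field_filter_condition"  # stats key named in the text


class TwoPhaseFilterSketch:
    """Hypothetical stand-in for a two-phase filter, not Data-Juicer's Filter."""

    def __init__(self, predicate):
        # predicate: any callable taking a sample dict and returning a truthy value
        self.predicate = predicate

    def compute_stats(self, sample: dict) -> dict:
        # Phase 1: evaluate once and cache the outcome in the sample's stats.
        stats = sample.setdefault("stats", {})  # "stats" container is assumed
        if STATS_KEY not in stats:
            stats[STATS_KEY] = bool(self.predicate(sample))
        return sample

    def process(self, sample: dict) -> bool:
        # Phase 2: keep the sample only if the cached stat is True.
        return sample["stats"][STATS_KEY]


op = TwoPhaseFilterSketch(lambda s: 10 < s["num"] <= 30)
samples = [{"num": 5}, {"num": 20}, {"num": 30}, {"num": 31}]
kept = [s for s in map(op.compute_stats, samples) if op.process(s)]
```

Separating the two phases lets the computed stat be inspected or reused without re-evaluating the condition.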

Usage Examples

YAML Configuration

process:
  - general_field_filter:
      filter_condition: "10 < num <= 30 and text != 'nothing here'"

Python API

from data_juicer.ops.filter.general_field_filter import GeneralFieldFilter

op = GeneralFieldFilter(filter_condition="10 < num <= 30")
# Apply to a dataset; `dataset` is assumed to be an already-loaded Data-Juicer dataset
result = dataset.process(op)
