Principle:Datajuicer Data juicer Operator Package Registration
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Software_Architecture |
| Last Updated | 2026-02-14 17:00 GMT |
Overview
A Python package initialization pattern that ensures operator classes are imported and registered when their containing package is loaded.
Description
Operator Package Registration relies on each package's __init__.py to import the operator classes it contains. Importing an operator's module executes its @OPERATORS.register_module() decorator, which adds the class to the registry. Without this import chain, an operator class is never registered and cannot be referenced from YAML configurations. Each operator-type subdirectory (filter/, mapper/, deduplicator/, selector/) has its own __init__.py that imports all of its operator classes.
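To make the decorator step concrete, here is a minimal sketch of a name-to-class registry. The Registry class, its get method, and the "text_length_filter" key are assumptions made for illustration; this is not Data-Juicer's actual implementation. Only the @OPERATORS.register_module() usage mirrors the text above.

```python
class Registry:
    """Hypothetical stand-in for an operator registry (illustration only)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, module_name=None):
        """Return a class decorator that stores the class under a name."""
        def decorator(cls):
            # Fall back to the class name when no explicit name is given.
            key = module_name or cls.__name__
            if key in self._modules:
                raise KeyError(f"{key} is already registered in {self.name}")
            self._modules[key] = cls
            return cls  # the class itself is returned unchanged
        return decorator

    def get(self, key):
        """Look up a registered class; returns None if the name is unknown."""
        return self._modules.get(key)


OPERATORS = Registry("operators")


# The decorator runs when this class statement executes, which for a real
# operator happens at the moment its module is imported.
@OPERATORS.register_module("text_length_filter")
class TextLengthFilter:
    def __init__(self, min_len=10):
        self.min_len = min_len


found = OPERATORS.get("text_length_filter")
missing = OPERATORS.get("never_imported_filter")
```

In this sketch, found is the TextLengthFilter class and missing is None. A class whose module is never imported behaves like the second lookup, because its decorator never runs.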
Usage
Apply this principle whenever you add a new operator file to an existing package: add an import statement for the new class to that package's __init__.py (for example, data_juicer/ops/filter/__init__.py for a new filter).
Theoretical Basis
# Abstract pattern (NOT real implementation)
# data_juicer/ops/filter/__init__.py
from .text_length_filter import TextLengthFilter
from .language_id_score_filter import LanguageIDScoreFilter
from .my_new_filter import MyNewFilter  # Add this line
# The import triggers @OPERATORS.register_module() execution
# which registers the class in the global registry
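The effect of a missing import line can be reproduced with a throwaway package. The sketch below writes a hypothetical demo_ops package to a temporary directory; the package name, the dict-based registry, and the file contents are all assumptions made for illustration and are not Data-Juicer code. It shows that an operator file that exists on disk but is not imported by its package's __init__.py stays unregistered.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical package layout (illustration only, not Data-Juicer's code).
FILES = {
    "demo_ops/__init__.py": "",
    "demo_ops/registry.py": """
        OPERATORS = {}

        def register_module(name):
            def decorator(cls):
                OPERATORS[name] = cls
                return cls
            return decorator
    """,
    "demo_ops/filter/__init__.py": """
        from .text_length_filter import TextLengthFilter
        # my_new_filter is deliberately NOT imported here.
    """,
    "demo_ops/filter/text_length_filter.py": """
        from ..registry import register_module

        @register_module("text_length_filter")
        class TextLengthFilter:
            pass
    """,
    "demo_ops/filter/my_new_filter.py": """
        from ..registry import register_module

        @register_module("my_new_filter")
        class MyNewFilter:
            pass
    """,
}

# Write the package to disk and make it importable.
root = Path(tempfile.mkdtemp())
for rel_path, source in FILES.items():
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(textwrap.dedent(source))
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Loading the package runs filter/__init__.py, which imports one operator.
importlib.import_module("demo_ops.filter")
registry = importlib.import_module("demo_ops.registry").OPERATORS
registered_after_package_import = sorted(registry)

# Only an explicit import of the forgotten module registers its class.
importlib.import_module("demo_ops.filter.my_new_filter")
registered_after_explicit_import = sorted(registry)
```

After the package import, only text_length_filter is in the registry. my_new_filter appears only once its module is imported directly, which is the step the added line in __init__.py performs automatically.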