

Principle:Datajuicer Data juicer Operator Package Registration

From Leeroopedia
Revision as of 17:44, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Datajuicer_Data_juicer_Operator_Package_Registration.md)
Knowledge Sources
Domains: Data_Engineering, Software_Architecture
Last Updated: 2026-02-14 17:00 GMT

Overview

A Python package initialization pattern that ensures operator classes are imported and registered when their containing package is loaded.

Description

Operator Package Registration uses each package's __init__.py module to import its operator classes. Importing a module executes its top-level code, including the @OPERATORS.register_module() decorator on each operator class, so the import is what actually adds the class to the registry. Without this import chain, operator classes would never be registered and could not be referenced from YAML configurations. Each operator-type subdirectory (filter/, mapper/, deduplicator/, selector/) has its own __init__.py that imports all of the operator classes it contains.
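To make the mechanism concrete, here is a simplified registry written for illustration. The `Registry` class, its `modules` property and its `get` method are assumptions modeled on the decorator name used on this page, not a copy of Data-Juicer's own registry. What it shows is that `register_module()` records the class as a side effect of the `class` statement running, so a class appears in the registry exactly when the code that defines it is executed.

```python
from typing import Callable, Dict, Optional


class Registry:
    """Toy name -> class registry (hypothetical stand-in for OPERATORS)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, type] = {}

    @property
    def modules(self) -> Dict[str, type]:
        # Return a copy so callers cannot mutate the registry directly.
        return dict(self._modules)

    def register_module(self, module_name: Optional[str] = None) -> Callable[[type], type]:
        # Called with the decorator's arguments; returns the actual decorator.
        def decorator(cls: type) -> type:
            key = module_name or cls.__name__
            if key in self._modules:
                raise KeyError(f"{key!r} is already registered in {self.name}")
            self._modules[key] = cls
            return cls  # the class itself is returned unchanged

        return decorator

    def get(self, key: str) -> type:
        try:
            return self._modules[key]
        except KeyError:
            raise KeyError(f"{key!r} not found in {self.name}; was its module imported?") from None


OPERATORS = Registry("Operators")

print(sorted(OPERATORS.modules))  # [] (nothing registered yet)


# The decorator runs when this class statement executes, which for a real
# operator file happens the moment its module is imported.
@OPERATORS.register_module("text_length_filter")
class TextLengthFilter:
    def __init__(self, min_len: int = 10, max_len: int = 1000) -> None:
        self.min_len = min_len
        self.max_len = max_len


print(sorted(OPERATORS.modules))  # ['text_length_filter']
```

Because registration is a side effect of execution, a class defined in a file that nothing imports never reaches the registry, however it is decorated.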

Usage

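Apply this principle whenever you add a new operator file to an existing package: add an import statement for the new class to that package's __init__.py. Creating the file alone is not enough, because nothing else imports it, so its registration decorator never runs.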
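The effect of a missing import line can be reproduced end to end. The sketch below assumes a hypothetical package layout (`ops_without_import` and `ops_with_import`, each with a `registry.py` holding a toy `OPERATORS` object and a `filter/` subpackage). It writes both packages to a temporary directory so that they differ only in whether `filter/__init__.py` imports `MyNewFilter`, then imports each one and lists what was registered.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Toy registry module written into each generated package (hypothetical).
REGISTRY_SRC = '''
class Registry:
    def __init__(self):
        self.modules = {}

    def register_module(self, name=None):
        def decorator(cls):
            self.modules[name or cls.__name__] = cls
            return cls
        return decorator


OPERATORS = Registry()
'''


def op_src(pkg: str, cls_name: str, key: str) -> str:
    # Source for one operator file that registers itself when imported.
    return (
        f"from {pkg}.registry import OPERATORS\n\n"
        f'@OPERATORS.register_module("{key}")\n'
        f"class {cls_name}:\n"
        "    pass\n"
    )


def build_package(root: Path, pkg: str, import_new_filter: bool) -> None:
    filt = root / pkg / "filter"
    filt.mkdir(parents=True)
    (root / pkg / "__init__.py").write_text("")
    (root / pkg / "registry.py").write_text(REGISTRY_SRC)
    (filt / "text_length_filter.py").write_text(
        op_src(pkg, "TextLengthFilter", "text_length_filter"))
    # The new operator file exists in both packages...
    (filt / "my_new_filter.py").write_text(
        op_src(pkg, "MyNewFilter", "my_new_filter"))
    # ...but only one of the two __init__.py files imports it.
    lines = ["from .text_length_filter import TextLengthFilter"]
    if import_new_filter:
        lines.append("from .my_new_filter import MyNewFilter")
    (filt / "__init__.py").write_text("\n".join(lines) + "\n")


def registered_after_import(pkg: str) -> list:
    importlib.import_module(f"{pkg}.filter")  # executes filter/__init__.py
    registry = importlib.import_module(f"{pkg}.registry")
    return sorted(registry.OPERATORS.modules)


with tempfile.TemporaryDirectory() as tmp:
    build_package(Path(tmp), "ops_without_import", import_new_filter=False)
    build_package(Path(tmp), "ops_with_import", import_new_filter=True)
    sys.path.insert(0, tmp)
    try:
        without = registered_after_import("ops_without_import")
        with_import = registered_after_import("ops_with_import")
    finally:
        sys.path.remove(tmp)

print(without)      # ['text_length_filter']
print(with_import)  # ['my_new_filter', 'text_length_filter']
```

The file `my_new_filter.py` is present in both packages, but only the package whose `__init__.py` imports it ends up with `my_new_filter` in its registry.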

Theoretical Basis

# Abstract pattern (NOT real implementation)
# data_juicer/ops/filter/__init__.py
from .text_length_filter import TextLengthFilter
from .language_id_score_filter import LanguageIDScoreFilter
from .my_new_filter import MyNewFilter  # Add this line

# The import triggers @OPERATORS.register_module() execution
# which registers the class in the global registry
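Registration matters because configuration entries refer to operators by name. A minimal sketch of that lookup follows, assuming a parsed `process` list in which each entry maps one operator name to its keyword arguments (a simplification of a YAML recipe after parsing). `build_ops` and `REGISTERED` are hypothetical names, not Data-Juicer's API. It shows that a class whose module was never imported is simply not found.

```python
from typing import Any, Dict, List, Optional, Tuple


class TextLengthFilter:
    def __init__(self, min_len: int = 10, max_len: int = 1000) -> None:
        self.min_len = min_len
        self.max_len = max_len


# Hypothetical registry contents after the filter package was imported.
# MyNewFilter is absent because its module was never imported.
REGISTERED: Dict[str, type] = {"text_length_filter": TextLengthFilter}

# Roughly what a YAML "process" list looks like once parsed (e.g. by
# yaml.safe_load): each entry maps one operator name to its arguments.
process_cfg: List[Dict[str, Optional[Dict[str, Any]]]] = [
    {"text_length_filter": {"min_len": 20, "max_len": 500}},
    {"my_new_filter": None},  # an entry with no arguments parses to None
]


def build_ops(process, registry) -> Tuple[list, List[str]]:
    """Instantiate every configured operator found in the registry."""
    ops, missing = [], []
    for entry in process:
        for name, kwargs in entry.items():
            cls = registry.get(name)
            if cls is None:
                missing.append(name)  # unregistered: unusable from config
            else:
                ops.append(cls(**(kwargs or {})))
    return ops, missing


ops, missing = build_ops(process_cfg, REGISTERED)
print([type(op).__name__ for op in ops])  # ['TextLengthFilter']
print(missing)                            # ['my_new_filter']
```

A real loader would raise an error for the missing name rather than collect it; collecting it here just makes the failure visible.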

Related Pages

Implemented By
