Principle:Datajuicer Data juicer Custom Operator Configuration
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Configuration_Management |
| Last Updated | 2026-02-14 17:00 GMT |
Overview
A dynamic module loading pattern that allows external custom operator files to be loaded and registered at pipeline startup from user-specified paths.
Description
Custom Operator Configuration enables users to develop operators outside the Data-Juicer package and load them at runtime. The configuration system accepts a custom_op_paths list that specifies paths to Python files or packages containing custom operators. At startup, these modules are dynamically imported using importlib, triggering their @OPERATORS.register_module() decorators. This extends the operator registry without modifying the Data-Juicer source code.
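The mechanism described above can be sketched in a self-contained way. This is a minimal illustration, not Data-Juicer's actual code: the `Registry` class, the `demo_registry` module, the `load_custom_ops` helper, and the `uppercase_mapper` operator are all hypothetical stand-ins, and only single `.py` files are handled.

```python
import importlib.util
import sys
import tempfile
import types
from pathlib import Path


class Registry:
    """Hypothetical stand-in for a decorator-based operator registry."""

    def __init__(self):
        self.modules = {}

    def register_module(self, name):
        def decorator(cls):
            self.modules[name] = cls  # record the class under its operator name
            return cls
        return decorator


OPERATORS = Registry()

# Expose the registry as an importable module so external files can reach it,
# much as a real custom operator file would import it from the framework.
_registry_mod = types.ModuleType("demo_registry")
_registry_mod.OPERATORS = OPERATORS
sys.modules["demo_registry"] = _registry_mod


def load_custom_ops(paths):
    """Import each .py file by path so its registration decorators run."""
    for path in map(Path, paths):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        if spec is None or spec.loader is None:
            raise ImportError(f"cannot load custom operator file: {path}")
        module = importlib.util.module_from_spec(spec)
        sys.modules[path.stem] = module
        spec.loader.exec_module(module)  # decorators execute here


# Contents of a hypothetical external operator file.
CUSTOM_OP_SOURCE = '''
from demo_registry import OPERATORS

@OPERATORS.register_module("uppercase_mapper")
class UppercaseMapper:
    def process(self, sample):
        sample["text"] = sample["text"].upper()
        return sample
'''


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        op_file = Path(tmp) / "my_custom_ops.py"
        op_file.write_text(CUSTOM_OP_SOURCE)
        load_custom_ops([str(op_file)])
    # The operator is now available by name without touching the registry's source.
    op = OPERATORS.modules["uppercase_mapper"]()
    return op.process({"text": "hello"})


result = demo()
print(result)
```

Nothing in the loader refers to `UppercaseMapper` directly. Executing the file is enough, because the decorator runs as a side effect of the import and adds the class to the registry.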
Usage
Use this principle when operators are maintained outside the Data-Juicer repository. Add a custom_op_paths list to the YAML config, with each entry pointing to a Python file or package that contains the custom operators.
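As a rough illustration, a config using this field might look like the fragment below. The field name custom_op_paths is taken from this page. The paths and the operator name my_custom_mapper are hypothetical placeholders, and the other keys are included only to give the field some context.

```yaml
# Illustrative config fragment; paths and operator name are hypothetical
project_name: 'custom-op-demo'
dataset_path: './data/demo.jsonl'
export_path: './outputs/demo-processed.jsonl'

custom_op_paths:
  - './my_ops/my_custom_mapper.py'   # a single Python file
  - './my_ops/extra_ops'             # a package directory

process:
  - my_custom_mapper:                # name registered by the custom file
```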
Theoretical Basis
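# Abstract pattern (NOT real implementation)
import importlib.util
from pathlib import Path
for path in config.custom_op_paths:
    # File case shown: importlib.import_module() expects a dotted module name, not a file path.
    # A package directory would instead be imported by name once its parent is on sys.path.
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # @OPERATORS.register_module() decorators execute on import
# Custom operators now in global registry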