Principle:Datajuicer Data juicer Custom Operator Configuration

From Leeroopedia
Revision as of 17:49, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Datajuicer_Data_juicer_Custom_Operator_Configuration.md)
Knowledge Sources
Domains: Data_Engineering, Configuration_Management
Last Updated: 2026-02-14 17:00 GMT

Overview

A dynamic module loading pattern that allows external custom operator files to be loaded and registered at pipeline startup from user-specified paths.

Description

Custom Operator Configuration lets users develop operators outside the Data-Juicer package and load them at runtime. The configuration accepts a custom_op_paths list of paths to Python files or packages that contain custom operators. At startup, each path is imported dynamically with importlib. Importing a module runs its top-level code, including any @OPERATORS.register_module() decorators, which adds the operators to the registry. The registry is thereby extended without modifying the Data-Juicer source code.
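The registration step can be illustrated with a minimal sketch of a decorator-based registry. The Registry class, its modules attribute, and the UppercaseMapper operator below are hypothetical stand-ins written for this example; they are not Data-Juicer's actual classes, and only the register_module() decorator name is taken from the description above.

```python
class Registry:
    """Hypothetical stand-in for an operator registry (not Data-Juicer's class)."""

    def __init__(self, name):
        self.name = name
        self.modules = {}

    def register_module(self, module_name=None):
        """Return a class decorator that stores the class under a name."""
        def decorator(cls):
            key = module_name or cls.__name__
            if key in self.modules:
                raise KeyError(f"{key} is already registered in {self.name}")
            self.modules[key] = cls
            return cls  # the class itself is returned unchanged
        return decorator


OPERATORS = Registry("operators")


# The decorator runs when this class statement executes, which for a
# module-level class means "when the module is imported".
@OPERATORS.register_module("uppercase_mapper")
class UppercaseMapper:
    def process(self, sample):
        return {**sample, "text": sample["text"].upper()}


# Look the operator up by name, as a pipeline would when reading a config.
op_cls = OPERATORS.modules["uppercase_mapper"]
result = op_cls().process({"text": "hello"})
```

Because registration is a side effect of executing the class statement, merely importing a file that contains such a class is enough to make the operator available by name.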

Usage

Use this principle when operators are maintained outside the Data-Juicer repository. Add a custom_op_paths field to the YAML config that lists the paths of the operator files or packages to load.

Theoretical Basis

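# Abstract pattern (NOT the real implementation); handles single .py files only
import importlib.util
from pathlib import Path

for path in config.custom_op_paths:
    # import_module() expects a dotted module name, so a file path is loaded via a spec
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # @OPERATORS.register_module() decorators ran on import; operators are now registered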
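Because importlib.import_module takes a dotted module name and not a filesystem path, loading from user-supplied paths typically goes through importlib.util. The following is a self-contained sketch of that approach, covering both single files and package directories. The load_custom_ops function, the demo_registry module, and the my_custom_op file are all hypothetical names invented for this example; Data-Juicer's own loader may differ.

```python
import importlib.util
import sys
import tempfile
import types
from pathlib import Path

# Hypothetical stand-in for the module that exposes the global registry.
registry_mod = types.ModuleType("demo_registry")
registry_mod.OPS = {}


def register(name):
    """Class decorator that records the class in the stand-in registry."""
    def decorator(cls):
        registry_mod.OPS[name] = cls
        return cls
    return decorator


registry_mod.register = register
sys.modules["demo_registry"] = registry_mod  # makes "import demo_registry" work


def load_custom_ops(paths):
    """Import each .py file or package directory so its decorators run."""
    loaded = []
    for raw in paths:
        path = Path(raw).resolve()
        is_pkg = path.is_dir()
        target = path / "__init__.py" if is_pkg else path
        if not target.is_file():
            raise FileNotFoundError(f"no importable module at {raw}")
        name = path.name if is_pkg else path.stem
        spec = importlib.util.spec_from_file_location(
            name,
            str(target),
            submodule_search_locations=[str(path)] if is_pkg else None,
        )
        module = importlib.util.module_from_spec(spec)
        sys.modules[name] = module  # registered first so package-internal imports resolve
        spec.loader.exec_module(module)  # top-level code, including decorators, runs here
        loaded.append(name)
    return loaded


# Usage: write a throwaway operator file, then load it by path.
with tempfile.TemporaryDirectory() as tmp:
    op_file = Path(tmp) / "my_custom_op.py"
    op_file.write_text(
        "from demo_registry import register\n"
        "\n"
        "@register('my_custom_op')\n"
        "class MyCustomOp:\n"
        "    pass\n"
    )
    loaded = load_custom_ops([str(op_file)])
    registered = sorted(registry_mod.OPS)
```

Failing fast on a missing path, as this sketch does, surfaces configuration mistakes at startup instead of later, when a pipeline step refers to an operator name that was never registered.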

Related Pages

Implemented By
