
Implementation:Datajuicer Data juicer Load Custom Operators

From Leeroopedia
Knowledge Sources
Domains Data_Engineering, Configuration_Management
Last Updated 2026-02-14 17:00 GMT

Overview

Concrete tool, provided by the Data-Juicer framework, for dynamically loading custom operator modules from user-specified paths.

Description

The load_custom_operators function in data_juicer/config/config.py dynamically imports Python modules from paths specified in the YAML config's custom_op_paths field. It uses importlib to load modules, triggering any @OPERATORS.register_module() decorators in those files. This integrates external operators into the pipeline without modifying Data-Juicer source code.
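The registration mechanism described above relies on a standard Python behavior: class decorators run when the module that contains them is executed, so importing a file by path is enough to register its operators. The sketch below shows this in isolation. ToyRegistry, the in-memory toy_registry module and the file name my_custom_filter_op.py are hypothetical stand-ins for Data-Juicer's OPERATORS registry and a user's operator file; only the importlib calls are real standard-library APIs.

```python
import importlib.util
import os
import sys
import tempfile
import types


class ToyRegistry:
    """Hypothetical stand-in for OPERATORS: maps operator names to classes."""

    def __init__(self):
        self.modules = {}

    def register_module(self, name):
        # Returns a class decorator that records the class under `name`.
        def decorator(cls):
            self.modules[name] = cls
            return cls
        return decorator


# Expose the toy registry as an importable module so the user file can import it.
registry_module = types.ModuleType("toy_registry")
registry_module.OPERATORS = ToyRegistry()
sys.modules["toy_registry"] = registry_module

# Contents of a hypothetical user operator file.
CUSTOM_OP_SOURCE = '''
from toy_registry import OPERATORS

@OPERATORS.register_module("my_custom_filter")
class MyCustomFilter:
    def __init__(self, min_score=0.5):
        self.min_score = min_score
'''


def import_file(path):
    """Import one .py file by path; executing it runs its decorators."""
    module_name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    spec.loader.exec_module(module)  # registration happens here, as a side effect
    return module


with tempfile.TemporaryDirectory() as tmp:
    op_path = os.path.join(tmp, "my_custom_filter_op.py")
    with open(op_path, "w") as f:
        f.write(CUSTOM_OP_SOURCE)
    import_file(op_path)

print(sorted(registry_module.OPERATORS.modules))  # ['my_custom_filter']
```

Nothing in the pipeline calls MyCustomFilter directly; the import alone makes the name "my_custom_filter" resolvable.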

Usage

Add a custom_op_paths list to the YAML config, then call init_configs, which invokes this function automatically.

Code Reference

Source Location

  • Repository: data-juicer
  • File: data_juicer/config/config.py
  • Lines: L53-98

Signature

def load_custom_operators(paths: list):
    """
    Dynamically load custom operator modules from file paths.

    Args:
        paths: List of file paths or package paths containing custom operators.
    """
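The signature above shows only the docstring. The following is one plausible way to implement that contract with the standard library, assuming that a file path is imported under its file name and that a directory path must be an importable package (one containing __init__.py, which in turn imports its operator modules). It is an illustrative sketch, not the Data-Juicer source: load_custom_operators_sketch is a hypothetical name, and it returns the imported module names only so the result can be inspected, whereas the documented function is used for its registration side effect.

```python
import importlib
import importlib.util
import os
import sys


def load_custom_operators_sketch(paths):
    """Import each path so any register_module() decorators in it run.

    Hypothetical sketch; not the Data-Juicer implementation.
    """
    loaded = []
    for path in paths:
        abs_path = os.path.abspath(path)  # normalizes, dropping a trailing slash
        if os.path.isfile(abs_path):
            # Single file: import it under its file stem.
            module_name = os.path.splitext(os.path.basename(abs_path))[0]
            spec = importlib.util.spec_from_file_location(module_name, abs_path)
            module = importlib.util.module_from_spec(spec)
            sys.modules[module_name] = module
            spec.loader.exec_module(module)
        elif os.path.isdir(abs_path):
            # Directory: assumed to be a package whose __init__.py imports its ops.
            if not os.path.isfile(os.path.join(abs_path, "__init__.py")):
                raise ValueError(f"{abs_path} is not a package (no __init__.py)")
            parent_dir, package_name = os.path.split(abs_path)
            if parent_dir not in sys.path:
                sys.path.insert(0, parent_dir)  # make the package importable
            importlib.invalidate_caches()
            module = importlib.import_module(package_name)
        else:
            raise ValueError(f"custom op path does not exist: {abs_path}")
        loaded.append(module.__name__)
    return loaded
```

Accepting both forms mirrors the YAML example further down, which lists one directory and one single file under custom_op_paths.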

Import

# Typically called internally by init_configs
# Direct usage:
from data_juicer.config.config import load_custom_operators

I/O Contract

Inputs

Name  | Type | Required | Description
paths | list | Yes      | List of Python file paths or package paths

Outputs

Name         | Type        | Description
registration | Side effect | Custom operator classes registered in the OPERATORS registry
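The output is only a side effect, but it is what lets the process list in a config name custom operators. The sketch below shows why, under stated assumptions: Registry and build_ops are hypothetical toys (Registry only mirrors the register_module decorator name used on this page), and the process list is the one from the YAML example below, written out as already-parsed Python objects. A name that was never registered cannot be resolved.

```python
class Registry:
    """Hypothetical name -> class registry standing in for OPERATORS."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self, module_name):
        # Class decorator: store the class under `module_name`, refuse duplicates.
        def decorator(cls):
            if module_name in self._modules:
                raise KeyError(f"{module_name} is already registered in {self.name}")
            self._modules[module_name] = cls
            return cls
        return decorator

    def get(self, module_name):
        return self._modules.get(module_name)


OPERATORS = Registry("Operators")


# What importing a custom-op file effectively does:
@OPERATORS.register_module("my_custom_filter")
class MyCustomFilter:
    def __init__(self, min_score=0.5):
        self.min_score = min_score


@OPERATORS.register_module("another_custom_mapper")
class AnotherCustomMapper:
    def __init__(self, mode="lenient"):
        self.mode = mode


# The `process` section of the YAML example, already parsed.
process = [
    {"my_custom_filter": {"min_score": 0.8}},
    {"another_custom_mapper": {"mode": "strict"}},
]


def build_ops(process_list, registry):
    """Resolve each op name through the registry and instantiate it."""
    ops = []
    for entry in process_list:
        for op_name, op_args in entry.items():
            op_cls = registry.get(op_name)
            if op_cls is None:
                raise KeyError(f"operator {op_name!r} is not registered")
            ops.append(op_cls(**(op_args or {})))
    return ops


ops = build_ops(process, OPERATORS)
print([type(op).__name__ for op in ops])  # ['MyCustomFilter', 'AnotherCustomMapper']
```

This is also why custom operators must be loaded before the process list is resolved.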

Usage Examples

YAML Configuration

# pipeline.yaml
custom_op_paths:
  - /path/to/my_custom_ops/
  - /path/to/another_op.py

process:
  - my_custom_filter:
      min_score: 0.8
  - another_custom_mapper:
      mode: strict

Programmatic Loading

from data_juicer.config import init_configs

# init_configs automatically calls load_custom_operators
cfg = init_configs(args=['--config', 'pipeline.yaml'])

Related Pages

Implements Principle
