Implementation:Datajuicer Data juicer Load Custom Operators
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Configuration_Management |
| Last Updated | 2026-02-14 17:00 GMT |
Overview
Concrete tool, provided by the Data-Juicer framework, for dynamically loading custom operator modules from user-specified paths.
Description
The load_custom_operators function in data_juicer/config/config.py dynamically imports Python modules from paths specified in the YAML config's custom_op_paths field. It uses importlib to load modules, triggering any @OPERATORS.register_module() decorators in those files. This integrates external operators into the pipeline without modifying Data-Juicer source code.
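The import-triggers-registration mechanism described above can be illustrated with a minimal, self-contained sketch. It is not the Data-Juicer implementation: the `Registry` class, the `toy_registry` module name, `load_modules_from_paths`, and `demo` are all hypothetical stand-ins, and the use of `importlib.util.spec_from_file_location` is an assumption about one reasonable way to import from a file or package path.

```python
import importlib.util
import sys
import tempfile
import types
from pathlib import Path


class Registry:
    """Toy name -> class registry (a stand-in, not Data-Juicer's OPERATORS)."""

    def __init__(self):
        self.modules = {}

    def register_module(self, name=None):
        def decorator(cls):
            # Runs at import time, when the decorated class body is executed.
            self.modules[name or cls.__name__] = cls
            return cls
        return decorator


# Expose the toy registry under a hypothetical module name so that
# dynamically loaded files can import it.
toy_registry = types.ModuleType("toy_registry")
toy_registry.OPERATORS = Registry()
sys.modules["toy_registry"] = toy_registry


def load_modules_from_paths(paths):
    """Import each .py file or package directory; decorators fire on import."""
    loaded = []
    for raw in paths:
        path = Path(raw).resolve()
        if path.is_dir():
            # Treat a directory as a package rooted at its __init__.py.
            entry, name, search = path / "__init__.py", path.name, [str(path)]
        else:
            entry, name, search = path, path.stem, None
        if not entry.is_file():
            raise FileNotFoundError(f"no importable module at {path}")
        spec = importlib.util.spec_from_file_location(
            name, str(entry), submodule_search_locations=search
        )
        module = importlib.util.module_from_spec(spec)
        sys.modules[name] = module
        spec.loader.exec_module(module)  # executes the file, running decorators
        loaded.append(name)
    return loaded


def demo():
    """Write a one-class operator file, load it, and list registered names."""
    with tempfile.TemporaryDirectory() as tmp:
        op_file = Path(tmp) / "my_custom_filter_op.py"
        op_file.write_text(
            "from toy_registry import OPERATORS\n"
            "\n"
            "@OPERATORS.register_module('my_custom_filter')\n"
            "class MyCustomFilter:\n"
            "    pass\n"
        )
        load_modules_from_paths([str(op_file)])
    return sorted(toy_registry.OPERATORS.modules)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the loader never inspects the file for operator classes. Executing the module is enough, because the decorator itself writes into the registry.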
Usage
Add custom_op_paths to the YAML config, then call init_configs, which invokes this function automatically.
Code Reference
Source Location
- Repository: data-juicer
- File: data_juicer/config/config.py
- Lines: L53-98
Signature
def load_custom_operators(paths: list):
    """
    Dynamically load custom operator modules from file paths.

    Args:
        paths: List of file paths or package paths containing custom operators.
    """
Import
# Typically called internally by init_configs
# Direct usage:
from data_juicer.config.config import load_custom_operators
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| paths | list | Yes | List of Python file paths or package paths |
Outputs
| Name | Type | Description |
|---|---|---|
| registration | Side effect | Custom operator classes registered in OPERATORS registry |
Usage Examples
YAML Configuration
# pipeline.yaml
custom_op_paths:
  - /path/to/my_custom_ops/
  - /path/to/another_op.py
process:
  - my_custom_filter:
      min_score: 0.8
  - another_custom_mapper:
      mode: strict
Programmatic Loading
from data_juicer.config import init_configs
# init_configs automatically calls load_custom_operators
cfg = init_configs(args=['--config', 'pipeline.yaml'])