Principle:Datajuicer Data juicer Operator Dependency Management
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, DevOps |
| Last Updated | 2026-02-14 17:00 GMT |
Overview
A lazy-loading and isolated-environment pattern that manages operator-specific Python dependencies, avoiding version conflicts between operators and reducing startup time.
Description
Operator Dependency Management addresses two problems: (1) different operators may require conflicting package versions, and (2) importing every operator's dependencies at startup is prohibitively slow. The solution combines LazyLoader (deferred module imports that auto-install missing packages on first use) with OPEnvSpec (per-operator environment specifications for isolated Ray environments built with uv). Operators declare their dependencies, the dependencies are installed on demand, and in distributed mode each operator can run in its own isolated environment.
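The deferred-import half of this design can be sketched with the standard library alone. `LazyModule` and its `installer` hook are hypothetical names for illustration, not Data-Juicer's LazyLoader API: the proxy subclasses `types.ModuleType`, and its `__getattr__` performs the real import on first attribute access, optionally calling an installer and retrying once if the import fails.

```python
import importlib
import types
from typing import Callable, Optional


class LazyModule(types.ModuleType):
    """Hypothetical module proxy: the real import happens on first attribute access."""

    def __init__(self, module_name: str,
                 installer: Optional[Callable[[str], None]] = None):
        super().__init__(module_name)
        self._module_name = module_name
        # Hypothetical hook, e.g. a function that shells out to pip or uv.
        self._installer = installer
        self._module = None  # filled in on first use

    def _load(self) -> types.ModuleType:
        if self._module is None:
            try:
                self._module = importlib.import_module(self._module_name)
            except ImportError:
                if self._installer is None:
                    raise
                # Install the missing package, then retry the import once.
                self._installer(self._module_name)
                importlib.invalidate_caches()
                self._module = importlib.import_module(self._module_name)
        return self._module

    def __getattr__(self, item):
        # Only reached for names not set on the proxy itself,
        # i.e. the wrapped module's public API.
        return getattr(self._load(), item)


json_lazy = LazyModule("json")          # nothing imported yet
encoded = json_lazy.dumps({"ok": True})  # first access triggers the import
```

Creating the proxy costs nothing; the first `.dumps` call performs the import. A real installer also needs a mapping from import names to distribution names (`cv2` is installed as `opencv-python`), which this sketch leaves to the hook.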
Usage
Use this principle when a custom operator depends on external packages. Define a _requirements class attribute or use LazyLoader for imports. For Ray mode, provide an OPEnvSpec with pip requirements.
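Declaring `_requirements` makes it possible to decide, before an operator runs, which packages still have to be installed. The helpers below (`requirement_name`, `missing_requirements`) are a hypothetical sketch using only `importlib.metadata`: they check whether each declared distribution is present but deliberately ignore version bounds, which a fuller version would evaluate with the `packaging` library. The `Mapper` base class is omitted so the example runs without Data-Juicer installed.

```python
import re
from importlib import metadata
from typing import List


def requirement_name(spec: str) -> str:
    """Extract the distribution name from a simple spec such as 'torch>=2.0'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", spec)
    if match is None:
        raise ValueError(f"cannot parse requirement: {spec!r}")
    return match.group(1)


def missing_requirements(specs: List[str]) -> List[str]:
    """Return the specs whose distribution is not installed at all.

    Version constraints are NOT checked; only presence is.
    """
    missing = []
    for spec in specs:
        try:
            metadata.version(requirement_name(spec))
        except metadata.PackageNotFoundError:
            missing.append(spec)
    return missing


class MyOp:  # stand-in for a Mapper subclass (base class omitted)
    _requirements = ["torch>=2.0", "transformers>=4.30"]


# Specs that would have to be installed (e.g. by pip or uv) before MyOp runs.
to_install = missing_requirements(MyOp._requirements)
print(to_install)
```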
Theoretical Basis
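```python
# Abstract pattern (NOT real implementation)
# Import paths as laid out in the Data-Juicer repository (assumed)
from data_juicer.ops.base_op import Mapper
from data_juicer.utils.lazy_loader import LazyLoader

# Lazy loading: defer import until first use
torch = LazyLoader('torch')  # Does not import torch yet
# First access triggers: import torch (and pip install if missing)
x = torch.zeros(3)

# Isolated environments (Ray):
class MyOp(Mapper):
    _requirements = ['torch>=2.0', 'transformers>=4.30']
    # Ray creates an isolated uv environment with these packages
```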
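Isolation matters because two operators can pin the same package to incompatible versions, in which case they cannot share one environment. The `EnvSpec` dataclass below is a hypothetical stand-in, not the OPEnvSpec API: it detects conflicting exact pins (`==`) and renders its packages as a dictionary assumed to match the list form of Ray's `uv` runtime_env field. Range constraints such as `>=` are not compared.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EnvSpec:
    """Hypothetical per-operator environment spec (not the OPEnvSpec API)."""
    pip_packages: List[str] = field(default_factory=list)

    def exact_pins(self) -> Dict[str, str]:
        # Collect only 'name==version' entries; range specs are skipped.
        pins = {}
        for pkg in self.pip_packages:
            if "==" in pkg:
                name, version = pkg.split("==", 1)
                pins[name.strip().lower()] = version.strip()
        return pins

    def conflicts_with(self, other: "EnvSpec") -> List[str]:
        # Packages pinned by both specs to different exact versions.
        mine, theirs = self.exact_pins(), other.exact_pins()
        return sorted(n for n in mine.keys() & theirs.keys() if mine[n] != theirs[n])

    def to_runtime_env(self) -> dict:
        # Assumed shape of a Ray runtime_env using the uv plugin; check Ray docs.
        return {"uv": list(self.pip_packages)}


op_a = EnvSpec(["torch==2.1.0", "transformers>=4.30"])
op_b = EnvSpec(["torch==2.3.1"])
if op_a.conflicts_with(op_b):
    # Conflicting pins: each operator gets its own isolated environment.
    envs = [op_a.to_runtime_env(), op_b.to_runtime_env()]
else:
    envs = [EnvSpec(op_a.pip_packages + op_b.pip_packages).to_runtime_env()]
```

Each resulting dictionary would then be attached to the Ray task or actor that runs the corresponding operator; that wiring is not shown here.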