Principle:Datajuicer Data juicer Operator Dependency Management

From Leeroopedia
Knowledge Sources
Domains Data_Engineering, DevOps
Last Updated 2026-02-14 17:00 GMT

Overview

A pattern that combines lazy loading with isolated environments to manage operator-specific Python dependencies, avoiding version conflicts and reducing startup time.

Description

Operator Dependency Management addresses two problems: (1) different operators may require conflicting package versions, and (2) importing every operator's dependencies at startup is prohibitively slow. The solution combines LazyLoader (deferred module imports that auto-install missing packages on first use) with OPEnvSpec (per-operator environment specifications for Ray isolated environments built with uv). Operators can thus declare their dependencies, have them installed on demand, and, in distributed mode, run in isolated environments.
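To make the deferred-import idea concrete, here is a minimal, self-contained sketch. The class name LazyModule and its installer hook are hypothetical stand-ins chosen for illustration; this is not Data-Juicer's LazyLoader, whose actual signature and install logic are not shown on this page.

```python
import importlib
from types import ModuleType
from typing import Callable, Optional


class LazyModule:
    """Hypothetical lazy module proxy (NOT Data-Juicer's LazyLoader)."""

    def __init__(self, name: str,
                 installer: Optional[Callable[[str], None]] = None):
        self._name = name
        # Assumed hook: a callable that installs the package, e.g. a
        # wrapper around pip or uv. None means "never auto-install".
        self._installer = installer
        self._module: Optional[ModuleType] = None  # nothing imported yet

    def _load(self) -> ModuleType:
        if self._module is None:
            try:
                self._module = importlib.import_module(self._name)
            except ImportError:
                if self._installer is None:
                    raise
                # Install on first use, then retry the import once.
                self._installer(self._name)
                importlib.invalidate_caches()
                self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr: str):
        # Only reached when normal lookup fails; guard our own fields
        # so a half-initialised proxy cannot recurse forever.
        if attr in ("_name", "_installer", "_module"):
            raise AttributeError(attr)
        return getattr(self._load(), attr)


# Creating the proxy does not import anything.
lazy_json = LazyModule("json")
was_loaded_at_creation = lazy_json._module is not None

# The first attribute access triggers the real import.
encoded = lazy_json.dumps({"a": 1})
```

Passing the installer as a callable keeps the sketch free of side effects; a real loader would also need to map import names to package names (for example, cv2 to opencv-python), which is omitted here.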

Usage

Use this principle when a custom operator depends on external packages. Define a _requirements class attribute or use LazyLoader for imports. For Ray mode, provide an OPEnvSpec with pip requirements.

Theoretical Basis

# Abstract pattern (NOT real implementation)
# Lazy loading: defer import until first use
torch = LazyLoader('torch')  # Does not import torch yet
# First access triggers: import torch (and pip install if missing)

# Isolated environments (Ray):
class MyOp(Mapper):
    _requirements = ['torch>=2.0', 'transformers>=4.30']
    # Ray creates an isolated uv environment with these packages
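The isolation half of the pattern can be sketched in the same spirit. The example below groups operators by their declared _requirements so that each distinct requirement set maps to one environment description. The Mapper stub, the operator classes, and both helper functions are hypothetical; the {"pip": [...]} dictionary follows the shape Ray's runtime_env accepts for pip packages, and how Data-Juicer's OPEnvSpec translates requirements into a uv environment is an assumption not covered here.

```python
from typing import Dict, FrozenSet, Iterable, List


class Mapper:
    """Minimal stand-in for an operator base class (hypothetical)."""
    _requirements: List[str] = []


class TextEmbedOp(Mapper):
    _requirements = ["torch>=2.0", "transformers>=4.30"]


class ImageTagOp(Mapper):
    # Same requirement set as TextEmbedOp, so it can share an environment.
    _requirements = ["transformers>=4.30", "torch>=2.0"]


class LegacyOp(Mapper):
    # Conflicts with the operators above, so it needs its own environment.
    _requirements = ["torch<2.0"]


def group_by_requirements(ops: Iterable[type]) -> Dict[FrozenSet[str], List[str]]:
    """Map each distinct requirement set to the operators that declare it."""
    groups: Dict[FrozenSet[str], List[str]] = {}
    for op in ops:
        key = frozenset(op._requirements)
        groups.setdefault(key, []).append(op.__name__)
    return groups


def to_runtime_env(requirements: Iterable[str]) -> dict:
    """Build a dict in the shape of Ray's runtime_env pip field."""
    return {"pip": sorted(requirements)}


groups = group_by_requirements([TextEmbedOp, ImageTagOp, LegacyOp])
envs = [to_runtime_env(reqs) for reqs in groups]
```

Keying environments on the frozen requirement set means operators with identical dependencies reuse one environment, while conflicting pins such as torch>=2.0 and torch<2.0 never end up in the same one.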

Related Pages

Implemented By
